Class: Rubino::Tools::VisionTool
- Inherits: Base (Object > Base > Rubino::Tools::VisionTool)
- Defined in: lib/rubino/tools/vision_tool.rb
Overview
Delegates image understanding to a multimodal auxiliary ("aux") model, so a text-only primary model can still “see” images the user uploaded. It implements the agent-as-tool semantics from the OpenAI Agents SDK: the primary stays in control, calls this tool with a focused question, and receives a structured (text) reply. There is no conversation handoff and no shared history.
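The following is a minimal sketch of the agent-as-tool pattern described above. It uses a stand-in lambda in place of a real multimodal client. `delegate_to_aux` and `fake_aux` are hypothetical names, not Rubino APIs. The sketch shows the two properties that distinguish this pattern from a handoff. First, the aux model gets a fresh one-message conversation built from the focused question. Second, only its text reply flows back to the primary.

```ruby
# Hypothetical sketch: `aux` stands in for a multimodal model client.
# It is handed only the messages built here, never the primary's history.
def delegate_to_aux(aux, question, image_path)
  messages = [{ role: "user", content: question }] # fresh, isolated conversation
  aux.call(messages, [image_path]).to_s             # only plain text goes back
end

# The primary's own conversation; it is never passed to delegate_to_aux.
primary_history = [
  { role: "user", content: "What does the chart in /tmp/q3.png show?" },
  { role: "assistant", content: "Let me look at the image." }
]

# Stand-in aux model: records what it received and returns a canned answer.
seen = nil
fake_aux = lambda do |messages, images|
  seen = { messages: messages, images: images }
  "A line chart trending upward."
end

answer = delegate_to_aux(fake_aux, "Summarize the trend in this chart.", "/tmp/q3.png")
# answer          => "A line chart trending upward."
# seen[:messages] => [{ role: "user", content: "Summarize the trend in this chart." }]
```

The primary decides what to ask and what to do with the answer. The aux model never learns how the question arose.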
The aux model is resolved from `auxiliary.vision` in the config. When the primary already supports vision (per Configuration#model_supports_vision?) AND no aux model is configured, Registry hides this tool, since there is no useful delegation to perform.
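The visibility rule can be sketched as a small predicate. The sketch makes two assumptions. The config is treated as a plain nested Hash with String keys, and the primary's vision support is passed in as a boolean instead of calling Configuration#model_supports_vision?. `aux_vision_model` and `vision_tool_visible?` are hypothetical helpers, not the actual Registry code. As stated above, the rule hides the tool in exactly one case, so the sketch keeps it visible in every other combination.

```ruby
# Hypothetical: looks up the aux vision model under auxiliary.vision.
# Returns nil when no aux vision model is configured.
def aux_vision_model(config)
  config.dig("auxiliary", "vision")
end

# Hypothetical: hide only when the primary can already see AND there is
# no aux model to delegate to; otherwise the tool stays visible.
def vision_tool_visible?(config, primary_supports_vision:)
  !(primary_supports_vision && aux_vision_model(config).nil?)
end

vision_tool_visible?({}, primary_supports_vision: true)  # => false (hidden)
vision_tool_visible?({}, primary_supports_vision: false) # => true
```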
Instance Attribute Summary
Attributes inherited from Base
#cancel_token, #read_tracker, #stream_chunk
Instance Method Summary
Methods inherited from Base
#cancellation_requested?, #config_key, #emit_chunk, #risky?, #to_tool_definition, workspace_root, workspace_roots
Instance Method Details
#call(arguments) ⇒ Object
# File 'lib/rubino/tools/vision_tool.rb', line 50
def call(arguments)
  path = (arguments["file_path"] || arguments[:file_path]).to_s
  question = (arguments["question"] || arguments[:question] ||
              "Describe what you see in markdown.").to_s
  return "Error: file_path is required" if path.empty?

  expanded = File.expand_path(path)
  return "Error: file not found: #{path}" unless File.exist?(expanded)
  return "Error: not a regular file: #{path}" unless File.file?(expanded)

  ext = File.extname(expanded).downcase
  unless LLM::ContentBuilder::SUPPORTED_IMAGE_TYPES.include?(ext)
    return "Error: unsupported image extension '#{ext}'. " \
           "Supported: #{LLM::ContentBuilder::SUPPORTED_IMAGE_TYPES.join(", ")}"
  end

  response = LLM::AuxiliaryClient.new.call(
    task: :vision,
    messages: [{ role: "user", content: question }],
    image_paths: [expanded]
  )
  response.content.to_s
rescue StandardError => e
  "Error calling vision model: #{e.class}: #{e.message}"
end
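The guard clauses in #call can be exercised without a model. The sketch below is a standalone copy of those checks. It assumes the supported extensions match the list in the input_schema description (.png .jpg .jpeg .webp .gif .bmp); the real LLM::ContentBuilder::SUPPORTED_IMAGE_TYPES constant may differ. `check_image_args`, `SUPPORTED_EXTS` and `DEFAULT_QUESTION` are hypothetical names. Like #call, the helper reports problems as returned strings rather than raising, so the primary model can read the error and recover.

```ruby
require "tempfile"

# Assumption: mirrors the extension list in input_schema's description; the
# real LLM::ContentBuilder::SUPPORTED_IMAGE_TYPES may contain other entries.
SUPPORTED_EXTS = %w[.png .jpg .jpeg .webp .gif .bmp].freeze
DEFAULT_QUESTION = "Describe what you see in markdown."

# Hypothetical helper reproducing the guard clauses of VisionTool#call.
# Returns an error String, or [expanded_path, question] when the input is usable.
def check_image_args(arguments)
  path = (arguments["file_path"] || arguments[:file_path]).to_s
  question = (arguments["question"] || arguments[:question] || DEFAULT_QUESTION).to_s
  return "Error: file_path is required" if path.empty?

  expanded = File.expand_path(path)
  return "Error: file not found: #{path}" unless File.exist?(expanded)
  return "Error: not a regular file: #{path}" unless File.file?(expanded)

  ext = File.extname(expanded).downcase
  unless SUPPORTED_EXTS.include?(ext)
    return "Error: unsupported image extension '#{ext}'. " \
           "Supported: #{SUPPORTED_EXTS.join(", ")}"
  end

  [expanded, question]
end

png = Tempfile.new(["shot", ".png"])  # empty file; only the extension matters here
txt = Tempfile.new(["notes", ".txt"])

check_image_args({})                              # => "Error: file_path is required"
check_image_args(file_path: "/no/such/file.png")  # => "Error: file not found: /no/such/file.png"
check_image_args(file_path: txt.path)             # => "Error: unsupported image extension '.txt'. Supported: ..."
check_image_args("file_path" => png.path)         # => [png.path, "Describe what you see in markdown."]
```

Both Symbol and String keys are accepted, matching #call. This lets the same checks work whether the arguments were built in Ruby or parsed from JSON.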
#description ⇒ Object
# File 'lib/rubino/tools/vision_tool.rb', line 22
def description
  "Ask a multimodal model to describe or interpret an image. " \
    "Use when you need to understand visual content (charts, screenshots, " \
    "diagrams, photos). Provide an optional focused question to direct the " \
    "analysis; default is a full markdown description."
end
#input_schema ⇒ Object
# File 'lib/rubino/tools/vision_tool.rb', line 29
def input_schema
  {
    type: "object",
    properties: {
      file_path: {
        type: "string",
        description: "Absolute path to an image file (.png .jpg .jpeg .webp .gif .bmp)"
      },
      question: {
        type: "string",
        description: "Optional focused question. Default: 'Describe what you see in markdown.'"
      }
    },
    required: %w[file_path]
  }
end
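#name, #description and #input_schema are the pieces a function-calling API needs. The sketch below packs them into the OpenAI Chat Completions `tools` entry shape: `type: "function"` with `name`, `description` and `parameters`. It is an assumption that the inherited Base#to_tool_definition emits exactly this shape, and `tool_definition` is a hypothetical helper. The sketch also suggests why #call looks up both "file_path" and :file_path. Arguments parsed from a model's JSON arrive with String keys, while the schema here is written with Symbol keys.

```ruby
require "json"

# Values taken from VisionTool#name and #input_schema; the description is
# shortened to its first sentence.
tool_name = "vision"
tool_description = "Ask a multimodal model to describe or interpret an image."
schema = {
  type: "object",
  properties: {
    file_path: {
      type: "string",
      description: "Absolute path to an image file (.png .jpg .jpeg .webp .gif .bmp)"
    },
    question: {
      type: "string",
      description: "Optional focused question. Default: 'Describe what you see in markdown.'"
    }
  },
  required: %w[file_path]
}

# Hypothetical helper: wraps the pieces in an OpenAI Chat Completions-style
# function tool entry. Base#to_tool_definition may produce a different shape.
def tool_definition(name, description, parameters)
  { type: "function", function: { name: name, description: description, parameters: parameters } }
end

wire = JSON.generate(tool_definition(tool_name, tool_description, schema))

# A model's tool call returns its arguments as a JSON string...
raw_args = '{"file_path": "/tmp/diagram.png", "question": "What are the boxes labeled?"}'
args = JSON.parse(raw_args)
# ...and JSON.parse yields String keys, one reason #call accepts "file_path"
# as well as :file_path.
args["file_path"] # => "/tmp/diagram.png"
args[:file_path]  # => nil
```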
#name ⇒ Object
# File 'lib/rubino/tools/vision_tool.rb', line 18
def name
  "vision"
end
#risk_level ⇒ Object
# File 'lib/rubino/tools/vision_tool.rb', line 46
def risk_level
  :low
end