Class: Rubino::Tools::VisionTool
- Inherits: Base (Object → Base → Rubino::Tools::VisionTool)
- Defined in: lib/rubino/tools/vision_tool.rb
Overview
Delegates image-understanding to a multimodal aux model so a text-only primary can still “see” what the user uploaded. Implements the agent-as-tool semantics from the OpenAI Agents SDK: the primary stays in control, calls this tool with a focused question, and receives a structured (text) reply — no conversation handoff, no shared history.
The aux model is resolved from `auxiliary.vision` in config. The registry hides this tool ONLY when no aux vision model is configured AND the primary itself can't see (per Configuration#model_supports_vision?), the one case where calling it could only error. Whenever the primary supports vision OR an aux model is set, the tool stays EXPOSED (see Tools::Registry#aux_dependency_satisfied?), since the model may still prefer to delegate to a better-suited aux model.
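The exposure rule above reduces to a single boolean condition. The following is a minimal sketch only: `vision_tool_exposed?` and its keyword arguments are hypothetical names invented for illustration, and the real check lives in Tools::Registry#aux_dependency_satisfied?, whose implementation is not shown on this page.

```ruby
# Hypothetical helper, for illustration only. It mirrors the rule described
# in the overview, not the actual Tools::Registry implementation.
def vision_tool_exposed?(primary_supports_vision:, aux_vision_model:)
  # Treat nil and blank strings as "no aux vision model configured".
  aux_configured = !aux_vision_model.to_s.strip.empty?

  # Hidden only when neither the primary nor an aux model can handle images.
  primary_supports_vision || aux_configured
end

# "some-vision-model" is a placeholder, not a real model identifier.
vision_tool_exposed?(primary_supports_vision: false, aux_vision_model: nil)                 # => false
vision_tool_exposed?(primary_supports_vision: true,  aux_vision_model: nil)                 # => true
vision_tool_exposed?(primary_supports_vision: false, aux_vision_model: "some-vision-model") # => true
```

Only the first combination hides the tool, which matches the "could only error" case named above.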
Instance Attribute Summary
Attributes inherited from Base
#cancel_token, #read_tracker, #stream_chunk, #stream_kind
Instance Method Summary
Methods inherited from Base
#cancellation_requested?, #config_key, #display_name, #emit_chunk, #mcp?, #risky?, #to_tool_definition, workspace_root, workspace_roots
Instance Method Details
#call(arguments) ⇒ Object
# File 'lib/rubino/tools/vision_tool.rb', line 55
def call(arguments)
  # Accept both string and symbol keys.
  path = (arguments["file_path"] || arguments[:file_path]).to_s
  question = (arguments["question"] || arguments[:question] ||
              "Describe what you see in markdown.").to_s
  return "Error: file_path is required" if path.empty?

  # Resolve to an absolute path and refuse anything outside the workspace.
  expanded = File.expand_path(path)
  return outside_workspace_message(path) if outside_workspace?(expanded)
  return "Error: file not found: #{path}" unless File.exist?(expanded)
  return "Error: not a regular file: #{path}" unless File.file?(expanded)

  # Extension check before the file contents are inspected.
  ext = File.extname(expanded).downcase
  unless LLM::ContentBuilder::SUPPORTED_IMAGE_TYPES.include?(ext)
    return "Error: unsupported image extension '#{ext}'. " \
           "Supported: #{LLM::ContentBuilder::SUPPORTED_IMAGE_TYPES.join(", ")}"
  end

  # Honor the egress policy: no image bytes leave when it is disabled.
  unless Attachments::Policy.aux_vision_egress?
    return "Error: image egress is disabled by config " \
           "(attachments.policy.aux_vision_egress: false). " \
           "The vision tool will not send image bytes to the auxiliary model."
  end

  # Classify the content so a renamed or corrupt file is rejected.
  classification = Attachments::Classify.call(expanded)
  unless classification&.safe && classification.kind == :image
    return "Error: '#{path}' is not a valid image (extension spoof or corrupt file?). " \
           "Its content is not a recognised image format, so nothing was sent to the vision model."
  end

  # Delegate the question and the image to the auxiliary vision model.
  response = LLM::AuxiliaryClient.new.call(
    task: :vision,
    messages: [{ role: "user", content: question }],
    image_paths: [expanded]
  )
  response.content.to_s
rescue StandardError => e
  "Error calling vision model: #{e.class}: #{e.message}"
end
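The "extension spoof" guard in #call depends on inspecting the file's content rather than trusting its name. How Attachments::Classify does this is not shown on this page; the sketch below only illustrates the general idea with well-known file signatures (magic bytes). `IMAGE_SIGNATURES`, `sniff_image_kind`, and `sniff_file` are hypothetical names, and the signature list is deliberately incomplete.

```ruby
# Illustrative content sniffing. This is NOT Attachments::Classify, whose
# real logic may differ; it covers only three formats.
IMAGE_SIGNATURES = [
  [:png,  "\x89PNG\r\n\x1A\n".b],
  [:jpeg, "\xFF\xD8\xFF".b],
  [:gif,  "GIF87a".b],
  [:gif,  "GIF89a".b]
].freeze

# Returns a symbol naming the detected format, or nil when the leading
# bytes match none of the known signatures.
def sniff_image_kind(bytes)
  head = bytes.to_s.b
  IMAGE_SIGNATURES.each do |kind, prefix|
    return kind if head.start_with?(prefix)
  end
  nil
end

# Reads only the first 16 bytes, enough for the signatures above.
def sniff_file(path)
  sniff_image_kind(File.binread(path, 16))
end

sniff_image_kind("GIF89a\x01\x00".b) # => :gif
sniff_image_kind("#!/bin/sh\n")      # => nil
```

A shell script renamed to `chart.png` would pass an extension check but yield `nil` here, which is the situation the error message in #call describes.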
#description ⇒ Object
# File 'lib/rubino/tools/vision_tool.rb', line 27
def description
  "Ask a multimodal model to describe or interpret an image. " \
    "Use when you need to understand visual content (charts, screenshots, " \
    "diagrams, photos). Provide an optional focused question to direct the " \
    "analysis; default is a full markdown description."
end
#input_schema ⇒ Object
# File 'lib/rubino/tools/vision_tool.rb', line 34
def input_schema
  {
    type: "object",
    properties: {
      file_path: {
        type: "string",
        description: "Absolute path to an image file (.png .jpg .jpeg .webp .gif .bmp)"
      },
      question: {
        type: "string",
        description: "Optional focused question. Default: 'Describe what you see in markdown.'"
      }
    },
    required: %w[file_path]
  }
end
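To make the schema concrete, the sketch below checks an arguments hash against its `required` list. `SCHEMA` is a trimmed copy of the hash above (descriptions omitted), and `missing_required` is a hypothetical helper written for this example; it is not a Rubino API. The file path is a placeholder.

```ruby
# Trimmed copy of the schema returned by #input_schema.
SCHEMA = {
  type: "object",
  properties: {
    file_path: { type: "string" },
    question:  { type: "string" }
  },
  required: %w[file_path]
}.freeze

# Hypothetical helper: lists required keys that are absent or blank,
# accepting both string and symbol keys as #call does.
def missing_required(schema, arguments)
  schema[:required].select do |key|
    value = arguments[key] || arguments[key.to_sym]
    value.to_s.empty?
  end
end

missing_required(SCHEMA, { "file_path" => "/workspace/chart.png" })  # => []
missing_required(SCHEMA, { question: "What is on the x-axis?" })     # => ["file_path"]
```

Because `question` is optional, a call carrying only `file_path` is valid and falls back to the default prompt shown in #call.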
#name ⇒ Object
# File 'lib/rubino/tools/vision_tool.rb', line 23
def name
  "vision"
end
#risk_level ⇒ Object
# File 'lib/rubino/tools/vision_tool.rb', line 51
def risk_level
  :low
end