Class: AgentSandbox::BrowserTools::Screenshot
- Defined in:
- lib/agent_sandbox/browser_tools.rb
Overview
Captures a screenshot of the current viewport and “reads” it by sending a sub-request to a multimodal model. The description is returned as text, so the main tool loop is not constrained by OpenAI’s rule that only role:user messages may contain images.
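To illustrate why a text-only result is convenient, here is a minimal sketch of wrapping the returned hash in a tool-role message. The helper name `tool_message_for`, the message keys, the tool call id, and the sample values are all assumptions made for illustration; they are not part of this library.

```ruby
require "json"

# Hypothetical helper (not part of AgentSandbox): wraps a result hash in a
# chat-style tool message. The :role / :tool_call_id / :content keys follow a
# common chat-completions shape and are an assumption here.
def tool_message_for(tool_call_id, result)
  # The result holds only text and numbers, so it serializes to a plain
  # JSON string and no image payload has to travel in this message.
  { role: "tool", tool_call_id: tool_call_id, content: JSON.generate(result) }
end

# Sample values shaped like the hash that #execute returns on success.
result = {
  description: "A login form with two fields.",
  bytes: 48_213,
  vision_model: "example-vision-model"
}

message = tool_message_for("call_123", result)
puts message[:content]
```

Because the image bytes are consumed inside the sub-request, only the description and some metadata reach the main conversation.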
Instance Method Summary
- #execute(focus: nil) ⇒ Object
- #initialize(sandbox, vision_model:) ⇒ Screenshot (constructor): A new instance of Screenshot.
Methods inherited from Base
Constructor Details
#initialize(sandbox, vision_model:) ⇒ Screenshot
Returns a new instance of Screenshot.
# File 'lib/agent_sandbox/browser_tools.rb', line 224

def initialize(sandbox, vision_model:)
  @vision_model = vision_model
  super(sandbox)
end
Instance Method Details
#execute(focus: nil) ⇒ Object
# File 'lib/agent_sandbox/browser_tools.rb', line 229

def execute(focus: nil)
  sandbox_path = "/tmp/agent-shot-#{Time.now.to_f.to_s.tr('.', '')}.png"
  data = run_ab(["screenshot", sandbox_path, "--json"])
  return data if data.is_a?(Hash) && data[:error]

  bytes = @sandbox.read_file(sandbox_path)
  @sandbox.exec("rm -f #{Shellwords.escape(sandbox_path)}")
  description = VisionSupport.read_image_bytes(
    bytes, extension: "png", focus: focus, vision_model: @vision_model
  )
  { description: description, bytes: bytes.bytesize, vision_model: @vision_model }
end
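As the source above shows, #execute returns either a hash carrying an :error key (passed through from the screenshot command) or a hash with :description, :bytes, and :vision_model. A caller might branch on the two shapes roughly as follows. The helper name `describe_screenshot_result`, its output format, and the sample values are hypothetical, and the error value is assumed to be printable as a string.

```ruby
# Hypothetical caller-side helper (not part of AgentSandbox): turns the hash
# returned by Screenshot#execute into a one-line summary.
def describe_screenshot_result(result)
  if result.is_a?(Hash) && result[:error]
    # Same check that #execute uses for its early return.
    "screenshot failed: #{result[:error]}"
  else
    "#{result[:description]} (#{result[:bytes]} bytes, via #{result[:vision_model]})"
  end
end

# Sample hashes shaped like the two possible return values.
ok = {
  description: "A pricing table with three plans.",
  bytes: 20_480,
  vision_model: "example-vision-model"
}
failed = { error: "browser not started" }

puts describe_screenshot_result(ok)
puts describe_screenshot_result(failed)
```

Checking for :error first mirrors the method's own guard, so a failed capture is never mistaken for an empty description.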