Class: AgentSandbox::BrowserTools::Screenshot

Inherits: Base

Object → RubyLLM::Tool → Base → AgentSandbox::BrowserTools::Screenshot

Defined in: lib/agent_sandbox/browser_tools.rb

Overview

Screenshots the current viewport and “reads” it by running a sub-request against a multimodal model. Returns the description as text so the main tool-loop isn’t constrained by OpenAI’s rule that only role:user messages may contain images.
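
The split between the two requests can be pictured with plain hashes. This is only a sketch of OpenAI-style chat message shapes, not the payload this library actually builds: the prompt text, the `call_123` id, and the placeholder bytes are all assumptions made for illustration.

```ruby
require "base64"
require "json"

# Placeholder bytes (the 8-byte PNG signature) standing in for a real screenshot.
png_bytes = "\x89PNG\r\n\x1A\n".b

# Sub-request to the vision model: the image rides in a role:user message
# as a base64 data URL. The prompt wording is hypothetical.
vision_messages = [
  {
    role: "user",
    content: [
      { type: "text", text: "Describe this screenshot. Focus on: the login form" },
      { type: "image_url",
        image_url: { url: "data:image/png;base64,#{Base64.strict_encode64(png_bytes)}" } }
    ]
  }
]

# Main tool loop: the tool result carries text only, so no image ever
# appears outside a role:user message. The id is hypothetical.
tool_message = {
  role: "tool",
  tool_call_id: "call_123",
  content: JSON.generate({ description: "A login form with two fields.",
                           bytes: png_bytes.bytesize })
}
```

The main conversation only ever sees `tool_message`, whose content is a string; the image bytes stay confined to the sub-request.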

Instance Method Summary

#execute(focus: nil) ⇒ Object
#initialize(sandbox, vision_model:) ⇒ Screenshot constructor

A new instance of Screenshot.

Methods inherited from Base

#run_ab

Constructor Details

#initialize(sandbox, vision_model:) ⇒ `Screenshot`

Returns a new instance of Screenshot.

# File 'lib/agent_sandbox/browser_tools.rb', line 224

def initialize(sandbox, vision_model:)
  @vision_model = vision_model
  super(sandbox)
end

Instance Method Details

#execute(focus: nil) ⇒ `Object`

Captures the current viewport as a PNG inside the sandbox, reads the bytes back, deletes the temporary file, and returns a Hash with :description, :bytes (the image size in bytes), and :vision_model. If run_ab returns a Hash with an :error key, that Hash is returned unchanged. The optional focus is passed through to VisionSupport.read_image_bytes.

# File 'lib/agent_sandbox/browser_tools.rb', line 229

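def execute(focus: nil)
  # Timestamp-derived name for the temporary PNG inside the sandbox.
  sandbox_path = "/tmp/agent-shot-#{Time.now.to_f.to_s.tr('.', '')}.png"
  data = run_ab(["screenshot", sandbox_path, "--json"])
  # Hand an error hash from run_ab straight back to the caller.
  return data if data.is_a?(Hash) && data[:error]

  bytes = @sandbox.read_file(sandbox_path)
  # The file is no longer needed once its bytes are in memory.
  @sandbox.exec("rm -f #{Shellwords.escape(sandbox_path)}")

  # Ask the vision model for a text description of the image.
  description = VisionSupport.read_image_bytes(
    bytes, extension: "png", focus: focus, vision_model: @vision_model
  )
  { description: description, bytes: bytes.bytesize, vision_model: @vision_model }
end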
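
The read, clean up, describe sequence in #execute can be tried without a real sandbox. The sketch below is a stand-alone imitation under stated assumptions: `FakeSandbox`, `FAKE_VISION`, `describe_screenshot`, and the model name are hypothetical stand-ins, and only the `read_file` and `exec` calls seen in the source above are modelled.

```ruby
require "shellwords"

# Hypothetical stand-in for the sandbox: serves files from a Hash and
# records shell commands instead of running them.
class FakeSandbox
  attr_reader :commands

  def initialize(files = {})
    @files = files
    @commands = []
  end

  def read_file(path)
    @files.fetch(path)
  end

  def exec(command)
    @commands << command
  end
end

# Hypothetical stand-in for the vision sub-request; returns canned text.
FAKE_VISION = lambda do |bytes, focus:|
  "#{bytes.bytesize}-byte PNG; focus: #{focus || 'none'}"
end

# Mirrors the order of operations in #execute: read, delete, describe.
def describe_screenshot(sandbox, path, vision_model:, focus: nil)
  bytes = sandbox.read_file(path)
  sandbox.exec("rm -f #{Shellwords.escape(path)}")
  description = FAKE_VISION.call(bytes, focus: focus)
  { description: description, bytes: bytes.bytesize, vision_model: vision_model }
end

path = "/tmp/agent-shot-1.png"
sandbox = FakeSandbox.new(path => "\x89PNG".b)
result = describe_screenshot(sandbox, path,
                             vision_model: "example-vision-model",
                             focus: "login form")
```

The returned Hash has the same three keys as the real method, which makes it easy to see what the main tool loop receives: a description string plus two pieces of metadata, and no image data.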