Module: AgentSandbox::BrowserTools::VisionSupport

Defined in: lib/agent_sandbox/browser_tools.rb

Overview

Mixin that downloads image bytes out of the sandbox into a host tempfile, runs a multimodal sub-call on the image, and deletes the tempfile immediately afterwards. It keeps no global state; each call is self-contained.
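The write, use, and clean-up cycle described above can be sketched with Ruby's standard Tempfile class. This is a minimal illustration, not the module's actual implementation: the helper name `with_image_tempfile` and the "vision" filename prefix are assumptions made for the example.

```ruby
require "tempfile"

# Hypothetical helper (not part of AgentSandbox): writes raw image bytes to a
# host tempfile, yields the file's path to the caller, and removes the file
# afterwards, even if the block raises.
def with_image_tempfile(bytes, extension:)
  # The array form of the basename keeps the extension at the end of the name.
  file = Tempfile.new(["vision", ".#{extension}"])
  file.binmode            # image data is binary; avoid newline/encoding translation
  file.write(bytes)
  file.flush              # make sure the bytes are on disk before the path is used
  yield file.path
ensure
  file&.close!            # close and unlink; nothing outlives the call
end
```

Because the clean-up sits in an `ensure` clause and the helper holds no state of its own, a failed sub-call still leaves no tempfile behind, which matches the self-contained behaviour the overview describes.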

Constant Summary

DEFAULT_FOCUS_PROMPT =

lambda { |focus|
  "Read this image. Focus on: #{focus}. Return structured plain " \
    "text. Quote exact numbers and labels as they appear. If " \
    "something isn't visible, say so instead of guessing."
}

DEFAULT_GENERAL_PROMPT =

"Describe this image. List every product, price, heading, and " \
"notable text you see. Be exact with numbers and labels."
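One plausible way the two prompt constants fit together is sketched below. The constants are copied from this page; the `PromptSketch` module, the `prompt_for` method, and the rule "use the general prompt when `focus` is nil or blank" are assumptions made for illustration, since the page does not document how `read_image_bytes` chooses between them.

```ruby
# Illustration only: PromptSketch and prompt_for are hypothetical names.
module PromptSketch
  DEFAULT_FOCUS_PROMPT = lambda { |focus|
    "Read this image. Focus on: #{focus}. Return structured plain " \
      "text. Quote exact numbers and labels as they appear. If " \
      "something isn't visible, say so instead of guessing."
  }

  DEFAULT_GENERAL_PROMPT =
    "Describe this image. List every product, price, heading, and " \
    "notable text you see. Be exact with numbers and labels."

  # Assumed selection rule: a nil or blank focus falls back to the general
  # prompt; anything else is interpolated into the focus prompt.
  def self.prompt_for(focus)
    text = focus.to_s.strip
    text.empty? ? DEFAULT_GENERAL_PROMPT : DEFAULT_FOCUS_PROMPT.call(text)
  end
end

puts PromptSketch.prompt_for("the order total")
```

Keeping the focus prompt as a lambda lets the focus string be interpolated at call time, while the general prompt can remain a plain string constant.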

Class Method Summary

.read_image_bytes(bytes, extension:, focus:, vision_model:) ⇒ Object

Class Method Details

.read_image_bytes(bytes, extension:, focus:, vision_model:) ⇒ Object