Class: Clacky::Vision::Resolver
- Inherits:
- Object
- Defined in:
- lib/clacky/vision/resolver.rb
Overview
OCR sidecar — turns image bytes into a text description by calling a vision-capable model. Used when the user’s primary model is text-only (e.g. DeepSeek V4) so that uploaded images and tool screenshots still reach the conversation as useful context.
Routes through Clacky::Client so we get the same OpenAI/Anthropic/Bedrock format negotiation, retry, and credit-error handling as the main agent path. Image content travels as a canonical `image_url` block (the unified internal shape understood by all three formats).
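The exact shape of the canonical `image_url` block is not shown on this page. As a rough sketch, assuming it follows the common OpenAI-style layout with a base64 data URL, building one from raw bytes might look like the following. The helper name `image_url_block` and the hash keys are assumptions for illustration, not Clacky's confirmed internals.

```ruby
# Hypothetical helper: wraps raw image bytes in an OpenAI-style image_url
# content block. The key names are an assumption, not taken from Clacky.
def image_url_block(bytes, mime)
  # Array#pack("m0") produces base64 without line breaks (RFC 4648).
  encoded = [bytes].pack("m0")
  { type: "image_url", image_url: { url: "data:#{mime};base64,#{encoded}" } }
end

block = image_url_block("\x89PNG".b, "image/png")
puts block[:image_url][:url]
```

Keeping a single internal shape like this means any provider-specific conversion can happen in one place, which is presumably why the resolver leaves it to Clacky::Client.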
Defined Under Namespace
Classes: Result
Constant Summary
- DEFAULT_PROMPT =
<<~PROMPT.strip
  Extract every legible text and describe the visual content of this image.
  Output as Markdown. Preserve table layout where possible (use Markdown tables).
  For UI screenshots, describe the layout, visible labels, and active state.
  Be thorough but concise — the user cannot see the image and must rely on your description.
PROMPT
- MAX_TOKENS =
8192
- CACHE_DIR =
File.join(Dir.home, ".clacky", "ocr_cache")
- CACHE_VERSION =
1
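The page does not show how CACHE_DIR and CACHE_VERSION are combined into cache entries. One plausible scheme, offered only as a sketch, hashes the version, prompt, and image bytes together so that a change to any of them misses the cache. The helpers `sketch_cache_key`, `sketch_cache_put`, and `sketch_cache_get`, the SHA-256 choice, and the `.md` file extension are all assumptions, and the example writes to a temporary directory rather than `~/.clacky/ocr_cache`.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

CACHE_VERSION = 1

# Hypothetical key: changes whenever the version, prompt, or image bytes change.
# Both parts are forced to binary so non-ASCII prompts and raw bytes concatenate safely.
def sketch_cache_key(bytes, prompt, version: CACHE_VERSION)
  Digest::SHA256.hexdigest("v#{version}\0#{prompt}\0".b + bytes.b)
end

# Hypothetical writer: one file per (version, prompt, image) combination.
def sketch_cache_put(dir, bytes, prompt, text)
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, "#{sketch_cache_key(bytes, prompt)}.md"), text)
end

# Hypothetical reader: returns the cached text, or nil on a miss.
def sketch_cache_get(dir, bytes, prompt)
  path = File.join(dir, "#{sketch_cache_key(bytes, prompt)}.md")
  File.exist?(path) ? File.read(path) : nil
end

Dir.mktmpdir do |dir|
  sketch_cache_put(dir, "fake-image".b, "Describe", "A red square.")
  puts sketch_cache_get(dir, "fake-image".b, "Describe").inspect     # hit
  puts sketch_cache_get(dir, "fake-image".b, "Other prompt").inspect # miss
end
```

Including a version number in the key lets old entries be invalidated by bumping the constant, without deleting files.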
Instance Method Summary
- #describe(image, prompt: nil) ⇒ Result
  Returns one of:
  - status=:ok + text: sidecar produced a description
  - status=:empty: sidecar returned 200 but no usable text (e.g. token budget exhausted by reasoning)
  - status=:call_failed + error: network/parse/auth error from the sidecar
  - status=:bad_image: image bytes unreadable or empty
- #initialize(model_entry) ⇒ Resolver (constructor)
  A new instance of Resolver.
Constructor Details
#initialize(model_entry) ⇒ Resolver
Returns a new instance of Resolver.
# File 'lib/clacky/vision/resolver.rb', line 40
def initialize(model_entry)
  @model_entry = model_entry
  @model = model_entry["model"]
  @base_url = model_entry["base_url"]
  @api_key = model_entry["api_key"]
  @anthropic = !!model_entry["anthropic_format"]
end
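The constructor reads string keys from `model_entry` and coerces `anthropic_format` to a strict boolean with `!!`. A small sketch of what such an entry could look like, and how the coercion behaves when the key is absent, follows. The values are placeholders and the helper `anthropic_format?` is hypothetical, not part of Clacky.

```ruby
# Hypothetical entry; all values are placeholders for illustration.
model_entry = {
  "model"    => "example-vision-model",
  "base_url" => "https://api.example.com/v1",
  "api_key"  => "sk-placeholder"
}

# Mirrors the constructor's coercion: a missing key yields nil, and !!nil is false.
def anthropic_format?(entry)
  !!entry["anthropic_format"]
end

puts anthropic_format?(model_entry)                                    # false
puts anthropic_format?(model_entry.merge("anthropic_format" => true))  # true
```

Because the lookups use string keys, a hash built with symbol keys (`model:`, `base_url:`) would leave every field `nil`.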
Instance Method Details
#describe(image, prompt: nil) ⇒ Result
Returns one of:
- status=:ok + text: sidecar produced a description
- status=:empty: sidecar returned 200 but no usable text (e.g. token budget exhausted by reasoning)
- status=:call_failed + error: network/parse/auth error from the sidecar
- status=:bad_image: image bytes unreadable or empty
# File 'lib/clacky/vision/resolver.rb', line 53
def describe(image, prompt: nil)
  prompt = prompt.to_s.strip
  prompt = DEFAULT_PROMPT if prompt.empty?

  bytes, mime = read_image(image)
  return Result.new(status: :bad_image) if bytes.nil? || bytes.empty?

  cached = cache_get(bytes, prompt)
  return Result.new(status: :ok, text: cached) if cached

  text = call_vlm(bytes, mime, prompt)
  return Result.new(status: :empty) if text.nil? || text.strip.empty?

  cache_put(bytes, prompt, text)
  Result.new(status: :ok, text: text)
rescue => e
  Clacky::Logger.warn("[Vision::Resolver] failed: #{e.class}: #{e.message}") if defined?(Clacky::Logger)
  Result.new(status: :call_failed, error: "#{e.class}: #{e.message}")
end
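Since #describe never raises and always returns a Result, a caller can branch on `status` alone. The sketch below shows one way to turn each outcome into text for the conversation. `SketchResult` is a stand-in struct whose fields (`status`, `text`, `error`) are inferred from the calls in the listing, since the real Result class is not shown on this page. The helper `context_for` and its placeholder messages are hypothetical.

```ruby
# Stand-in for Clacky::Vision::Resolver::Result; fields are inferred, not confirmed.
SketchResult = Struct.new(:status, :text, :error, keyword_init: true)

# Hypothetical helper: maps each documented status to a line of conversation context.
def context_for(result)
  case result.status
  when :ok          then result.text
  when :empty       then "[image: the vision model returned no description]"
  when :bad_image   then "[image: unreadable or empty]"
  when :call_failed then "[image: description failed (#{result.error})]"
  else raise ArgumentError, "unknown status: #{result.status.inspect}"
  end
end

puts context_for(SketchResult.new(status: :ok, text: "A login form."))
puts context_for(SketchResult.new(status: :call_failed, error: "Timeout::Error: execution expired"))
```

Distinguishing `:empty` from `:call_failed` lets a caller decide whether a retry with a larger token budget is worthwhile, as opposed to surfacing a network or auth problem.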