Class: Clacky::Vision::Resolver

Inherits:
Object
Defined in:
lib/clacky/vision/resolver.rb

Overview

OCR sidecar — turns image bytes into a text description by calling a vision-capable model. Used when the user’s primary model is text-only (e.g. DeepSeek V4) so that uploaded images and tool screenshots still reach the conversation as useful context.

Routes through Clacky::Client so we get the same OpenAI/Anthropic/Bedrock format negotiation, retry, and credit-error handling as the main agent path. Image content travels as a canonical `image_url` block (the unified internal shape understood by all three formats).
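
To make the `image_url` block concrete, here is a minimal sketch of how image bytes could be wrapped in an OpenAI-style content block with a base64 data URL. The helper name `image_url_block` is hypothetical, and the exact hash shape Clacky uses internally is an assumption; only the general `image_url` convention comes from the text above.

```ruby
# Hypothetical helper (not part of Clacky): wraps raw image bytes in an
# OpenAI-style image_url content block using a base64 data URL.
def image_url_block(bytes, mime)
  # pack("m0") is core Ruby: strict base64 without line breaks.
  encoded = [bytes].pack("m0")
  {
    "type"      => "image_url",
    "image_url" => { "url" => "data:#{mime};base64,#{encoded}" }
  }
end

# Assumed message layout: one text part plus one image part.
png_header = "\x89PNG".b
message = {
  "role"    => "user",
  "content" => [
    { "type" => "text", "text" => "Describe this image." },
    image_url_block(png_header, "image/png")
  ]
}
```

A client layer would then translate this single shape into whatever each provider format expects, so callers only ever build one kind of image block.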

Defined Under Namespace

Classes: Result

Constant Summary

DEFAULT_PROMPT =
<<~PROMPT.strip
  Extract every legible text and describe the visual content of this image.
  Output as Markdown. Preserve table layout where possible (use Markdown tables).
  For UI screenshots, describe the layout, visible labels, and active state.
  Be thorough but concise — the user cannot see the image and must rely on
  your description.
PROMPT
MAX_TOKENS =
8192
CACHE_DIR =
File.join(Dir.home, ".clacky", "ocr_cache")
CACHE_VERSION =
1
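
The cache helpers (`cache_get`, `cache_put`) are not shown on this page, so the following is only a sketch of how a versioned cache key could be derived from the image bytes and the prompt. The function names, the SHA-256 choice, and the `.md` file extension are all assumptions for illustration.

```ruby
require "digest"

# Hypothetical: derive a cache key from a cache version, the prompt and
# the image bytes. Bumping the version invalidates every older entry.
def example_cache_key(bytes, prompt, version: 1)
  digest = Digest::SHA256.new
  # Feed the parts separately so a UTF-8 prompt and binary image bytes
  # never have to be concatenated into one string.
  digest << "v#{version}" << "\0" << prompt << "\0" << bytes
  digest.hexdigest
end

# Hypothetical: map a key to a file under a cache directory such as
# File.join(Dir.home, ".clacky", "ocr_cache").
def example_cache_path(cache_dir, key)
  File.join(cache_dir, "#{key}.md")
end
```

Including the prompt in the key means the same image described with a different prompt is treated as a separate entry, which matches the `cache_get(bytes, prompt)` call signature used by `#describe`.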

Instance Method Summary

Constructor Details

#initialize(model_entry) ⇒ Resolver

Returns a new instance of Resolver.



# File 'lib/clacky/vision/resolver.rb', line 40

def initialize(model_entry)
  @model_entry = model_entry
  @model       = model_entry["model"]
  @base_url    = model_entry["base_url"]
  @api_key     = model_entry["api_key"]
  @anthropic   = !!model_entry["anthropic_format"]
end
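
The constructor reads string keys from `model_entry`, so a hash shaped like the sketch below is what it expects. The model name, URL, and environment variable are placeholders, not real Clacky configuration values.

```ruby
# Placeholder values throughout; only the four key names come from the
# constructor shown above.
model_entry = {
  "model"            => "example-vision-model",
  "base_url"         => "https://api.example.com/v1",
  "api_key"          => ENV.fetch("EXAMPLE_API_KEY", "sk-placeholder"),
  "anthropic_format" => nil
}

# `!!` coerces any value to a strict boolean, so a missing or nil
# "anthropic_format" key becomes false rather than nil.
anthropic = !!model_entry["anthropic_format"]

# Keys are looked up as strings: a symbol-keyed hash would yield nil
# for every lookup the constructor performs.
symbol_keyed = { model: "example-vision-model" }
missing = symbol_keyed["model"]
```

If the entry comes from parsed JSON or YAML, the keys are already strings; hashes built by hand in Ruby code are the usual place where symbol keys slip in.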

Instance Method Details

#describe(image, prompt: nil) ⇒ Result

Returns a Result whose status is one of:

  • status=:ok + text: the sidecar produced a description
  • status=:empty: the sidecar returned 200 but no usable text (e.g. token budget exhausted by reasoning)
  • status=:call_failed + error: network/parse/auth error from the sidecar
  • status=:bad_image: image bytes unreadable or empty

Returns:

  • (Result)




# File 'lib/clacky/vision/resolver.rb', line 53

def describe(image, prompt: nil)
  prompt = prompt.to_s.strip
  prompt = DEFAULT_PROMPT if prompt.empty?

  bytes, mime = read_image(image)
  return Result.new(status: :bad_image) if bytes.nil? || bytes.empty?

  cached = cache_get(bytes, prompt)
  return Result.new(status: :ok, text: cached) if cached

  text = call_vlm(bytes, mime, prompt)
  return Result.new(status: :empty) if text.nil? || text.strip.empty?

  cache_put(bytes, prompt, text)
  Result.new(status: :ok, text: text)
rescue => e
  Clacky::Logger.warn("[Vision::Resolver] failed: #{e.class}: #{e.message}") if defined?(Clacky::Logger)
  Result.new(status: :call_failed, error: "#{e.class}: #{e.message}")
end
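
Because `#describe` rescues errors and always returns a Result, callers can branch on `status` instead of wrapping the call in their own `rescue`. The sketch below uses a stand-in `Result` struct, since the real `Clacky::Vision::Resolver::Result` class is not shown here; it is assumed to expose `status`, `text`, and `error`. The helper name and the placeholder strings are hypothetical.

```ruby
# Stand-in for Clacky::Vision::Resolver::Result (assumed readers:
# status, text, error). The real class may differ.
Result = Struct.new(:status, :text, :error, keyword_init: true)

# Hypothetical helper: turn a Result into text that can be placed in
# the conversation in lieu of the image.
def render_result(result)
  case result.status
  when :ok          then result.text
  when :empty       then "[image: the vision model returned no description]"
  when :bad_image   then "[image: unreadable or empty]"
  when :call_failed then "[image: description failed (#{result.error})]"
  else raise ArgumentError, "unexpected status: #{result.status.inspect}"
  end
end

# With a real resolver this would be:
#   result = resolver.describe(image)
result = Result.new(status: :call_failed, error: "Timeout::Error: execution expired")
rendered = render_result(result)
```

Handling all four statuses explicitly keeps a text-only model informed that an image existed even when no description could be produced.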