Class: Clacky::Tools::Browser

Inherits:
Base
  • Object
Defined in:
lib/clacky/tools/browser.rb

Overview

Browser tool — controls the user’s real Chromium-based browser (Chrome 146+) via the Chrome DevTools MCP server (chrome-devtools-mcp).

Architecture: uses the existing-session driver (Chrome MCP).

chrome-devtools-mcp --autoConnect --experimentalStructuredContent
    --experimental-page-id-routing

Communication: MCP stdio JSON-RPC 2.0 over a persistent (daemon) process. The MCP server process is started once, kept alive across all tool calls, and only restarted when the process dies unexpectedly.
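
To make the wire format concrete, here is a minimal sketch of framing a JSON-RPC 2.0 request as a single newline-terminated JSON line, which is how MCP's stdio transport delimits messages. The helper names (jsonrpc_request, jsonrpc_result) and the tool name "some_tool" are hypothetical and are not taken from browser.rb; the class's real mcp_call may be structured differently.

```ruby
require "json"

# Hypothetical helper: builds one newline-delimited JSON-RPC 2.0 request line.
def jsonrpc_request(id, method, params = {})
  JSON.generate({ jsonrpc: "2.0", id: id, method: method, params: params }) + "\n"
end

# Hypothetical helper: parses one response line and returns its result,
# raising if the id does not match or the server reported an error.
def jsonrpc_result(line, expected_id)
  msg = JSON.parse(line)
  raise "unexpected id #{msg['id'].inspect}" unless msg["id"] == expected_id
  raise "RPC error: #{msg.dig('error', 'message')}" if msg["error"]
  msg["result"]
end

# "tools/call" is the MCP method for invoking a tool; "some_tool" is a placeholder.
request_line = jsonrpc_request(1, "tools/call", { name: "some_tool", arguments: {} })
```

With a persistent process, a line like this would be written to the daemon's stdin and the matching response read back from its stdout, with the id used to pair the two.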

pageId is intentionally NOT passed to most MCP calls — the MCP server maintains its own selected page state. Only focus/close actions need pageId. When the selected page has been closed, mcp_call automatically retries once.
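
The retry-once behaviour described above could look roughly like the sketch below. PageClosedError and call_with_page_retry are hypothetical names introduced for illustration; how mcp_call actually detects a closed page is not shown in this page.

```ruby
# Hypothetical error standing in for "the selected page has been closed".
class PageClosedError < StandardError; end

# Runs the block, retrying exactly once if the selected page was closed.
# The attempt number (1 or 2) is yielded to the block.
def call_with_page_retry
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue PageClosedError
    retry if attempts < 2 # second failure falls through to re-raise
    raise
  end
end
```

Capping the retry at one attempt keeps a permanently broken session from looping while still recovering from the common case where the user closed the tab between calls.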

Constant Summary

MIN_CHROME_MAJOR =
146
MCP_HANDSHAKE_TIMEOUT =
10
MCP_CALL_TIMEOUT =
60
MIN_NODE_MAJOR =
20
MAX_SNAPSHOT_CHARS =
4000
MAX_LLM_OUTPUT_CHARS =
6000
BROWSER_CONFIG_PATH =
File.expand_path("~/.clacky/browser.yml").freeze
BROWSER_DIAGNOSIS_HINT =
<<~HINT.strip.freeze
  Inform the user and ask if they'd like to run a diagnosis.
  If yes, invoke the browser-setup skill with subcommand "doctor".
HINT
BROWSER_NOT_CONNECTED_HINT =

Cause 1+2: Chrome not running, or Remote Debugging disabled (MCP can’t distinguish them)

<<~HINT.strip.freeze
  Chrome is not reachable. Possible causes:
  1. Chrome is not running — ask the user to open Chrome.
  2. Remote Debugging is disabled — enable via chrome://inspect/#remote-debugging.
HINT
BROWSER_DAEMON_HINT =

Cause 3: MCP daemon crashed or failed to start

<<~HINT.strip.freeze
  The browser MCP daemon crashed or failed to start. It may recover automatically on the next action.
  If it keeps failing, ask the user to restart Clacky.
HINT
BROWSER_RESTART_HINT =

Cause 4: Chrome long-session unresponsiveness

<<~HINT.strip.freeze
  Chrome has become unresponsive. This often happens after Chrome has been running for a long time.
  Ask the user to restart Chrome, then retry the action.
HINT
SCREENSHOT_MAX_WIDTH =
800
SCREENSHOT_MAX_BASE64_BYTES =
150_000
INTERACTIVE_ROLES =

Snapshot rendering


%w[
  button link textbox checkbox radio select combobox
  menuitem option tab switch searchbox spinbutton
  slider menuitemcheckbox menuitemradio
].freeze
STRUCTURAL_ROLES =
%w[
  generic none presentation group region section
].freeze
CONTENT_ROLES =
%w[
  heading paragraph text statictext image img
  listitem term definition
].freeze

Instance Method Summary

Methods inherited from Base

#category, #description, #name, #parameters, #to_function_definition

Instance Method Details

#execute(action:, profile: nil, working_dir: nil, **opts) ⇒ Object



# File 'lib/clacky/tools/browser.rb', line 128

def execute(action:, profile: nil, working_dir: nil, **opts)
  bypass = action.to_s == "status" ||
           (action.to_s == "act" && (opts[:kind] || opts["kind"]).to_s == "evaluate")
  unless bypass
    return browser_not_setup_error unless File.exist?(BROWSER_CONFIG_PATH)
    return browser_disabled_error  unless browser_enabled?
  end
  execute_user_browser(action, opts)
rescue StandardError => e
  { error: classify_browser_error(e) }
end
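
The bypass condition in #execute can be read as a standalone predicate. The sketch below extracts it under the hypothetical name setup_check_bypassed? to show which calls skip the config-file and enabled checks; it mirrors the logic above but is not part of the class.

```ruby
# Hypothetical helper mirroring the `bypass` expression in #execute.
# Accepts opts with either symbol or string keys, as the original does.
def setup_check_bypassed?(action, opts = {})
  kind = opts[:kind] || opts["kind"]
  action.to_s == "status" ||
    (action.to_s == "act" && kind.to_s == "evaluate")
end

setup_check_bypassed?("status")                    # => true
setup_check_bypassed?(:act, kind: "evaluate")      # => true
setup_check_bypassed?("act", "kind" => "click")    # => false
setup_check_bypassed?("snapshot")                  # => false
```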

#format_call(args) ⇒ Object



# File 'lib/clacky/tools/browser.rb', line 140

def format_call(args)
  action = args[:action] || args["action"] || "browser"
  "browser(#{action})"
end

#format_result(result) ⇒ Object



# File 'lib/clacky/tools/browser.rb', line 145

def format_result(result)
  return "[Error] #{result[:error].to_s[0..80]}" if result[:error]
  return "[OK] #{result[:output].to_s.lines.size} lines" if result[:output]
  "[OK] Done"
end
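
For reference, the three branches of #format_result produce summaries like the following. The method body is copied from above into a free-standing function so the example runs on its own; the sample inputs are made up.

```ruby
# Standalone copy of #format_result for illustration.
def format_result(result)
  return "[Error] #{result[:error].to_s[0..80]}" if result[:error]
  return "[OK] #{result[:output].to_s.lines.size} lines" if result[:output]
  "[OK] Done"
end

format_result({ output: "a\nb\n" })   # => "[OK] 2 lines"
format_result({})                     # => "[OK] Done"
# The range 0..80 is inclusive, so up to 81 characters of the error are kept.
format_result({ error: "x" * 100 }).length # => 89 ("[Error] " is 8 chars + 81)
```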

#format_result_for_llm(result) ⇒ Object



# File 'lib/clacky/tools/browser.rb', line 151

def format_result_for_llm(result)
  return result if result[:error]

  action = result[:action].to_s

  if action == "screenshot" && result[:image_data]
    mime_type       = result[:mime_type] || "image/png"
    image_data      = result[:image_data]
    data_url        = "data:#{mime_type};base64,#{image_data}"
    original_path   = result[:original_path]
    compressed_path = result[:compressed_path]

    text = "Screenshot captured."
    if original_path || compressed_path
      text += "\n- Original (full resolution): #{original_path || 'unavailable'}" \
              "\n- Compressed (800px, sent to AI): #{compressed_path || 'unavailable'}"
    end

    return [
      { type: "text",      text:      text },
      { type: "image_url", image_url: { url: data_url } }
    ]
  end

  output = result[:output].to_s
  output = compress_snapshot(output) if action == "snapshot"
  max_chars = action == "snapshot" ? MAX_SNAPSHOT_CHARS : MAX_LLM_OUTPUT_CHARS

  {
    action:  action,
    success: result[:success],
    stdout:  truncate_output(output, max_chars),
    profile: result[:profile]
  }.compact
end
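
The truncate_output helper called above is not shown on this page. One plausible shape, assuming it simply caps the text at max_chars and appends a marker, is sketched below under the hypothetical name truncate_output_sketch; the real helper may choose a different marker or cut point.

```ruby
# Hypothetical stand-in for truncate_output: returns text unchanged when it
# fits, otherwise keeps the first max_chars characters and notes how many
# characters were dropped.
def truncate_output_sketch(text, max_chars)
  return text if text.length <= max_chars

  omitted = text.length - max_chars
  "#{text[0, max_chars]}\n... [truncated #{omitted} chars]"
end

MAX_SNAPSHOT_CHARS   = 4000 # values from the constant summary above
MAX_LLM_OUTPUT_CHARS = 6000

snapshot = "x" * 4500
truncate_output_sketch(snapshot, MAX_SNAPSHOT_CHARS).end_with?("[truncated 500 chars]") # => true
```

Using a smaller cap for snapshots than for other output matches the constants above: snapshots are compressed first and then held to MAX_SNAPSHOT_CHARS, while every other action falls back to MAX_LLM_OUTPUT_CHARS.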