Class: Clacky::Tools::Browser

Inherits:
Base
  • Object
Defined in:
lib/clacky/tools/browser.rb

Overview

Browser tool — controls the user’s real Chromium-based browser (Chrome 146+) via the Chrome DevTools MCP server (chrome-devtools-mcp).

Architecture: uses the existing-session driver (Chrome MCP).

chrome-devtools-mcp --autoConnect --experimentalStructuredContent
    --experimental-page-id-routing

Communication: MCP stdio JSON-RPC 2.0 over a persistent (daemon) process. The MCP server process is started once, kept alive across all tool calls, and only restarted when the process dies unexpectedly.
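The framing described above can be illustrated with a minimal sketch. It assumes newline-delimited JSON-RPC 2.0 messages and the MCP `tools/call` method; the helper names `build_mcp_request` and `parse_mcp_response` are hypothetical and are not taken from browser.rb, and the tool name used is only illustrative.

```ruby
require "json"

# Hypothetical helper (not from browser.rb): builds one newline-terminated
# JSON-RPC 2.0 request for an MCP "tools/call" invocation.
def build_mcp_request(id, tool_name, arguments = {})
  payload = {
    jsonrpc: "2.0",
    id:      id,
    method:  "tools/call",
    params:  { name: tool_name, arguments: arguments }
  }
  JSON.generate(payload) + "\n"
end

# Hypothetical helper: parses one response line and returns its result,
# raising if the server reported a JSON-RPC error.
def parse_mcp_response(line)
  msg = JSON.parse(line)
  raise "MCP error: #{msg["error"]["message"]}" if msg["error"]
  msg["result"]
end

# The tool name here is illustrative only.
request_line = build_mcp_request(1, "take_snapshot")
```

In a persistent-process design, a line like `request_line` would be written to the daemon's stdin and the matching response read back from its stdout, with the `id` used to pair the two.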

pageId is intentionally NOT passed to most MCP calls — the MCP server maintains its own selected page state. Only focus/close actions need pageId. When the selected page has been closed, mcp_call automatically retries once.
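The retry-once behaviour can be sketched as follows. This is an illustration under assumptions, not the body of mcp_call: `PageClosedError`, `call_with_page_retry`, and the `reselect` callback are hypothetical names, and how the real code detects a closed page is not shown on this page.

```ruby
# Hypothetical error type standing in for "the selected page was closed".
class PageClosedError < StandardError; end

# Runs the block. If it fails with PageClosedError, invokes the reselect
# callback and retries exactly once; a second failure propagates.
def call_with_page_retry(reselect:)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue PageClosedError
    raise if attempts > 1
    reselect.call
    retry
  end
end

# Usage: the first attempt fails, the second succeeds.
calls = 0
reselected = 0
result = call_with_page_retry(reselect: -> { reselected += 1 }) do
  calls += 1
  raise PageClosedError if calls == 1
  "ok"
end
```

Capping the retry at one attempt keeps a permanently broken session from looping; the caller sees the original error on the second failure.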

Constant Summary

MIN_CHROME_MAJOR =
146
MCP_HANDSHAKE_TIMEOUT =
10
MCP_CALL_TIMEOUT =
60
MIN_NODE_MAJOR =
20
MAX_SNAPSHOT_CHARS =
4000
MAX_LLM_OUTPUT_CHARS =
6000
BROWSER_CONFIG_PATH =
File.expand_path("~/.clacky/browser.yml").freeze
BROWSER_DIAGNOSIS_HINT =
<<~HINT.strip.freeze
  Inform the user and ask if they'd like to run a diagnosis.
  If yes, invoke the browser-setup skill with subcommand "doctor".
HINT
BROWSER_NOT_CONNECTED_HINT =

Cause 1+2: Chrome not running, or Remote Debugging disabled (MCP can’t distinguish them)

<<~HINT.strip.freeze
  Chrome is not reachable. Possible causes:
  1. Chrome is not running — ask the user to open Chrome.
  2. Remote Debugging is disabled — enable via chrome://inspect/#remote-debugging.
HINT
BROWSER_DAEMON_HINT =

Cause 3: MCP daemon crashed or failed to start

<<~HINT.strip.freeze
  The browser MCP daemon crashed or failed to start. It may recover automatically on the next action.
  If it keeps failing, ask the user to restart Clacky.
HINT
BROWSER_RESTART_HINT =

Cause 4: Chrome long-session unresponsiveness

<<~HINT.strip.freeze
  Chrome has become unresponsive. This often happens after Chrome has been running for a long time.
  Ask the user to restart Chrome, then retry the action.
HINT
SCREENSHOT_MAX_WIDTH =
800
SCREENSHOT_MAX_BASE64_BYTES =
150_000
INTERACTIVE_ROLES =

Snapshot rendering

%w[
  button link textbox checkbox radio select combobox
  menuitem option tab switch searchbox spinbutton
  slider menuitemcheckbox menuitemradio
].freeze
STRUCTURAL_ROLES =
%w[
  generic none presentation group region section
].freeze
CONTENT_ROLES =
%w[
  heading paragraph text statictext image img
  listitem term definition
].freeze
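One plausible use of the three role lists is to bucket accessibility-tree nodes while rendering a snapshot. The sketch below is an assumption about usage: `classify_role` is a hypothetical helper, the downcasing step is a guess based on the all-lowercase entries, and the actual snapshot code is not shown on this page. The constants are repeated so the example runs on its own.

```ruby
INTERACTIVE_ROLES = %w[
  button link textbox checkbox radio select combobox
  menuitem option tab switch searchbox spinbutton
  slider menuitemcheckbox menuitemradio
].freeze
STRUCTURAL_ROLES = %w[generic none presentation group region section].freeze
CONTENT_ROLES = %w[
  heading paragraph text statictext image img
  listitem term definition
].freeze

# Hypothetical helper: maps a role string to a coarse category.
# Downcasing is an assumption so that "StaticText" matches "statictext".
def classify_role(role)
  r = role.to_s.downcase
  return :interactive if INTERACTIVE_ROLES.include?(r)
  return :structural  if STRUCTURAL_ROLES.include?(r)
  return :content     if CONTENT_ROLES.include?(r)
  :other
end
```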

Instance Method Summary

Methods inherited from Base

#category, #description, #name, #parameters, #to_function_definition

Instance Method Details

#execute(action:, profile: nil, working_dir: nil, **opts) ⇒ Object



# File 'lib/clacky/tools/browser.rb', line 80

def execute(action:, profile: nil, working_dir: nil, **opts)
  bypass = action.to_s == "status" ||
           (action.to_s == "act" && (opts[:kind] || opts["kind"]).to_s == "evaluate")
  unless bypass
    return browser_not_setup_error unless File.exist?(BROWSER_CONFIG_PATH)
    return browser_disabled_error  unless browser_enabled?
  end
  execute_user_browser(action, opts)
rescue StandardError => e
  { error: classify_browser_error(e) }
end
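The bypass condition in #execute can be exercised in isolation. The sketch below copies that predicate into a standalone method; the name `bypasses_setup_check?` is hypothetical and exists only for this illustration.

```ruby
# Standalone copy of the bypass predicate from #execute.
# "status" always bypasses the setup checks; "act" bypasses them only
# when kind is "evaluate" (given as a symbol or string key).
def bypasses_setup_check?(action, opts = {})
  action.to_s == "status" ||
    (action.to_s == "act" && (opts[:kind] || opts["kind"]).to_s == "evaluate")
end

bypasses_setup_check?(:status)                       # true
bypasses_setup_check?("act", kind: "evaluate")       # true
bypasses_setup_check?("act", "kind" => "click")      # false
bypasses_setup_check?("snapshot")                    # false
```

For every other action, #execute first requires that the config file at BROWSER_CONFIG_PATH exists and then that browser_enabled? returns true.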

#format_call(args) ⇒ Object



# File 'lib/clacky/tools/browser.rb', line 92

def format_call(args)
  action = args[:action] || args["action"] || "browser"
  "browser(#{action})"
end

#format_result(result) ⇒ Object



# File 'lib/clacky/tools/browser.rb', line 97

def format_result(result)
  return "[Error] #{result[:error].to_s[0..80]}" if result[:error]
  return "[OK] #{result[:output].to_s.lines.size} lines" if result[:output]
  "[OK] Done"
end
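To show the three summary shapes, the sketch below copies the method body into a standalone method (here named `summarize_result`, a hypothetical name) so it runs outside the class. Note that the range `[0..80]` is inclusive, so up to 81 characters of the error message are kept.

```ruby
# Standalone copy of #format_result for illustration.
def summarize_result(result)
  return "[Error] #{result[:error].to_s[0..80]}" if result[:error]
  return "[OK] #{result[:output].to_s.lines.size} lines" if result[:output]
  "[OK] Done"
end

summarize_result(output: "a\nb\nc")   # "[OK] 3 lines"
summarize_result({})                  # "[OK] Done"
long_error = summarize_result(error: "x" * 200)
long_error.length                     # 89: 8-character prefix plus 81 kept characters
```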

#format_result_for_llm(result) ⇒ Object



# File 'lib/clacky/tools/browser.rb', line 103

def format_result_for_llm(result)
  return result if result[:error]

  action = result[:action].to_s

  if action == "screenshot" && result[:image_data]
    mime_type       = result[:mime_type] || "image/png"
    image_data      = result[:image_data]
    data_url        = "data:#{mime_type};base64,#{image_data}"
    original_path   = result[:original_path]
    compressed_path = result[:compressed_path]

    text = "Screenshot captured."
    if original_path || compressed_path
      text += "\n- Original (full resolution): #{original_path || 'unavailable'}" \
              "\n- Compressed (800px, sent to AI): #{compressed_path || 'unavailable'}"
    end

    return [
      { type: "text",      text:      text },
      { type: "image_url", image_url: { url: data_url } }
    ]
  end

  output = result[:output].to_s
  output = compress_snapshot(output) if action == "snapshot"
  max_chars = action == "snapshot" ? MAX_SNAPSHOT_CHARS : MAX_LLM_OUTPUT_CHARS

  {
    action:  action,
    success: result[:success],
    stdout:  truncate_output(output, max_chars),
    profile: result[:profile]
  }.compact
end
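The non-screenshot branch relies on truncate_output, whose source is not shown on this page. The sketch below is one plausible implementation under that caveat: the truncation marker text is an assumption, and `build_llm_payload` is a hypothetical helper that only mirrors the final hash construction, including how `.compact` drops a nil profile.

```ruby
# Assumed implementation: the real truncate_output is not shown here,
# and the marker format below is illustrative only.
def truncate_output(text, max_chars)
  return text if text.length <= max_chars
  omitted = text.length - max_chars
  "#{text[0, max_chars]}\n... [truncated #{omitted} chars]"
end

# Hypothetical helper mirroring the hash built at the end of
# #format_result_for_llm.
def build_llm_payload(action, output, max_chars, success: true, profile: nil)
  {
    action:  action,
    success: success,
    stdout:  truncate_output(output, max_chars),
    profile: profile
  }.compact
end

payload = build_llm_payload("snapshot", "a" * 10, 4)
payload.key?(:profile)   # false, because compact removes nil values
```

Snapshots get the tighter MAX_SNAPSHOT_CHARS limit (4000) while all other output uses MAX_LLM_OUTPUT_CHARS (6000), so the same helper serves both with a different `max_chars`.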