browserctl

A persistent browser automation daemon and CLI, purpose-built for AI agents and developer workflows.

Unlike tools that restart the browser on every script run, browserctl keeps a named browser session alive — preserving cookies, localStorage, open tabs, and page state across discrete commands.

browserd &                                           # start the daemon (headless)
browserctl open login --url https://example.com/login
browserctl snap login                                # AI-friendly JSON snapshot with ref IDs
browserctl fill login --ref e1 --value me@example.com   # interact by ref
browserctl click login --ref e2
browserctl shutdown

[Figure: Login flow captured with browserctl shot]

Why browserctl?

Most automation tools are stateless — every script spins up a fresh browser and tears it down. browserctl doesn't.

Capability	browserctl	Playwright / Selenium
Session persists across commands	✓	✗ (per-script lifecycle)
Named page handles	✓	✗
AI-friendly DOM snapshot	✓	✗
Lightweight CLI interface	✓	✗
Full browser automation API	—	✓
Parallel multi-browser testing	—	✓

Use browserctl when you need a browser that stays alive and remembers state — for AI agents, iterative dev workflows, or lightweight smoke tests.

Use Playwright/Selenium when you need parallel test suites, multi-browser support, or a full programmatic API.

Requirements

Ruby >= 3.2
Chrome or Chromium installed and on PATH

Installation

gem install browserctl

Or in your Gemfile:

gem "browserctl"

Quick Start

1. Start the daemon

browserd           # headless (default)
browserd --headed  # visible browser window

2. Open a named page

browserctl open login --url https://app.example.com/login

3. Snapshot the page to discover refs

browserctl snap login              # AI-friendly JSON with ref IDs (default)
browserctl snap login --format html

4. Interact using refs or selectors

browserctl fill  login --ref e1 --value user@example.com
browserctl fill  login --ref e2 --value s3cr3t
browserctl click login --ref e3

# or using explicit CSS selectors
browserctl fill  login "input[name=email]"    user@example.com
browserctl click login "button[type=submit]"

5. Observe the result

browserctl snap login --diff       # only changed elements since last snap
browserctl shot login --out /tmp/after-login.png --full
browserctl url  login

6. Manage pages and daemon

browserctl pages
browserctl close login
browserctl ping
browserctl shutdown

All Commands

Browser commands (require `browserd` running)

Command	Description
`open <page> [--url URL]`	Open or focus a named page
`close <page>`	Close a named page
`pages`	List open pages
`goto <page> <url>`	Navigate a page to a URL
`fill <page> <selector> <value>`	Fill an input field by CSS selector
`fill <page> --ref <id> --value <v>`	Fill an input field by snapshot ref
`click <page> <selector>`	Click an element by CSS selector
`click <page> --ref <id>`	Click an element by snapshot ref
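`snap <page> [--format ai|html] [--diff]`	Snapshot a page as AI-friendly JSON with ref IDs (default) or as HTML; `--diff` returns only elements changed since the last snap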
`watch <page> <selector> [--timeout N]`	Poll until selector appears (default timeout: 30s)
`shot <page> [--out PATH] [--full]`	Take a screenshot
`url <page>`	Print current URL
`eval <page> <expression>`	Evaluate a JS expression
`pause <page>`	Pause automation — browser stays live for manual interaction
`resume <page>`	Resume automation after manual action
`inspect <page>`	Open Chrome DevTools for a named page
`cookies <page>`	List all cookies as JSON
`set_cookie <page> <name> <value> <domain>`	Set a cookie (path defaults to `/`)
`clear_cookies <page>`	Clear all cookies for a page
`record start <name>`	Begin recording commands as a replayable workflow
`record stop [--out path]`	End recording; saves to `.browserctl/workflows/` or custom path
`record status`	Show whether a recording is active

Daemon commands

Command	Description
`ping`	Check if `browserd` is alive
`shutdown`	Stop `browserd`

Workflow commands

Command	Description
`run <name|file.rb> [--key value ...]`	Run a workflow by name or from a Ruby file, passing params as `--key value` flags
`workflows`	List available workflows
`describe <name>`	Show workflow params and steps

AI Snapshot Format

`browserctl snap <page>` returns a compact JSON array of interactable elements, designed to be token-efficient for AI agents:

[
  {
    "ref": "e1",
    "tag": "input",
    "text": "",
    "selector": "form > input[name=email]",
    "attrs": {
      "type": "email",
      "name": "email",
      "placeholder": "Enter email"
    }
  },
  {
    "ref": "e2",
    "tag": "button",
    "text": "Sign in",
    "selector": "form > button",
    "attrs": {
      "type": "submit"
    }
  }
]

Pass `ref` values to `--ref` to target elements without writing any selectors, or pass `selector` values as the positional selector argument to `fill` and `click`.

Ref-based interaction

After a snap, use ref IDs instead of CSS selectors — no selector knowledge required:

browserctl fill  login --ref e1 --value user@example.com
browserctl click login --ref e2
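
An agent usually has to pick refs out of the snapshot before it can issue these commands. The sketch below parses the sample snapshot shown earlier and builds the matching `fill` and `click` argument lists. `find_ref` is a hypothetical helper, not part of browserctl, and the commands are only printed, not executed.

```ruby
require "json"

# Sample snapshot copied from the format shown earlier in this README.
SNAPSHOT_JSON = <<~JSON
  [
    {"ref": "e1", "tag": "input", "text": "", "selector": "form > input[name=email]",
     "attrs": {"type": "email", "name": "email", "placeholder": "Enter email"}},
    {"ref": "e2", "tag": "button", "text": "Sign in", "selector": "form > button",
     "attrs": {"type": "submit"}}
  ]
JSON

# Hypothetical helper (not a browserctl API): return the ref of the first
# element matching the tag and every given attribute, or nil if none match.
def find_ref(elements, tag:, **attrs)
  match = elements.find do |el|
    el["tag"] == tag && attrs.all? { |key, value| el.dig("attrs", key.to_s) == value }
  end
  match && match["ref"]
end

elements   = JSON.parse(SNAPSHOT_JSON)
email_ref  = find_ref(elements, tag: "input", name: "email")
submit_ref = find_ref(elements, tag: "button", type: "submit")

# Argument lists for the documented CLI calls; printed here, not run.
fill_cmd  = ["browserctl", "fill", "login", "--ref", email_ref, "--value", "user@example.com"]
click_cmd = ["browserctl", "click", "login", "--ref", submit_ref]

puts fill_cmd.join(" ")
puts click_cmd.join(" ")
```

With browserctl installed, the same argument lists could be passed to `system(*fill_cmd)` or `Open3.capture2(*fill_cmd)`.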

Diff snapshots

Track only what changed since the last snapshot — useful for AI agents monitoring async updates:

browserctl snap login --diff
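
This README does not describe how `--diff` decides what changed. As a rough mental model only, the sketch below compares two snapshots keyed by `selector` and ignores `ref`. Both choices are assumptions for illustration, and `changed_elements` is a hypothetical name.

```ruby
# Drop the ref before comparing, in case refs are renumbered between snaps.
def without_ref(element)
  element.reject { |key, _value| key == "ref" }
end

# Return the elements in `after` that are new or differ from `before`.
# Elements that disappeared are not reported in this sketch.
def changed_elements(before, after)
  previous = before.to_h { |el| [el["selector"], without_ref(el)] }
  after.reject { |el| previous[el["selector"]] == without_ref(el) }
end

BEFORE = [
  { "ref" => "e1", "tag" => "input",  "text" => "",        "selector" => "form > input[name=email]" },
  { "ref" => "e2", "tag" => "button", "text" => "Sign in", "selector" => "form > button" }
].freeze

AFTER = [
  { "ref" => "e1", "tag" => "input",  "text" => "",                 "selector" => "form > input[name=email]" },
  { "ref" => "e2", "tag" => "button", "text" => "Signing in",       "selector" => "form > button" },
  { "ref" => "e3", "tag" => "a",      "text" => "Forgot password?", "selector" => "form > a" }
].freeze

# Prints the changed button and the newly added link.
changed_elements(BEFORE, AFTER).each { |el| puts "#{el['ref']} #{el['selector']}" }
```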

Workflows

Workflows are Ruby files that use the `Browserctl.workflow` DSL. Place them in either of these directories:

./.browserctl/workflows/
~/.browserctl/workflows/

Example

# .browserctl/workflows/smoke_login.rb
Browserctl.workflow "smoke_login" do
  desc "Log in and confirm the dashboard loads"

  param :email,    required: true
  param :password, required: true, secret: true
  param :base_url, default: "https://app.example.com"

  step "open login page" do
    page(:login).goto("#{base_url}/login")
  end

  step "submit credentials" do
    page(:login).fill("input[name=email]",    email)
    page(:login).fill("input[name=password]", password)
    page(:login).click("button[type=submit]")
  end

  step "verify dashboard" do
    page(:login).wait_for("[data-test=dashboard]", timeout: 10)
    assert page(:login).url.include?("/dashboard")
  end
end

Run the workflow by name, passing each param as a `--key value` flag:

browserctl run smoke_login --email me@example.com --password s3cr3t

Workflow DSL reference

Method	Description
`desc "text"`	Human-readable description
`param :name, required:, secret:, default:`	Declare a parameter
`step "label" { }`	Add a step (runs in order, halts on failure)
`step "label", retry_count: N, timeout: S { }`	Step with retry and/or timeout
`page(:name)`	Returns a `PageProxy` for the named page
`invoke "other_workflow", **overrides`	Call another workflow
`assert condition, "message"`	Raise `WorkflowError` if condition is false
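
To make "runs in order, halts on failure" and `retry_count:` concrete, here is a miniature step runner. It is a sketch of one plausible reading of those semantics, not browserctl's implementation: `MiniWorkflow` and `StepFailed` are hypothetical names, `retry_count: N` is assumed to mean up to N extra attempts, and `timeout:` is left out.

```ruby
# Hypothetical miniature of the step semantics described in the table above.
class MiniWorkflow
  StepFailed = Class.new(StandardError)

  def initialize
    @steps = []
  end

  # Register a step; retry_count is the number of extra attempts (assumption).
  def step(label, retry_count: 0, &block)
    @steps << [label, retry_count, block]
  end

  # Run steps in declaration order and return the labels that completed.
  # The first step that still fails after its retries raises StepFailed,
  # so later steps never run.
  def run
    completed = []
    @steps.each do |label, retry_count, block|
      attempts = 0
      begin
        attempts += 1
        block.call
      rescue StandardError => e
        retry if attempts <= retry_count
        raise StepFailed, "#{label}: #{e.message}"
      end
      completed << label
    end
    completed
  end
end

calls = 0
flow = MiniWorkflow.new
flow.step("flaky step", retry_count: 2) { calls += 1; raise "not ready" if calls < 3 }
flow.step("next step") { }
p flow.run   # => ["flaky step", "next step"]
```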

PageProxy methods

goto(url) · fill(selector, value) · click(selector) · snapshot(**opts) · screenshot(**opts) · wait_for(selector, timeout: 10) · url · evaluate(expression) · pause · resume · inspect_page · cookies · set_cookie(name, value, domain, path: "/") · clear_cookies

Examples

Ready-to-run smoke tests against the-internet.herokuapp.com are included in examples/the_internet/. See docs/smoke-testing-the-internet.md for annotated output and auto-generated screenshots of each scenario.

For a full guide on building your own workflows, see docs/writing-workflows.md.

How it works

browserd runs as a background process, listening on a Unix socket at ~/.browserctl/browserd.sock. Start multiple named instances for agent isolation:

browserd --name agent-a &
browserd --name agent-b &
browserctl --daemon agent-a open main --url https://app.example.com

Each daemon manages a Ferrum (Chrome DevTools Protocol) browser instance with named page handles.

browserctl sends JSON-RPC commands over the socket and prints the result. Workflows run in-process through the same client.
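
The request and response cycle over the socket can be sketched with Ruby's standard library. The wire details are assumptions, since this README does not specify them: the sketch uses newline-delimited JSON-RPC 2.0 messages and a `ping` method name that mirrors the CLI command. A stand-in server replaces `browserd` so the example runs on its own.

```ruby
require "json"
require "socket"
require "tmpdir"

# Send one JSON-RPC 2.0 request and return the parsed response.
# ASSUMPTION: one JSON document per line; browserd's real framing may differ.
def rpc_call(socket_path, method, params = {})
  request = { "jsonrpc" => "2.0", "id" => 1, "method" => method, "params" => params }
  UNIXSocket.open(socket_path) do |sock|
    sock.puts(JSON.generate(request))
    JSON.parse(sock.gets)
  end
end

# Stand-in for browserd: accepts one connection and answers one request.
def serve_once(server)
  client   = server.accept
  request  = JSON.parse(client.gets)
  result   = request["method"] == "ping" ? "pong" : nil
  response = { "jsonrpc" => "2.0", "id" => request["id"], "result" => result }
  client.puts(JSON.generate(response))
ensure
  client&.close
end

# Wire the client and the stand-in server together on a temporary socket.
def demo_ping
  Dir.mktmpdir do |dir|
    path   = File.join(dir, "demo.sock")
    server = UNIXServer.new(path)
    thread = Thread.new { serve_once(server) }
    response = rpc_call(path, "ping")
    thread.join
    server.close
    response
  end
end

puts demo_ping["result"]
```

Against a real daemon, `socket_path` would be the documented `~/.browserctl/browserd.sock`, provided the framing assumption holds.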

The daemon shuts itself down after 30 minutes of inactivity.
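
An inactivity shutdown of this kind is commonly built as a timestamp that every handled command resets. The sketch below illustrates that pattern only. `IdleTimer` is a hypothetical class, not browserd's code, and its clock is injectable so the logic can be checked without waiting 30 minutes.

```ruby
# Hypothetical idle tracker illustrating the inactivity-timeout pattern.
class IdleTimer
  IDLE_LIMIT_SECONDS = 30 * 60

  def initialize(limit: IDLE_LIMIT_SECONDS,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @limit = limit
    @clock = clock
    touch
  end

  # Call on every handled command to reset the countdown.
  def touch
    @last_activity = @clock.call
  end

  # True once the limit has elapsed with no touch in between.
  def expired?
    @clock.call - @last_activity >= @limit
  end
end

now   = 0.0
timer = IdleTimer.new(clock: -> { now })

now += 29 * 60
puts timer.expired?   # false: only 29 idle minutes so far

timer.touch           # a command arrives and resets the countdown
now += 30 * 60
puts timer.expired?   # true: 30 idle minutes since the last command
```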

Development

git clone https://github.com/patrick204nqh/browserctl
cd browserctl
bin/setup              # install deps + check for Chrome

bundle exec rspec      # run tests
bundle exec rubocop    # lint

Contributing

See CONTRIBUTING.md · SECURITY.md

License

MIT