Flunky

Flunky lets any AI agent drive a real browser. It borrows the layering of a browser automation library (a thin driver, a stateful session, a readable action DSL) and adds the two things an agent needs that a test framework does not: a way to show the page to a model, and a way to expose actions as tools the model can call.

The gem never calls an AI vendor itself. It emits tool schemas and a dispatcher; you inject the model client.

How it fits together

  • Driver (Drivers::Base, Drivers::FerrumDriver) talks to the browser. Ferrum drives Chrome over the DevTools Protocol with no Selenium server. The backend is swappable.
  • Snapshot reduces the live page to the elements an agent can act on, each stamped with an integer ref, and renders a compact prompt block.
  • Actions is the human-readable DSL over the driver (click, type, fill_in, ...).
  • Session owns one driver, caches the latest snapshot, and exposes the actions.
  • Tools turns a session into vendor-neutral tool schemas plus a dispatcher.
  • Agent is an optional observe/decide/act loop around an injected model.
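
To make the Snapshot step concrete, here is a minimal sketch of ref stamping in plain Ruby. It is not Flunky's implementation: Element, stamp_refs, and to_prompt are hypothetical names, the 1-based numbering is an assumption, and the element cap only loosely mirrors the max_elements setting.

```ruby
# Illustrative stand-in only: none of these names come from Flunky.
Element = Struct.new(:tag, :label, :interactive)

# Keep the interactive elements, cap the count, and stamp each with an
# integer ref (1-based here, which is an assumption).
def stamp_refs(elements, max_elements: 200)
  elements
    .select(&:interactive)
    .first(max_elements)
    .each_with_index
    .map { |el, i| { ref: i + 1, tag: el.tag, label: el.label } }
end

# Render the stamped elements as a compact block a model can read.
def to_prompt(stamped)
  stamped.map { |e| "[#{e[:ref]}] <#{e[:tag]}> #{e[:label]}" }.join("\n")
end

page = [
  Element.new("a", "More information...", true),
  Element.new("p", "This domain is for use in examples.", false),
  Element.new("button", "Submit", true)
]

stamped = stamp_refs(page)
puts to_prompt(stamped)
```

The paragraph element is dropped because an agent cannot act on it, so the model only sees the link and the button, each addressable by its ref.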

Installation

Flunky needs Ruby >= 3.0 and a local Chrome (Ferrum launches it).

bundle add flunky

or

gem install flunky

Usage

require "flunky"

Flunky.session do |s|
  s.visit("https://example.com")
  puts s.snapshot.to_prompt   # the page as the model sees it
  s.actions.click(1)          # act on a stamped ref
end

Refs exist only after a snapshot stamps them, so call observe (or read snapshot) before acting. After client-side navigation a ref can go stale; the tool dispatcher re-observes after every action, so the model always sees the current page.

Tool calling

session = Flunky::Session.new
tools = Flunky::Tools.new(session)

tools.definitions                    # hand straight to an Anthropic-style client
tools.dispatch("click", { ref: 1 })  # run a returned tool call

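The tool schema shape (name / description / input_schema) matches Anthropic's tool format. For OpenAI, rename input_schema to parameters and wrap each definition: { type: "function", function: { **defn.except(:input_schema), parameters: defn[:input_schema] } }. Dropping the original key keeps a stray input_schema field out of the OpenAI payload.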
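
The OpenAI wrapping can also be written as a small helper that copies only the keys OpenAI's function-tool format uses. This is a sketch, not part of the gem: to_openai_tool is a hypothetical name, and the click definition below is made up for illustration, so Flunky's real schema may differ.

```ruby
# Convert one Anthropic-style definition (name / description / input_schema)
# into OpenAI's function-tool shape.
def to_openai_tool(defn)
  {
    type: "function",
    function: {
      name: defn[:name],
      description: defn[:description],
      parameters: defn[:input_schema]
    }
  }
end

# Hypothetical definition; the schema Flunky emits for "click" may differ.
click = {
  name: "click",
  description: "Click the element with the given ref.",
  input_schema: {
    type: "object",
    properties: { ref: { type: "integer" } },
    required: ["ref"]
  }
}

openai_tool = to_openai_tool(click)
```

With a real session, tools.definitions.map { |defn| to_openai_tool(defn) } would convert the whole list, assuming each definition is a hash with symbol keys.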

Agent loop

Flunky::Agent.new(session, model:) drives an observe/decide/act loop. The model object must respond to call(messages:, tools:) and return { text:, tool_calls: [{ id:, name:, arguments: }] }. See examples/anthropic_agent.rb for a roughly 50-line adapter to Anthropic's /v1/messages endpoint.
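
Because the model is just an object with a call method, a scripted stand-in is enough to exercise the contract without an API key. The sketch below is an assumption-laden example: ScriptedModel is a hypothetical class, and whether the agent treats an empty tool_calls list as "stop" is not confirmed here.

```ruby
# A scripted stand-in for a real model client. It satisfies the contract:
# call(messages:, tools:) returns
# { text:, tool_calls: [{ id:, name:, arguments: }] }.
class ScriptedModel
  def initialize(steps)
    @steps = steps.dup
  end

  # messages and tools are accepted but ignored; replies come from the script.
  def call(messages:, tools:)
    step = @steps.shift
    # An empty tool_calls list is how this sketch signals "done";
    # whether Flunky::Agent stops on that is an assumption.
    return { text: "Done.", tool_calls: [] } if step.nil?

    {
      text: step[:text],
      tool_calls: [
        { id: "call_#{step[:id]}", name: step[:name], arguments: step[:arguments] }
      ]
    }
  end
end

model = ScriptedModel.new([
  { id: 1, text: "Clicking the first link.", name: "click", arguments: { ref: 1 } }
])

first  = model.call(messages: [], tools: [])
second = model.call(messages: [], tools: [])
```

An instance like this could be passed as model: to Flunky::Agent.new for a dry run; a production adapter would translate the same hash shape to and from a vendor's API.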

Configuration

Flunky.configure do |c|
  c.headless = true
  c.default_timeout = 10
  c.max_elements = 200
  c.window_size = [1280, 800]
end

Per-session options passed to Session.new override the global configuration.

Development

After checking out the repo, run bin/setup to install dependencies, then bundle exec rspec. Specs tagged :browser are skipped automatically when Chrome is not installed, so the suite passes on a bare machine.

License

Available as open source under the terms of the MIT License.