Flunky
Flunky lets any AI agent drive a real browser. It borrows the layering of a browser automation library (a thin driver, a stateful session, a readable action DSL) and adds the two things an agent needs that a test framework does not: a way to show the page to a model, and a way to expose actions as tools the model can call.
The gem never calls an AI vendor itself. It emits tool schemas and a dispatcher; you inject the model client.
How it fits together
- Driver (Drivers::Base, Drivers::FerrumDriver) talks to the browser. Ferrum drives Chrome over the DevTools Protocol with no Selenium server. The backend is swappable.
- Snapshot reduces the live page to the elements an agent can act on, each stamped with an integer ref, and renders a compact prompt block.
- Actions is the human-readable DSL over the driver (click, type, fill_in, ...).
- Session owns one driver, caches the latest snapshot, and exposes the actions.
- Tools turns a session into vendor-neutral tool schemas plus a dispatcher.
- Agent is an optional observe/decide/act loop around an injected model.
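To make the Snapshot layer concrete, here is a toy model of ref stamping. It is a sketch, not Flunky's implementation: the element hashes, the INTERACTIVE tag list, the stamp and to_prompt helpers, and the rendered line format are all assumptions made for illustration, and it guesses that max_elements caps how many elements get stamped.

```ruby
# Toy model of ref stamping. Everything here is hypothetical and only
# mirrors the idea described above: filter, number, render.
INTERACTIVE = %w[a button input select textarea].freeze

Element = Struct.new(:ref, :tag, :label)

# Keep only elements an agent could act on and number them from 1.
def stamp(raw_elements, max_elements: 200)
  raw_elements
    .select { |el| INTERACTIVE.include?(el[:tag]) }
    .first(max_elements)
    .each_with_index
    .map { |el, i| Element.new(i + 1, el[:tag], el[:label]) }
end

# Render the stamped elements as a compact block for the model.
def to_prompt(elements)
  elements.map { |el| "[#{el.ref}] #{el.tag} \"#{el.label}\"" }.join("\n")
end

page = [
  { tag: "h1",     label: "Example Domain" },
  { tag: "a",      label: "More information..." },
  { tag: "button", label: "Accept" }
]

puts to_prompt(stamp(page))
```

The heading is dropped because nothing can be done to it, so the link becomes ref 1 and the button ref 2. The real prompt block may look quite different.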
Installation
Flunky needs Ruby >= 3.0 and a local Chrome (Ferrum launches it).
bundle add flunky
or
gem install flunky
Usage
require "flunky"
Flunky.session do |s|
  s.visit("https://example.com")
  puts s.snapshot.to_prompt   # the page as the model sees it
  s.actions.click(1)          # act on a stamped ref
end
Refs exist only after a snapshot stamps them, so observe (or read
snapshot) before acting. After client-side navigation a ref can go stale, which
is why the tool dispatcher re-observes after every action: the model always
sees the current page.
Tool calling
session = Flunky::Session.new
tools = Flunky::Tools.new(session)
tools.definitions # hand straight to an Anthropic-style client
tools.dispatch("click", { ref: 1 }) # run a returned tool call
The tool schema shape (name / description / input_schema) matches
Anthropic's tool format. For OpenAI, wrap each definition as
{ type: "function", function: { **defn, parameters: defn[:input_schema] } }.
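The OpenAI wrapping can be written as a small helper. This is a minimal sketch: to_openai_tools is a hypothetical name, the click definition is a made-up stand-in for one entry of tools.definitions, and dropping the now-redundant input_schema key is a choice of this sketch rather than something the gem requires.

```ruby
# Convert Anthropic-shaped tool definitions (name / description /
# input_schema) into OpenAI's function-tool shape.
def to_openai_tools(definitions)
  definitions.map do |defn|
    # Copy everything except input_schema, then re-add it as parameters.
    function = defn.reject { |key, _| key == :input_schema }
    { type: "function", function: function.merge(parameters: defn[:input_schema]) }
  end
end

# Hypothetical definition, standing in for one entry of tools.definitions.
click = {
  name: "click",
  description: "Click the element with the given ref.",
  input_schema: {
    type: "object",
    properties: { ref: { type: "integer" } },
    required: ["ref"]
  }
}

openai_tools = to_openai_tools([click])
p openai_tools.first[:function].keys
```

The helper assumes symbol keys, as in the one-liner above. The input array is left untouched, so the same definitions can still be handed to an Anthropic-style client.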
Agent loop
Flunky::Agent.new(session, model:) drives an observe/decide/act loop. model
must respond to call(messages:, tools:) and return
{ text:, tool_calls: [{ id:, name:, arguments: }] }. See
examples/anthropic_agent.rb for a roughly 50-line
adapter to Anthropic's /v1/messages endpoint.
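Because the model is just an object with a call method, a scripted stand-in is enough to exercise the contract without any vendor. The sketch below is illustrative: ScriptedModel and its canned replies are hypothetical, and it assumes the loop treats an empty tool_calls array as the signal to stop, which this README does not state.

```ruby
# A scripted stand-in for a real model client. It satisfies the contract
# described above: call(messages:, tools:) returning
# { text:, tool_calls: [{ id:, name:, arguments: }] }.
class ScriptedModel
  def initialize(replies)
    @replies = replies.dup
  end

  def call(messages:, tools:)
    # Ignore the inputs and play back the next canned reply. Once the
    # script runs out, answer with text only (assumed to end the loop).
    @replies.shift || { text: "Done.", tool_calls: [] }
  end
end

model = ScriptedModel.new([
  { text: "Clicking the first link.",
    tool_calls: [{ id: "call_1", name: "click", arguments: { ref: 1 } }] }
])

first  = model.call(messages: [], tools: [])
second = model.call(messages: [], tools: [])
puts first[:tool_calls].first[:name]
puts second[:text]
```

An object like this can be passed as model: to Flunky::Agent.new in place of a real adapter, which keeps specs offline and deterministic.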
Configuration
Flunky.configure do |c|
  c.headless = true
  c.default_timeout = 10
  c.max_elements = 200
  c.window_size = [1280, 800]
end
Per-session options passed to Session.new override the global configuration.
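The precedence rule behaves like a hash merge in which session options win. The sketch below illustrates that rule only and says nothing about Flunky's internals: GLOBAL_DEFAULTS and effective_options are hypothetical names, and it assumes the per-session option keys mirror the configuration attribute names.

```ruby
# Illustration of override precedence only; not Flunky's internals.
GLOBAL_DEFAULTS = {
  headless: true,
  default_timeout: 10,
  max_elements: 200,
  window_size: [1280, 800]
}.freeze

# Per-session options win; anything not overridden falls back to the
# global value.
def effective_options(**session_options)
  GLOBAL_DEFAULTS.merge(session_options)
end

opts = effective_options(headless: false, default_timeout: 30)
puts opts[:headless]
puts opts[:max_elements]
```

Here headless and default_timeout come from the session, while max_elements and window_size keep their global values.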
Development
After checking out the repo, run bin/setup to install dependencies, then
bundle exec rspec. Specs tagged :browser are skipped automatically when
Chrome is not installed, so the suite passes on a bare machine.
License
Available as open source under the terms of the MIT License.