ruby_llm-toolbox
A safe-by-default bundle of RubyLLM::Tool classes
covering the skills common to most LLM harnesses — filesystem, shell, web, git, and
structured-data tools — packaged as one gem with one require.
- One gem, one require. require "ruby_llm/toolbox" loads everything. No sub-gems, no second require.
- Safe by default. Read-only tools work out of the box. Mutating/exec tools are loaded but inert until you explicitly enable them.
- Token-budgeted output. Every tool result is truncated (head + tail, middle elided) to fit a token budget, counted with ruby_llm-tokenizer, so a single grep can't blow up the context window.
- Uniform failure contract. Tools never raise into the harness; failures come back as { error:, code: }, matching ruby_llm's own convention.
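
The head + tail truncation mentioned above can be pictured with a small sketch. This is an illustration only: truncate_middle is a hypothetical helper, and it approximates tokens with whitespace-separated words, whereas the gem counts with ruby_llm-tokenizer.

```ruby
# Hypothetical helper: keep the head and tail of a text, elide the middle.
# ASSUMPTION: one whitespace-separated word counts as one token. Rejoining
# with single spaces also loses the original whitespace.
def truncate_middle(text, budget:)
  tokens = text.split
  return text if tokens.size <= budget

  head_size = budget / 2
  tail_size = budget - head_size
  elided = tokens.size - budget
  [
    tokens.first(head_size).join(" "),
    "[... #{elided} tokens elided ...]",
    tokens.last(tail_size).join(" ")
  ].join("\n")
end

sample = (1..10).map { |i| "w#{i}" }.join(" ")
puts truncate_middle(sample, budget: 4)
# prints:
# w1 w2
# [... 6 tokens elided ...]
# w9 w10
```

Keeping both ends matters for tool output: the head usually carries the command or header, and the tail carries the final error or summary.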
Status: v0.1. Ships the framework plus fifty tools across filesystem, search, code intelligence, git, web, the Ruby/Python/Rust toolchain, structured data (JSON/YAML/TOML/CSV), background process management, and small utilities. Safe tools are on by default; exec tools (writes, mutations, code execution) are gated behind enable_exec_tools. parse_ruby uses Prism (bundled with the supported Ruby 3.3+), with a Ripper fallback for non-MRI runtimes. An optional, operator-controlled unsafe override lets specific calls bypass individual guards when explicitly permitted. See Tools and the Roadmap.

For an end-to-end walkthrough (wiring, the safety model, sandbox/search selection, the full tool catalog, and "reach for X, not Y" rules you can hand to the agent), read the Usage Guide.
Installation
Requires Ruby >= 3.3 (where Prism is bundled, so parse_ruby uses it with no extra
dependency).
The tokenizer dependency (ruby_llm-tokenizer) pulls in the sentencepiece native gem,
which requires the SentencePiece C library to be present at build time:
# Ubuntu / Debian
sudo apt-get install -y libsentencepiece-dev
# macOS (Homebrew — arm64 installs to /opt/homebrew, so point the build at it)
brew install sentencepiece
bundle config set build.sentencepiece \
"--with-sentencepiece-dir=$(brew --prefix sentencepiece)"
Then add the gem to your Gemfile:
# Gemfile
gem "ruby_llm-toolbox"
Quick start
require "ruby_llm/toolbox"
RubyLLM::Toolbox.configure do |c|
c.fs_root = "/srv/project" # filesystem tools are jailed to this
c.max_output_tokens = 2_000 # per-result budget
c.tokenizer_model = "gpt-4o" # which tokenizer to count with
end
chat = RubyLLM.chat
chat.with_tools(*RubyLLM::Toolbox.safe_tools) # read-only set, always on
chat.ask("What does config/database.yml configure?")
Enabling exec tools
Dangerous tools (bash, write_file, edit_file, run_ruby, git_commit, mutating http_request,
and the rest of the exec set) are loaded but refuse to run until you opt in:
RubyLLM::Toolbox.configure do |c|
c.enable_exec_tools = true
c.allowed_commands = %w[ls cat grep rg] # bash runs ONLY these executables
c.command_timeout = 30
end
chat.with_tools(*RubyLLM::Toolbox.all_tools) # exec tools still honor the gate
You can also scope a single instance without touching global config:
chat.with_tool(RubyLLM::Toolbox::Tools::ReadFile.new(fs_root: "/srv/other"))
Tools
read_file (safe)
Reads a UTF-8 text file from within fs_root, with an optional 1-based line range or a tail
of the last N lines (like tail -n N, which takes precedence over the range).
Output is token-budgeted. Path traversal and symlink escapes are rejected.
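
A minimal sketch of the jail check, assuming the approach is "resolve symlinks, then compare against the resolved root". inside_jail? is a hypothetical helper for illustration, not the gem's Safety::PathJail API.

```ruby
require "tmpdir"

# Hypothetical helper: true only when the fully resolved path sits at or
# below the fully resolved root.
def inside_jail?(root, candidate)
  real_root = File.realpath(root)
  real_path = File.realpath(File.expand_path(candidate, real_root))
  real_path == real_root || real_path.start_with?(real_root + File::SEPARATOR)
rescue SystemCallError
  false # missing or unreadable paths are rejected rather than guessed at
end

Dir.mktmpdir do |root|
  File.write(File.join(root, "ok.txt"), "hi")
  File.symlink("/etc", File.join(root, "escape"))

  p inside_jail?(root, "ok.txt")         # => true
  p inside_jail?(root, "../outside.txt") # => false (traversal)
  p inside_jail?(root, "escape/hosts")   # => false (symlink leaves the root)
end
```

Comparing resolved paths, rather than scanning the input for "..", is what catches the symlink case.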
list_directory (safe)
Lists directory entries within fs_root with type (dir/file/symlink) and size.
Optional recursive and include_hidden. Symlinked directories are listed but not
traversed, so a link can't walk out of the jail.
tree (safe)
Renders a depth-limited directory tree under fs_root (default 3 levels) — a fast way to
grasp project structure without walking it one level at a time. Directories are marked with a
trailing slash; ignored directories and hidden entries are skipped (toggle with show_hidden),
symlinks aren't followed, and the listing is capped.
glob (safe)
Finds files matching a glob (**/*.rb, app/models/*.rb) within fs_root, relative
to an optional base. Patterns containing .. are rejected and each hit is re-checked
through the jail to drop symlink escapes.
grep_files (safe)
Searches file contents for a regex within fs_root, returning path:line: text. Optional
file glob filter and ignore_case, plus before/after/context lines (like grep
-B/-A/-C) — context lines render as path-line- text and separate blocks are divided
with --. The pattern is compiled with a per-match timeout (ReDoS backstop), binary files and
noisy/VCS directories are skipped, and results are capped.
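
The per-match timeout can be sketched with Ruby's built-in Regexp timeout (Ruby 3.2+). grep_lines and the :bad_pattern / :regex_timeout codes are hypothetical names for this illustration; the gem's own codes may differ.

```ruby
# Hypothetical helper: compile a caller-supplied pattern with a timeout and
# return "path:line: text" rows, capped at max_matches.
def grep_lines(path_label, text, pattern, ignore_case: false, timeout: 2.0, max_matches: 200)
  flags = ignore_case ? Regexp::IGNORECASE : 0
  regex = Regexp.new(pattern, flags, timeout: timeout)
  hits = []
  text.each_line.with_index(1) do |line, number|
    next unless regex.match?(line)

    hits << "#{path_label}:#{number}: #{line.chomp}"
    break if hits.size >= max_matches
  end
  hits
rescue Regexp::TimeoutError
  # Must come first: Regexp::TimeoutError is a subclass of RegexpError.
  { error: "pattern timed out", code: :regex_timeout }
rescue RegexpError => e
  { error: "invalid pattern: #{e.message}", code: :bad_pattern }
end

puts grep_lines("notes.txt", "foo\nBar\nbaz bar\n", "bar", ignore_case: true)
# prints:
# notes.txt:2: Bar
# notes.txt:3: baz bar
```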
gem (safe)
Read-only RubyGems.org metadata lookup. Actions: info (summary), version (latest),
dependencies (runtime deps), search (find gems by query). The host is fixed and all
input is URL-encoded, so there's no arbitrary-URL surface.
parse_ruby (safe)
In-process structural outline of a Ruby file (classes, modules, methods, constants with
line numbers and nesting), or definition lookup by query/kind. It parses — never executes
— the code, through one of two interchangeable backends behind RubyOutline: Prism when
it can be loaded (it's bundled with Ruby 3.3+, the supported floor, so no gem install is
needed), and Ripper (stdlib) as a fallback for runtimes that don't bundle Prism (e.g.
non-MRI). The two are held to identical output by
spec/ruby_outline_parity_spec.rb and bin/verify_prism_parity, which compares both
backends over a corpus and can be run under any Ruby — including a sandboxed one
(docker run --rm -v "$PWD":/app -w /app ruby:3.4-slim ruby bin/verify_prism_parity).
json_query / yaml_query / toml_query / csv_read (safe), csv_write (exec)
json_query, yaml_query, and toml_query parse JSON / YAML / TOML (from a file in
fs_root or an inline string) and extract values with a shared dot/bracket path
(users[0].name, dependencies.serde.version, products[].name) or pretty-print. YAML is
loaded with safe_load (no arbitrary Ruby objects); TOML uses a dependency-free parser
covering the common surface of TOML 1.0 (tables, arrays-of-tables, inline tables, dotted
keys, all scalar forms). csv_read reads a CSV into readable rows (optional header, limit);
csv_write writes an array of rows (optional headers) to a CSV.
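
The shared dot/bracket path can be approximated in a few lines. This is a sketch under assumptions, not the gem's path engine: query_path is a hypothetical helper, it supports only keys, [N] indexes, and a single [] fan-out, and it returns nil for anything missing.

```ruby
require "json"

# Hypothetical helper: walk a parsed document with a path such as
# "users[0].name" or "products[].name".
def query_path(data, path)
  fanned = false
  path.scan(/[^.\[\]]+|\[\d*\]/).reduce(data) do |current, token|
    step = lambda do |node|
      if (match = token.match(/\A\[(\d+)\]\z/))
        node.is_a?(Array) ? node[match[1].to_i] : nil
      else
        node.is_a?(Hash) ? node[token] : nil
      end
    end

    if token == "[]"
      fanned = true # later segments apply to every element
      current.is_a?(Array) ? current : []
    elsif fanned
      current.map(&step)
    else
      step.call(current)
    end
  end
end

doc = JSON.parse('{"users":[{"name":"ada"}],"products":[{"name":"a"},{"name":"b"}]}')
p query_path(doc, "users[0].name")   # => "ada"
p query_path(doc, "products[].name") # => ["a", "b"]
```

Because the walk operates on plain hashes and arrays, the same function works on the output of any parser that produces them.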
web_fetch / web_search / http_request (safe)
web_fetch retrieves a URL over http/https and returns readable text (HTML stripped),
following redirects. web_search queries the web through a swappable adapter — Tavily by default
(set tavily_api_key), or set search_adapter to :brave (commercial Brave Search API,
set brave_api_key), :searxng (a keyless, self-hosted SearXNG instance, set searxng_url),
or any object responding to #search(query, max_results:). http_request is a general
client returning status/headers/body.
All three route through Safety::UrlGuard (see below). http_request allows GET/HEAD by
default; POST/PUT/PATCH/DELETE require enable_exec_tools.
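
The address-vetting half of an SSRF guard might look like the sketch below. It is an assumption-laden illustration, not Safety::UrlGuard itself: allowed_scheme? and blocked_address? are hypothetical helpers, the range list is a common baseline rather than the gem's list, and DNS resolution, socket pinning, and redirect re-checks are not shown.

```ruby
require "ipaddr"
require "uri"

# ASSUMPTION: a typical private/loopback/link-local block list. The cloud
# metadata address 169.254.169.254 falls inside 169.254.0.0/16.
BLOCKED_RANGES = %w[
  0.0.0.0/8 10.0.0.0/8 127.0.0.0/8 169.254.0.0/16 172.16.0.0/12 192.168.0.0/16
  ::1/128 fc00::/7 fe80::/10
].map { |cidr| IPAddr.new(cidr) }.freeze

def allowed_scheme?(url)
  %w[http https].include?(URI.parse(url).scheme)
rescue URI::InvalidURIError
  false
end

def blocked_address?(ip_string)
  # native unwraps IPv4-mapped IPv6 so the IPv4 ranges still apply.
  ip = IPAddr.new(ip_string).native
  BLOCKED_RANGES.any? { |range| range.include?(ip) }
rescue IPAddr::InvalidAddressError
  true # unparseable addresses are treated as blocked
end

p allowed_scheme?("file:///etc/passwd")  # => false
p blocked_address?("169.254.169.254")    # => true
p blocked_address?("93.184.216.34")      # => false
```

A check like this is only sound when the connection is then made to the same address that was vetted; resolving the hostname a second time reopens the DNS-rebinding gap.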
download_file (exec, gated)
Downloads a URL to a file within fs_root (whereas web_fetch returns text). Routes through
Safety::UrlGuard, follows redirects safely, is capped at config.max_fetch_bytes, and jails
the destination path.
bash (exec, gated)
Runs one allowlisted executable with arguments. Deliberately not a shell — no
pipes, redirects, globs, quoting, or variable expansion. The program goes in command;
each argument is a separate element of args, passed verbatim as argv. This is the
primitive that the OS-command-injection bug class can't reach, because nothing ever
parses the input as a shell line.
// model emits:
{ "command": "rg", "args": ["TODO", "app/models"] }
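
To see why argv-only execution sidesteps shell injection, here is a small sketch using Open3 from the standard library. run_allowlisted and DEMO_ALLOWED are hypothetical names; environment scrubbing and the timeout are omitted for brevity.

```ruby
require "open3"

DEMO_ALLOWED = %w[echo ls cat].freeze # hypothetical allowlist for this demo

def run_allowlisted(command, args, allowed: DEMO_ALLOWED)
  unless allowed.include?(command)
    return { error: "#{command} is not allowlisted", code: :command_denied }
  end

  # [command, command] selects the no-shell form of spawn even with zero args,
  # so every element of args reaches the program verbatim.
  output, status = Open3.capture2e([command, command], *args)
  "exit #{status.exitstatus}\n#{output}"
end

puts run_allowlisted("echo", ["$(whoami) | cat > /tmp/x"])
# prints:
# exit 0
# $(whoami) | cat > /tmp/x
```

The metacharacters arrive as literal text because no shell ever parses them.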
run_ruby (exec, gated)
Executes a Ruby snippet inside the active sandbox runtime with code piped
on stdin. Under Docker it runs in an ephemeral, no-network, read-only, cap-dropped container;
under bubblewrap or sandbox-exec it runs the host's ruby in an isolated, no-network,
write-restricted environment. Requires enable_exec_tools and an available sandbox; returns a
clean :sandbox_unavailable error otherwise.
run_python (exec, gated)
Same sandbox as run_ruby, running Python (the config.python_image under Docker, or the
host's python3 under the host-process backends). Code is piped to python3 on stdin.
python_tests (exec, gated)
Runs the project's Python tests from fs_root — pytest by default, or unittest
(python -m unittest discover) — with a parsed pass/fail headline, mirroring run_tests.
run_rust (exec, gated)
Compiles and runs a self-contained Rust program in the same sandbox (config.rust_image under
Docker, or the host's rustc under the host-process backends). The source is piped on stdin; a
shell step inside the sandbox writes it to scratch, compiles with rustc, and runs the binary,
returning compiler output plus the program's stdout/stderr and exit.
calculator / date_time / diff / todo_write (safe)
Small in-process utilities. calculator evaluates an arithmetic expression with a real
recursive-descent parser — never eval — supporting + - * / % **, parentheses, common
functions (sqrt, sin, ln, …), and constants (pi, e). date_time returns the
current time (or converts a unix timestamp), with an optional strftime format. diff
produces a readable line-by-line comparison of two text blocks. todo_write maintains a
task list across calls for multi-step work (pass the full list each time; statuses are
pending/in_progress/completed).
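
For readers unfamiliar with the technique, a recursive-descent evaluator can be quite small. The sketch below is a reduced illustration (only + - * /, unary minus, and parentheses); TinyCalc is a hypothetical class, not the gem's calculator.

```ruby
# Grammar: expression := term (("+" | "-") term)*
#          term       := factor (("*" | "/") factor)*
#          factor     := "-" factor | number | "(" expression ")"
class TinyCalc
  def initialize(source)
    @tokens = source.scan(%r{\d+(?:\.\d+)?|[-+*/()]})
    # Anything the scanner skipped (other than whitespace) is an error.
    raise ArgumentError, "invalid character" unless @tokens.join == source.gsub(/\s+/, "")
    @pos = 0
  end

  def evaluate
    value = expression
    raise ArgumentError, "unexpected token #{peek.inspect}" if peek
    value
  end

  private

  def peek
    @tokens[@pos]
  end

  def advance
    token = @tokens[@pos]
    @pos += 1
    token
  end

  def expression
    value = term
    value = advance == "+" ? value + term : value - term while %w[+ -].include?(peek)
    value
  end

  def term
    value = factor
    value = advance == "*" ? value * factor : value / factor while %w[* /].include?(peek)
    value
  end

  def factor
    token = advance
    case token
    when "-" then -factor
    when "("
      value = expression
      raise ArgumentError, "expected )" unless advance == ")"
      value
    when /\A\d/ then token.to_f
    else raise ArgumentError, "unexpected token #{token.inspect}"
    end
  end
end

puts TinyCalc.new("2 + 3 * (4 - 1)").evaluate # prints 11.0
```

Each grammar rule is one method, so precedence comes from the call structure and the input is never handed to eval.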
Background processes: process_start / process_output / process_list / process_kill
Long-running commands — dev servers, file watchers, log tails — that an agent starts, polls, and stops without blocking on them.
process_start (exec, gated) launches one allowlisted executable as a
background process and returns its id (e.g. proc_1) immediately. It carries the
same safety model as bash: argv only (no shell), the minimal env_passthrough
environment, run in fs_root, in its own process group with an address-space cap
derived from sandbox_memory (but no CPU cap — these are meant to run
indefinitely). The number of concurrent live processes is bounded by
max_processes.
The other three are safe — they only act on processes already started, and
process_kill is always available as a stop valve even if exec tools are later
disabled. process_output returns the stdout/stderr produced since the last read
(incremental, so polling in a loop streams output without repeats) plus the
current status and exit code. process_list shows every process with its id,
status, pid, age, and command. process_kill stops a process — SIGTERM to its
group, escalating to SIGKILL, plus a /proc descendant sweep so children are
reaped even where group-signal delivery is incomplete — then returns any final
output and removes it from the registry. Output buffers are bounded (256 KB of
unread data per stream; older bytes are dropped with a marker), so a chatty
process can't exhaust memory. Everything still running is killed at interpreter
exit so nothing is orphaned.
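
The bounded, incremental output buffer could be modelled roughly as follows. BoundedBuffer and its marker text are hypothetical; the sketch works on raw bytes and shows a single stream.

```ruby
# Hypothetical model of one bounded output stream.
class BoundedBuffer
  MARKER = "[... older output dropped ...]\n"

  def initialize(limit)
    @limit = limit
    @data = String.new(encoding: Encoding::BINARY)
    @dropped = false
  end

  # Append new output, discarding the oldest bytes once the limit is exceeded.
  def append(chunk)
    @data << chunk.b
    overflow = @data.bytesize - @limit
    return if overflow <= 0

    @data = @data.byteslice(overflow, @limit)
    @dropped = true
  end

  # Return everything unread and clear it, so polling never repeats output.
  def drain
    out = @dropped ? MARKER + @data : @data
    @data = String.new(encoding: Encoding::BINARY)
    @dropped = false
    out
  end
end

buffer = BoundedBuffer.new(5)
buffer.append("abc")
buffer.append("defgh")
print buffer.drain
# prints:
# [... older output dropped ...]
# defgh
```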
write_file (exec, gated)
Creates or overwrites a text file within fs_root, creating missing parent directories.
edit_file (exec, gated)
The core editing primitive: replace an exact substring. old_string must match exactly
once (include surrounding context) unless replace_all is set; a missing or ambiguous
match fails clearly instead of guessing. Backslash sequences in new_string are written
literally — no accidental backreference interpretation.
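
The exact-match-once rule and the literal replacement can be sketched as below. replace_once and its error codes (:invalid_edit, :no_match, :ambiguous_match) are hypothetical names chosen for the example.

```ruby
# Hypothetical helper operating on file content already read into a String.
def replace_once(content, old_string, new_string, replace_all: false)
  return { error: "old_string is empty", code: :invalid_edit } if old_string.empty?

  count = content.scan(old_string).size # a String pattern is matched literally
  return { error: "old_string not found", code: :no_match } if count.zero?
  if count > 1 && !replace_all
    return { error: "old_string matches #{count} times", code: :ambiguous_match }
  end

  # Block form inserts the replacement verbatim; the two-argument form would
  # interpret sequences such as \1 as backreferences.
  if replace_all
    content.gsub(old_string) { new_string }
  else
    content.sub(old_string) { new_string }
  end
end

puts replace_once("path = 'a'\n", "'a'", 'C:\1\new')
# prints: path = C:\1\new
```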
multi_edit (exec, gated)
Applies several edit_file-style replacements to one file atomically. Edits run in order
(a later edit sees earlier results), each following the exact-match-once rule unless
replace_all is set. If any edit can't be applied, nothing is written and the failing edit is
named — so the file is never left half-edited. Saves a round-trip per change when batching.
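
One way to get the all-or-nothing behavior is to apply every edit to an in-memory copy and write only if all of them succeed. apply_edits, the edit hash keys, and the :edit_failed code are hypothetical; the file write itself is left to the caller.

```ruby
# Hypothetical helper: returns the fully edited String, or an error hash
# naming the first edit that could not be applied. Nothing is written here.
def apply_edits(content, edits)
  edits.each_with_index.reduce(content) do |text, (edit, index)|
    count = edit[:old].empty? ? 0 : text.scan(edit[:old]).size
    unless count == 1 || (count > 1 && edit[:replace_all])
      return { error: "edit #{index} matched #{count} times", code: :edit_failed }
    end

    text.gsub(edit[:old]) { edit[:new] }
  end
end

result = apply_edits("foo bar", [
  { old: "foo", new: "baz" },
  { old: "baz bar", new: "done" } # sees the result of the first edit
])
puts result # prints done
```

A caller would write the file only when a String comes back, which is what keeps a failed batch from leaving partial changes on disk.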
replace_in_files (exec, gated)
Project-wide find/replace across files matching a glob (default **/*). Literal by default,
or regex: true with \1 backreferences in the replacement; ignore_case and dry_run
are supported. Binary files and ignored_dirs are skipped, the pattern runs under a ReDoS
timeout, and every path is jailed to fs_root.
create_directory / move_file / delete_file (exec, gated)
create_directory does mkdir -p within the jail. move_file renames/moves with both
endpoints confined to fs_root and refuses to clobber unless overwrite. delete_file
removes a file or empty directory; a non-empty directory needs recursive, and fs_root
itself can't be deleted.
git_status / git_diff / git_log / git_show / git_blame / git_grep / git_branch (safe)
Read-only views of the repo at fs_root. git_diff takes optional staged, path, and
ref; git_log takes count and path; git_show shows a commit or a file at a ref;
git_blame shows line-by-line authorship (optional range); git_grep searches tracked
content (optional path, ignore_case, fixed), passing the pattern via -e so a
dash-leading pattern can't inject a git option; git_branch lists branches with the current
one marked (optional all for remotes). Because git can be made to run repo-configured
commands during read operations (core.fsmonitor on status, diff.external/textconv on
diff/show), these are neutralized so a hostile checkout can't turn a diff into code execution.
Refs are validated to block option injection, path arguments are jailed, and the pager and
credential prompts are disabled so nothing hangs. Requires git on the host.
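
As an illustration of the hardening described above, a diff invocation could be assembled like this. The git options used (--no-pager, -c core.fsmonitor=false, --no-ext-diff, --no-textconv, --cached) and GIT_TERMINAL_PROMPT are standard git features, but whether the gem uses exactly this set is an assumption; git_diff_argv and :invalid_ref are hypothetical names, and path jailing is not shown.

```ruby
# Environment for the child process: never prompt for credentials.
GIT_ENV = { "GIT_TERMINAL_PROMPT" => "0" }.freeze

# Hypothetical helper: build an argv for a read-only diff.
def git_diff_argv(ref: nil, path: nil, staged: false)
  # A ref that starts with "-" would be parsed by git as an option.
  if ref && (ref.start_with?("-") || ref.match?(/[\s\0]/))
    return { error: "invalid ref", code: :invalid_ref }
  end

  argv = ["git", "--no-pager", "-c", "core.fsmonitor=false",
          "diff", "--no-ext-diff", "--no-textconv"]
  argv << "--cached" if staged
  argv << ref if ref
  argv.push("--", path) if path # "--" ends option parsing before the path
  argv
end

p git_diff_argv(ref: "HEAD~1", path: "app/models")
```

The resulting array is meant to be passed to an array-form spawn together with GIT_ENV, so no shell is involved at any point.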
git_add / git_commit / git_checkout / apply_patch (exec, gated)
git_add/git_commit/git_checkout stage, commit, and switch branches. apply_patch
applies a unified diff via git apply — validated with --check first (nothing is written
if it wouldn't apply cleanly), with check: true for a dry run. Path-escaping patches are
rejected. Does not push.
run_tests / lint / bundle (exec, gated)
The verify trio, run from fs_root. run_tests auto-detects RSpec (spec//.rspec) or
Minitest (test/ via rake) and returns output with a pass/fail headline (a failing suite is
a result, not a tool error). lint runs RuboCop (or Standard when .standard.yml is
present), with optional autocorrect. bundle runs Bundler actions (install, update,
outdated, check, lock, add). These inherit the full host environment (so bundler,
rbenv/rvm, and the dev binaries resolve), use bundle exec when a Gemfile exists, and report
:unavailable if the tool isn't installed.
Safety model
The dangerous surface is engineered, not just documented:
| Concern | Mitigation |
|---|---|
| Path traversal / symlink escape | Safety::PathJail resolves realpath and confines to fs_root |
| OS command injection | bash uses array-form spawn (no shell) + executable allowlist |
| Env leakage | spawned processes get a scrubbed env (env_passthrough only) |
| Runaway processes | hard wall-clock command_timeout, then SIGKILL |
| Untrusted code execution | runs in a pluggable sandbox — Docker (no-network, read-only, cap-dropped) or host-process bubblewrap/sandbox-exec with no network, restricted writes, and rlimit caps |
| Malicious repo config (RCE) | git tools disable core.fsmonitor, external diff drivers, and textconv |
| Context blowup | every result passes through the token budgeter |
| ReDoS (user regex) | grep_files compiles patterns with a per-match regex_timeout |
| SSRF (web tools) | Safety::UrlGuard allows only http/https, blocks private/loopback/link-local/metadata IPs, pins the socket to the vetted IP (closing DNS rebinding), and re-checks every redirect hop |
| Privilege escalation by the agent | the unsafe override is opt-in per call and requires an operator-set allow_unsafe; an agent passing unsafe: true on its own gets :unsafe_denied |
Security override
Sometimes an operator genuinely wants a tool to step outside its guard — read a file outside
fs_root, run a non-allowlisted binary, fetch an internal URL. The override is built so the
agent can ask but never grant:
- A few tools (read_file, write_file, bash, web_fetch, http_request) take an unsafe: true parameter.
- That alone does nothing. Unless a human has set RubyLLM::Toolbox.config.allow_unsafe = true, any call requesting it is refused with :unsafe_denied. The model cannot flip that switch.
- When both line up, the call bypasses only its own guard (path jail, command allowlist, or SSRF check), never the deeper invariants (e.g. bash is still argv-only with no shell, and still rejects NUL bytes). Set config.unsafe_logger = ->(tool, detail) { … } to audit every override that fires.
This keeps the default safe, makes escalation a deliberate operator decision, and leaves an audit trail — rather than a single boolean an agent could talk its way into.
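
The two-key arrangement can be modelled in a few lines. OverrideConfig and guard_bypassed? are hypothetical stand-ins for this sketch; only allow_unsafe, unsafe_logger, and :unsafe_denied come from the text above.

```ruby
# Hypothetical stand-in for the relevant slice of the toolbox config.
OverrideConfig = Struct.new(:allow_unsafe, :unsafe_logger, keyword_init: true)

# Returns false when no override was requested, true when it is granted,
# and an error hash when the agent asked but the operator has not allowed it.
def guard_bypassed?(tool_name, requested:, config:, detail: nil)
  return false unless requested
  unless config.allow_unsafe
    return { error: "unsafe override is not permitted", code: :unsafe_denied }
  end

  config.unsafe_logger&.call(tool_name, detail) # audit trail
  true
end

audit = []
operator = OverrideConfig.new(
  allow_unsafe: true,
  unsafe_logger: ->(tool, detail) { audit << [tool, detail] }
)
p guard_bypassed?("read_file", requested: true, config: operator, detail: "/etc/hosts") # => true
p audit # => [["read_file", "/etc/hosts"]]
```

The request flag arrives with the tool call, while the permission lives in configuration the model has no tool to modify.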
Sandbox runtimes
The code-execution tools (run_ruby/run_python/run_rust) run through a pluggable sandbox,
chosen by config.sandbox_runtime (default :auto):
| Runtime | Platform | How it isolates |
|---|---|---|
| :docker | any with Docker | Ephemeral container: --network none, read-only root + tmpfs /tmp, --cap-drop ALL, no-new-privileges, non-root user, memory/CPU/pids limits. Only the image is visible, not the host. |
| :bubblewrap | Linux (bwrap) | Fresh namespaces via --unshare-all (no network), host filesystem bound read-only, writable tmpfs /tmp, --die-with-parent. Runs host interpreters. |
| :sandbox_exec | macOS | Seatbelt profile: deny-by-default, all network denied, reads allowed, writes only to temp. Runs host interpreters. |
| :none | n/a | Disables code execution (:sandbox_unavailable). |
:auto prefers the native lightweight sandbox per platform (bubblewrap on Linux, sandbox-exec
on macOS), falling back to Docker, then to :none. The host-process backends apply
memory/CPU caps as inherited rlimits (since they don't use cgroups), and can be tuned with
config.sandbox_bwrap_extra and config.sandbox_seatbelt_profile.
One tradeoff worth knowing: unlike Docker (which only exposes its image), the host-process
backends leave the host filesystem readable (read-only) inside the sandbox. On a host with
secrets the model shouldn't read, prefer Docker, or add masks via sandbox_bwrap_extra
(e.g. ["--tmpfs", "/home"]).
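
The :auto preference order described above reduces to a short decision function. pick_runtime is a hypothetical helper; in practice the platform string would come from RUBY_PLATFORM and the availability list from probing for bwrap, sandbox-exec, and docker, which is not shown.

```ruby
# Hypothetical helper: native lightweight sandbox first, then Docker, then none.
def pick_runtime(platform:, available:)
  native = case platform
           when /linux/  then :bubblewrap
           when /darwin/ then :sandbox_exec
           end
  return native if native && available.include?(native)
  return :docker if available.include?(:docker)

  :none
end

p pick_runtime(platform: "x86_64-linux", available: %i[bubblewrap docker]) # => :bubblewrap
p pick_runtime(platform: "arm64-darwin23", available: %i[docker])          # => :docker
p pick_runtime(platform: "x86_64-linux", available: [])                    # => :none
```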
Return contract
- Success → a String (or a Hash for structured tools).
- Failure → { error: "human-readable message", code: :symbol }. Never an exception.
Failure codes include :exec_disabled, :path_denied, :not_a_file, :too_large,
:command_denied, :tool_exception.
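
The "never an exception" rule amounts to a rescue boundary around each tool body. The wrapper below is a minimal sketch; safe_call is a hypothetical name, and only the { error:, code: } shape and the :tool_exception code come from the contract above.

```ruby
# Hypothetical wrapper: run a tool body and convert any StandardError into
# the failure hash instead of letting it reach the harness.
def safe_call(tool_name)
  result = yield
  result.is_a?(Hash) || result.is_a?(String) ? result : result.to_s
rescue StandardError => e
  { error: "#{tool_name} failed: #{e.message}", code: :tool_exception }
end

p safe_call("read_file") { "file contents" }
# => "file contents"
p safe_call("read_file") { raise IOError, "disk gone" }
# => {error: "read_file failed: disk gone", code: :tool_exception}  (Ruby 3.4 inspect format)
```

A harness can then branch on whether the result is a Hash with an :error key, without wrapping every tool call in its own rescue.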
Configuration reference
| Option | Default | Purpose |
|---|---|---|
| fs_root | Dir.pwd | Jail root for filesystem tools |
| enable_exec_tools | false | Master switch for the dangerous set |
| allowed_commands | [] | Executables bash and process_start may run |
| command_timeout | 30 | Wall-clock limit (seconds) for spawned processes |
| max_processes | 8 | Maximum concurrent background processes (process_start) |
| env_passthrough | %w[PATH LANG LC_ALL HOME] | Env vars forwarded to subprocesses |
| max_output_tokens | 2000 | Per-result token budget |
| tokenizer_model | "gpt-4o" | Model id used to pick a tokenizer |
| regex_timeout | 2 | Per-match timeout (seconds) for grep_files patterns |
| max_grep_matches | 200 | Cap on grep matches per call |
| search_adapter | nil | Web search backend: nil/:tavily, :brave, :searxng, or a custom adapter object |
| tavily_api_key | ENV["TAVILY_API_KEY"] | API key for the default (Tavily) web_search adapter |
| brave_api_key | ENV["BRAVE_API_KEY"] | Subscription token for the :brave adapter |
| searxng_url | ENV["SEARXNG_URL"] | Base URL of a self-hosted SearXNG instance for the :searxng adapter |
| web_allowlist / web_denylist | [] | Domain allow/deny lists enforced by UrlGuard |
| max_fetch_bytes / max_redirects | 2_000_000 / 5 | web_fetch/http_request body cap and redirect limit |
| docker_image / python_image / rust_image | "ruby:3.3-slim" / "python:3.12-slim" / "rust:1-slim" | Images for run_ruby / run_python / run_rust (Docker runtime) |
| sandbox_runtime | :auto | :auto, :docker, :bubblewrap, :sandbox_exec, or :none |
| sandbox_bwrap_extra | [] | Extra bubblewrap args (e.g. ["--tmpfs", "/home"]) |
| sandbox_seatbelt_profile | nil | Custom macOS Seatbelt SBPL profile (overrides the default) |
| allow_unsafe | false | Operator master switch enabling the per-call unsafe override |
| unsafe_logger | nil | Callable ->(tool_name, detail) invoked whenever an override fires |
| sandbox_network / sandbox_memory / sandbox_cpus / sandbox_pids | none / 256m / 1.0 / 128 | Container limits for run_ruby/run_python/run_rust |
| http_timeout | 10 | Open/read timeout (seconds) for the gem, web_fetch, web_search, and http_request tools |
Counting Claude models: call RubyLLM::Tokenizer.enable_claude_approximation! once at boot, then set tokenizer_model to your Claude model id.
Roadmap
Locked decisions: single gem, tokenizer-based budgeting, Tavily as the default search
provider (behind a swappable adapter — Brave / SearXNG drop in), Docker as the
run_code sandbox backend.
- Skeleton + pattern: base class, config, truncator, return contract, RSpec harness, read_file, bash. ✅
- Filesystem read set: list_directory, glob, grep_files. ✅
- Ruby tools: gem (RubyGems.org metadata, safe) and run_ruby (Docker sandbox, exec). ✅
- Filesystem write set: write_file, edit_file, create_directory, move_file, delete_file (exec). ✅
- Git: git_status/git_diff/git_log (safe), git_add/git_commit/git_checkout (exec). ✅
- Verify loop: run_tests, lint, bundle (exec). ✅
- Python: run_python (Docker sandbox) and python_tests (pytest/unittest), exec. ✅
- Code intelligence: parse_ruby (Ripper outline/navigation, safe). ✅
- Web: web_fetch, web_search (Tavily), http_request + Safety::UrlGuard SSRF protection. ✅
- Patch, git history & data: apply_patch, git_show, git_blame, json_query, csv_read/csv_write. ✅
- Utilities, Rust & hardening: calculator, date_time, diff, todo_write; run_rust; UrlGuard IP-pinning; operator-controlled unsafe override. ✅
- Search, YAML & the Prism backend: git_grep; yaml_query (safe_load) sharing one path engine with json_query; parse_ruby now auto-selects Prism (Ruby 3.3+) with a Ripper fallback and a parity harness. ✅
- CI & sandbox runtimes: GitHub Actions (rspec on Ruby 3.3/3.4 × Linux/macOS, parity harness, gem build); pluggable sandbox with bubblewrap (Linux) and sandbox-exec (macOS) backends alongside Docker, selected by sandbox_runtime. ✅
- More tools: toml_query (dependency-free TOML parser, completing JSON/YAML/TOML/CSV); replace_in_files (project-wide find/replace); download_file (SSRF-guarded fetch to disk); git_branch. ✅
- Editing & navigation ergonomics: multi_edit (atomic batched edits), tree (depth-limited overview); read_file already supports line ranges. ✅
- Background processes: process_start (gated), process_output, process_list, process_kill. Stateful long-running commands (dev servers, watchers, log tails) with incremental output, bounded buffers, a concurrency cap, and group + /proc-descendant cleanup. ✅
- Search isn't single-vendor: two more web_search adapters behind the same seam, :brave (commercial Brave Search API, header-key auth) and :searxng (keyless, self-hosted), selected by search_adapter. ✅
- Next: an ecosystem-docs PR against crmne/ruby_llm, and a toolbox-level usage guide (safe→exec model, unsafe override, sandbox + search selection).
Development
bundle install # installs ruby_llm, ruby_llm-tokenizer, rspec
bundle exec rspec # run the test suite
bundle exec rake build # build the gem into pkg/
bundle exec rake install # build + install locally
# verify the parse_ruby backends agree (Prism vs Ripper)
ruby bin/verify_prism_parity
Requires Ruby >= 3.3. The code-execution tools (run_ruby/run_python/run_rust)
need an available sandbox runtime (Docker, bubblewrap, or sandbox-exec) to actually
execute; without one they return a clean :sandbox_unavailable error, and their specs
stub the sandbox.
License
MIT.