ruby_llm-toolbox

A safe-by-default bundle of RubyLLM::Tool classes covering the skills common to most LLM harnesses — filesystem, shell, web, git, and structured-data tools — packaged as one gem with one require.

One gem, one require. require "ruby_llm/toolbox" loads everything. No sub-gems, no second require.
Safe by default. Read-only tools work out of the box. Mutating/exec tools are loaded but inert until you explicitly enable them.
Token-budgeted output. Every tool result is counted with ruby_llm-tokenizer and, when it exceeds the budget, truncated (head + tail kept, middle elided), so a single grep can't blow up the context window.
Uniform failure contract. Tools never raise into the harness; failures come back as { error:, code: }, matching ruby_llm's own convention.
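A minimal sketch of the head + tail truncation described above. It assumes a whitespace split as a stand-in token counter (the gem counts with ruby_llm-tokenizer, whose API is not shown here), and `truncate_to_budget` and `MARKER_RESERVE` are hypothetical names, not the gem's internals.

```ruby
# Stand-in token counter (assumption): whitespace split. The gem uses
# ruby_llm-tokenizer instead.
def count_tokens(text)
  text.split.size
end

MARKER_RESERVE = 5 # tokens reserved for the "... [N lines elided] ..." line

# Keep whole lines alternately from the head and the tail until the budget
# (minus the marker reserve) is spent, then elide everything in between.
# Results already within budget pass through unchanged.
def truncate_to_budget(text, budget)
  return text if count_tokens(text) <= budget

  lines = text.lines
  limit = budget - MARKER_RESERVE
  head = []
  tail = []
  used = 0
  first = 0
  last = lines.size - 1
  from_head = true
  while first <= last
    line = from_head ? lines[first] : lines[last]
    cost = count_tokens(line)
    break if used + cost > limit

    if from_head
      head << line
      first += 1
    else
      tail.unshift(line)
      last -= 1
    end
    used += cost
    from_head = !from_head
  end
  "#{head.join}... [#{last - first + 1} lines elided] ...\n#{tail.join}"
end
```

Keeping both ends matters for tool output: the head usually carries the command's context and the tail carries the summary or the error.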

Status: v0.1 — ships the framework plus fifty tools across filesystem, search, code intelligence, git, web, the Ruby/Python/Rust toolchain, structured data (JSON/YAML/TOML/CSV), background process management, and small utilities. Safe tools are on by default; exec tools (writes, mutations, code execution) are gated behind enable_exec_tools. parse_ruby uses Prism (bundled with the supported Ruby 3.3+), with a Ripper fallback for non-MRI runtimes. An optional, operator-controlled unsafe override lets specific calls bypass individual guards when explicitly permitted. See Tools and the Roadmap. For an end-to-end walkthrough — wiring, the safety model, sandbox/search selection, the full tool catalog, and "reach for X, not Y" rules you can hand to the agent — read the Usage Guide.

Installation

Requires Ruby >= 3.3 (where Prism is bundled, so parse_ruby uses it with no extra dependency).

The tokenizer dependency (ruby_llm-tokenizer) pulls in the sentencepiece native gem, which requires the SentencePiece C library to be present at build time:

# Ubuntu / Debian
sudo apt-get install -y libsentencepiece-dev

# macOS (Homebrew — arm64 installs to /opt/homebrew, so point the build at it)
brew install sentencepiece
bundle config set build.sentencepiece \
  "--with-sentencepiece-dir=$(brew --prefix sentencepiece)"

Then add the gem to your Gemfile:

# Gemfile
gem "ruby_llm-toolbox"

Quick start

require "ruby_llm/toolbox"

RubyLLM::Toolbox.configure do |c|
  c.fs_root           = "/srv/project"   # filesystem tools are jailed to this
  c.max_output_tokens = 2_000            # per-result budget
  c.tokenizer_model   = "gpt-4o"         # which tokenizer to count with
end

chat = RubyLLM.chat
chat.with_tools(*RubyLLM::Toolbox.safe_tools)   # read-only set, always on
chat.ask("What does config/database.yml configure?")

Enabling exec tools

Dangerous tools (bash, write_file, edit_file, run_ruby, git_commit, the mutating http_request methods, and the rest of the exec set) are loaded but refuse to run until you opt in:

RubyLLM::Toolbox.configure do |c|
  c.enable_exec_tools = true
  c.allowed_commands  = %w[ls cat grep rg]   # bash runs ONLY these executables
  c.command_timeout   = 30
end

chat.with_tools(*RubyLLM::Toolbox.all_tools)  # exec tools still honor the gate

You can also scope a single instance without touching global config:

chat.with_tool(RubyLLM::Toolbox::Tools::ReadFile.new(fs_root: "/srv/other"))

Tools

`read_file` (safe)

Reads a UTF-8 text file from within fs_root, optionally limited to a 1-based line range or to the last N lines (like tail -n N); if both are given, the tail takes precedence. Output is token-budgeted. Path traversal and symlink escapes are rejected.
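The traversal and symlink checks can be illustrated with a realpath-based jail. This is a sketch only: `jail` is a hypothetical helper, not the API of the gem's Safety::PathJail, and it handles existing paths only (creating a new file would need its parent directory resolved instead).

```ruby
# Hypothetical helper: resolve the request against the jail root, follow every
# symlink with realpath, and accept the result only if it is the root itself or
# lies beneath it. Appending File::SEPARATOR stops "/srv/project-evil" from
# passing as a child of "/srv/project".
def jail(root, requested)
  real_root = File.realpath(root)
  real = File.realpath(File.expand_path(requested, real_root))
  inside = real == real_root || real.start_with?(real_root + File::SEPARATOR)
  inside ? real : nil
rescue SystemCallError # missing file, dangling symlink, symlink loop
  nil
end
```

Because the check runs on the fully resolved path, `../secret`, an absolute `/etc/passwd`, and a symlink inside the root that points outside it are all rejected by the same comparison.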

`list_directory` (safe)

Lists directory entries within fs_root with type (dir/file/symlink) and size. Optional recursive and include_hidden. Symlinked directories are listed but not traversed, so a link can't walk out of the jail.

`tree` (safe)

Renders a depth-limited directory tree under fs_root (default 3 levels) — a fast way to grasp project structure without walking it one level at a time. Directories are marked with a trailing slash; ignored directories and hidden entries are skipped (toggle with show_hidden), symlinks aren't followed, and the listing is capped.

`glob` (safe)

Finds files matching a glob (**/*.rb, app/models/*.rb) within fs_root, relative to an optional base. Patterns containing .. are rejected and each hit is re-checked through the jail to drop symlink escapes.

`grep_files` (safe)

Searches file contents for a regex within fs_root, returning path:line: text. Optional file glob filter and ignore_case, plus before/after/context lines (like grep -B/-A/-C) — context lines render as path-line- text and separate blocks are divided with --. The pattern is compiled with a per-match timeout (ReDoS backstop), binary files and noisy/VCS directories are skipped, and results are capped.
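A sketch of the per-match timeout and the `path:line: text` format, using Ruby 3.2+'s `Regexp.new(..., timeout:)`. The helper name and the two error codes are illustrative assumptions; context lines and the binary/VCS skipping are left out.

```ruby
# Hypothetical helper: compile the user's pattern with a per-match timeout
# (ReDoS backstop) and format each hit as "path:line: text", capped at max_matches.
def grep_lines(path, lines, pattern, ignore_case: false, timeout: 2.0, max_matches: 200)
  flags = ignore_case ? Regexp::IGNORECASE : 0
  regex = Regexp.new(pattern, flags, timeout: timeout)
  hits = []
  lines.each_with_index do |line, idx|
    next unless regex.match?(line)

    hits << "#{path}:#{idx + 1}: #{line.chomp}"
    break if hits.size >= max_matches
  end
  hits
rescue Regexp::TimeoutError # a subclass of RegexpError, so it must come first
  { error: "pattern timed out", code: :regex_timeout }
rescue RegexpError => e
  { error: "invalid pattern: #{e.message}", code: :invalid_pattern }
end
```

Catastrophic patterns such as `(a+)+$` then fail with a clean error instead of pinning a CPU core.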

`gem` (safe)

Read-only RubyGems.org metadata lookup. Actions: info (summary), version (latest), dependencies (runtime deps), search (find gems by query). The host is fixed and all input is URL-encoded, so there's no arbitrary-URL surface.

`parse_ruby` (safe)

In-process structural outline of a Ruby file (classes, modules, methods, constants with line numbers and nesting), or definition lookup by query/kind. It parses — never executes — the code, through one of two interchangeable backends behind RubyOutline: Prism when it can be loaded (it's bundled with Ruby 3.3+, the supported floor, so no gem install is needed), and Ripper (stdlib) as a fallback for runtimes that don't bundle Prism (e.g. non-MRI). The two are held to identical output by spec/ruby_outline_parity_spec.rb and bin/verify_prism_parity, which compares both backends over a corpus and can be run under any Ruby — including a sandboxed one (docker run --rm -v "$PWD":/app -w /app ruby:3.4-slim ruby bin/verify_prism_parity).

`json_query` / `yaml_query` / `toml_query` / `csv_read` (safe), `csv_write` (exec)

json_query, yaml_query, and toml_query parse JSON / YAML / TOML (from a file in fs_root or an inline string) and extract values with a shared dot/bracket path (users[0].name, dependencies.serde.version, products[].name) or pretty-print. YAML is loaded with safe_load (no arbitrary Ruby objects); TOML uses a dependency-free parser covering the common surface of TOML 1.0 (tables, arrays-of-tables, inline tables, dotted keys, all scalar forms). csv_read reads a CSV into readable rows (optional header, limit); csv_write writes an array of rows (optional headers) to a CSV.
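Because JSON.parse, YAML.safe_load, and a TOML parser all produce plain Hash/Array trees, one walker can serve all three query tools. The sketch below is a hypothetical reimplementation of the path syntax shown above (`users[0].name`, `products[].name`), not the gem's own engine; `[]` fans out over every array element.

```ruby
require "json"

# Split "users[0].name" into ["users", 0, "name"]; "[]" becomes :each.
def tokenize_path(path)
  path.scan(/[^.\[\]]+|\[\d*\]/).map do |tok|
    if tok == "[]" then :each
    elsif tok.start_with?("[") then tok[1..-2].to_i
    else tok
    end
  end
end

# Walk a list of current nodes token by token. Missing keys and out-of-range
# indexes simply drop out, so a bad path yields nil (or [] after a fan-out).
def query(data, path)
  tokens = tokenize_path(path)
  nodes = tokens.reduce([data]) do |current, tok|
    current.flat_map do |node|
      case tok
      when :each then node.is_a?(Array) ? node : []
      when Integer then node.is_a?(Array) && tok < node.size ? [node[tok]] : []
      else node.is_a?(Hash) && node.key?(tok) ? [node[tok]] : []
      end
    end
  end
  tokens.include?(:each) ? nodes : nodes.first
end
```

For example, `query(JSON.parse(doc), "products[].name")` returns an array of every product name.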

`web_fetch` / `web_search` / `http_request` (safe)

web_fetch retrieves a URL over http/https and returns readable text (HTML stripped), following redirects. web_search queries the web through a swappable adapter — Tavily by default (set tavily_api_key), or set search_adapter to :brave (commercial Brave Search API, set brave_api_key), :searxng (a keyless, self-hosted SearXNG instance, set searxng_url), or any object responding to #search(query, max_results:). http_request is a general client returning status/headers/body. All three route through Safety::UrlGuard (see below). http_request allows GET/HEAD by default; POST/PUT/PATCH/DELETE require enable_exec_tools.
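The address checks a guard like Safety::UrlGuard performs can be sketched with the standard `ipaddr` and `resolv` libraries. This is a hypothetical illustration: `vet_url`, `blocked_ip?`, and the `:url_denied` code are assumptions, and socket pinning and redirect re-checks are not shown (the returned `ip` is what a caller would connect to).

```ruby
require "ipaddr"
require "resolv"
require "uri"

# Ranges not covered by IPAddr#private?/#loopback?/#link_local?
EXTRA_BLOCKED = %w[0.0.0.0/8 100.64.0.0/10 ::/128].map { |cidr| IPAddr.new(cidr) }

def blocked_ip?(ip)
  ip = ip.native # unwrap IPv4-mapped IPv6 such as ::ffff:127.0.0.1
  # link_local? covers 169.254.0.0/16, including the 169.254.169.254 metadata endpoint
  ip.private? || ip.loopback? || ip.link_local? || EXTRA_BLOCKED.any? { |net| net.include?(ip) }
end

# The resolver is injectable so the checks can be exercised without DNS.
def vet_url(url, resolve: ->(host) { Resolv.getaddresses(host) })
  uri = URI.parse(url)
  return { error: "only http/https allowed", code: :url_denied } unless %w[http https].include?(uri.scheme)

  addrs = resolve.call(uri.hostname.to_s)
  return { error: "host did not resolve", code: :url_denied } if addrs.empty?

  bad = addrs.find { |addr| blocked_ip?(IPAddr.new(addr)) }
  return { error: "blocked address #{bad}", code: :url_denied } if bad

  { host: uri.hostname, ip: addrs.first }
rescue URI::InvalidURIError, IPAddr::InvalidAddressError
  { error: "invalid URL", code: :url_denied }
end
```

Every resolved address is checked, not just the first, so a hostname that returns one public and one internal record is still refused.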

`download_file` (exec, gated)

Downloads a URL to a file within fs_root (whereas web_fetch returns text). Routes through Safety::UrlGuard, follows redirects safely, is capped at config.max_fetch_bytes, and jails the destination path.

`bash` (exec, gated)

Runs one allowlisted executable with arguments. It is deliberately not a shell: no pipes, redirects, globs, quoting, or variable expansion. The program goes in command, and each argument is a separate element of args, passed verbatim as argv. Because nothing ever parses the input as a shell line, OS command injection has no syntax to exploit.

// model emits:
{ "command": "rg", "args": ["TODO", "app/models"] }
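A sketch of the argv-only execution model using Ruby's `Open3`. `ALLOWED`, `PASSTHROUGH`, and `run_argv` are hypothetical stand-ins for `allowed_commands`, `env_passthrough`, and the tool body; the `command_timeout` kill is omitted for brevity.

```ruby
require "open3"

ALLOWED = %w[echo ls rg].freeze                # stand-in for config.allowed_commands
PASSTHROUGH = %w[PATH LANG LC_ALL HOME].freeze # mirrors the env_passthrough default

def run_argv(command, args)
  return { error: "#{command} is not allowlisted", code: :command_denied } unless ALLOWED.include?(command)
  return { error: "NUL byte in argv", code: :command_denied } if ([command] + args).any? { |a| a.include?("\0") }

  env = ENV.to_h.slice(*PASSTHROUGH)
  # [command, command] forces exec without a shell even when args is empty;
  # unsetenv_others drops every variable not explicitly passed in env.
  out, err, status = Open3.capture3(env, [command, command], *args, unsetenv_others: true)
  { stdout: out, stderr: err, exit: status.exitstatus }
end
```

With this shape, `run_argv("echo", ["hello; rm -rf /"])` prints the string literally, because no shell ever sees it.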

`run_ruby` (exec, gated)

Executes a Ruby snippet inside the active sandbox runtime, with the code piped on stdin. Under Docker it runs in an ephemeral, no-network, read-only, cap-dropped container; under bubblewrap or sandbox-exec it runs the host's ruby in an isolated, no-network, write-restricted environment. It requires enable_exec_tools (otherwise it returns :exec_disabled) and an available sandbox (otherwise it returns a clean :sandbox_unavailable error).

`run_python` (exec, gated)

Same sandbox as run_ruby, running Python (the config.python_image under Docker, or the host's python3 under the host-process backends). Code is piped to python3 on stdin.

`python_tests` (exec, gated)

Runs the project's Python tests from fs_root — pytest by default, or unittest (python -m unittest discover) — with a parsed pass/fail headline, mirroring run_tests.

`run_rust` (exec, gated)

Compiles and runs a self-contained Rust program in the same sandbox (config.rust_image under Docker, or the host's rustc under the host-process backends). The source is piped on stdin; a shell step inside the sandbox writes it to scratch, compiles with rustc, and runs the binary, returning compiler output plus the program's stdout/stderr and exit.

`calculator` / `date_time` / `diff` / `todo_write` (safe)

Small in-process utilities. calculator evaluates an arithmetic expression with a real recursive-descent parser — never eval — supporting + - * / % **, parentheses, common functions (sqrt, sin, ln, …), and constants (pi, e). date_time returns the current time (or converts a unix timestamp), with an optional strftime format. diff produces a readable line-by-line comparison of two text blocks. todo_write maintains a task list across calls for multi-step work (pass the full list each time; statuses are pending/in_progress/completed).
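The "real recursive-descent parser, never eval" approach can be sketched as one method per precedence level. `Calc` is a hypothetical, trimmed-down illustration (three functions, two constants), not the gem's calculator.

```ruby
require "strscan"

# Grammar (one method per rule):
#   expr    := term (("+" | "-") term)*
#   term    := unary (("*" | "/" | "%") unary)*
#   unary   := ("+" | "-") unary | power
#   power   := primary ("**" unary)?    right-associative, binds tighter than unary minus
#   primary := number | "(" expr ")" | name "(" expr ")" | name
class Calc
  FUNCS  = { "sqrt" => Math.method(:sqrt), "sin" => Math.method(:sin), "ln" => Math.method(:log) }.freeze
  CONSTS = { "pi" => Math::PI, "e" => Math::E }.freeze

  def self.evaluate(src)
    new(src).parse
  end

  def initialize(src)
    @tokens = tokenize(src)
    @pos = 0
  end

  def parse
    value = expr
    raise ArgumentError, "unexpected #{peek}" if peek
    value.is_a?(Rational) ? value.to_f : value # 2 ** -1 yields a Rational in Ruby
  end

  private

  # Anything outside numbers, operators, parentheses, and lowercase names is rejected.
  def tokenize(src)
    ss = StringScanner.new(src)
    tokens = []
    until ss.eos?
      next if ss.skip(/\s+/)

      tok = ss.scan(/\d+(?:\.\d+)?|\*\*|[-+*\/%()]|[a-z]+/)
      raise ArgumentError, "bad input at #{ss.pos}" unless tok
      tokens << tok
    end
    tokens
  end

  def peek
    @tokens[@pos]
  end

  def advance
    tok = @tokens[@pos]
    @pos += 1
    tok
  end

  def expect(tok)
    raise ArgumentError, "expected #{tok}" unless advance == tok
  end

  def expr
    value = term
    while %w[+ -].include?(peek)
      value = advance == "+" ? value + term : value - term
    end
    value
  end

  def term
    value = unary
    while %w[* / %].include?(peek)
      op = advance
      rhs = unary
      raise ZeroDivisionError, "division by zero" if op != "*" && rhs.zero?
      value = case op
              when "*" then value * rhs
              when "/" then value.fdiv(rhs)
              else value % rhs
              end
    end
    value
  end

  def unary
    case peek
    when "-"
      advance
      -unary
    when "+"
      advance
      unary
    else
      power
    end
  end

  def power
    base = primary
    return base unless peek == "**"

    advance
    base**unary
  end

  def primary
    tok = advance
    case tok
    when nil then raise ArgumentError, "unexpected end of input"
    when /\A\d/ then tok.include?(".") ? tok.to_f : tok.to_i
    when "("
      value = expr
      expect(")")
      value
    when *FUNCS.keys
      expect("(")
      arg = expr
      expect(")")
      FUNCS.fetch(tok).call(arg)
    when *CONSTS.keys then CONSTS.fetch(tok)
    else raise ArgumentError, "unexpected #{tok}"
    end
  end
end
```

Since the tokenizer only admits digits, operators, parentheses, and known names, input like `system('ls')` fails at tokenization rather than reaching any evaluator.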

Background processes: `process_start` / `process_output` / `process_list` / `process_kill`

Long-running commands — dev servers, file watchers, log tails — that an agent starts, polls, and stops without blocking on them.

process_start (exec, gated) launches one allowlisted executable as a background process and returns its id (e.g. proc_1) immediately. It carries the same safety model as bash: argv only (no shell), the minimal env_passthrough environment, run in fs_root, in its own process group with an address-space cap derived from sandbox_memory (but no CPU cap — these are meant to run indefinitely). The number of concurrent live processes is bounded by max_processes.

The other three are safe — they only act on processes already started, and process_kill is always available as a stop valve even if exec tools are later disabled. process_output returns the stdout/stderr produced since the last read (incremental, so polling in a loop streams output without repeats) plus the current status and exit code. process_list shows every process with its id, status, pid, age, and command. process_kill stops a process — SIGTERM to its group, escalating to SIGKILL, plus a /proc descendant sweep so children are reaped even where group-signal delivery is incomplete — then returns any final output and removes it from the registry. Output buffers are bounded (256 KB of unread data per stream; older bytes are dropped with a marker), so a chatty process can't exhaust memory. Everything still running is killed at interpreter exit so nothing is orphaned.
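The incremental-read and bounded-buffer behavior can be sketched as a small per-stream class. `BoundedBuffer` and its marker text are hypothetical; only the 256 KB default comes from the description above.

```ruby
# Hypothetical per-stream buffer: each read returns only bytes produced since
# the previous read, and unread data is capped. On overflow the oldest bytes
# are dropped and the next read starts with a marker saying how many.
class BoundedBuffer
  def initialize(limit = 256 * 1024)
    @limit = limit
    @data = String.new # binary (ASCII-8BIT) by default
    @dropped = 0
    @lock = Mutex.new  # the reader thread appends while the agent polls
  end

  def append(chunk)
    @lock.synchronize do
      @data << chunk.b
      overflow = @data.bytesize - @limit
      if overflow.positive?
        @data = @data.byteslice(overflow..)
        @dropped += overflow
      end
    end
  end

  def read_new
    @lock.synchronize do
      out = @dropped.positive? ? "[#{@dropped} bytes dropped]\n" + @data : @data
      @data = String.new
      @dropped = 0
      out
    end
  end
end
```

Polling `read_new` in a loop streams a process's output without repeats, and memory stays bounded no matter how chatty the process is.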

`write_file` (exec, gated)

Creates or overwrites a text file within fs_root, creating missing parent directories.

`edit_file` (exec, gated)

The core editing primitive: replace an exact substring. old_string must match exactly once (include surrounding context) unless replace_all is set; a missing or ambiguous match fails clearly instead of guessing. Backslash sequences in new_string are written literally — no accidental backreference interpretation.
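A sketch of the match-once rule. `edit_text` and its error codes are hypothetical. The key detail is the block form of `sub`/`gsub`: with a plain string replacement, Ruby would still expand `\0` and `\1` in `new_string`.

```ruby
# Hypothetical helper: replace an exact substring that must occur exactly once
# unless replace_all is set. The block forms insert new_string verbatim.
def edit_text(text, old_string, new_string, replace_all: false)
  return { error: "old_string is empty", code: :invalid_edit } if old_string.empty?

  count = text.scan(old_string).size # String argument: literal, non-overlapping matches
  if count.zero?
    { error: "old_string not found", code: :no_match }
  elsif count > 1 && !replace_all
    { error: "old_string matches #{count} times; add context or set replace_all", code: :ambiguous_match }
  elsif replace_all
    text.gsub(old_string) { new_string }
  else
    text.sub(old_string) { new_string }
  end
end
```

Refusing ambiguous matches forces the model to include enough surrounding context to identify one location, instead of silently editing the first hit.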

`multi_edit` (exec, gated)

Applies several edit_file-style replacements to one file atomically. Edits run in order (a later edit sees earlier results), each following the exact-match-once rule unless replace_all is set. If any edit can't be applied, nothing is written and the failing edit is named — so the file is never left half-edited. Saves a round-trip per change when batching.
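The all-or-nothing behavior can be sketched by applying every edit to an in-memory copy and writing only at the end. `multi_edit` here is a hypothetical standalone function with an illustrative `:edit_failed` code, and the temp-file-plus-rename write is an assumption about how the final swap might be done.

```ruby
# Hypothetical sketch: edits run in order on an in-memory copy (later edits see
# earlier results); the file is replaced only after every edit has applied.
def multi_edit(path, edits)
  text = File.read(path)
  edits.each_with_index do |edit, i|
    old_s = edit.fetch(:old_string)
    new_s = edit.fetch(:new_string)
    count = old_s.empty? ? 0 : text.scan(old_s).size
    if count.zero? || (count > 1 && !edit[:replace_all])
      reason = count.zero? ? "no match" : "#{count} matches"
      return { error: "edit #{i + 1} failed (#{reason}); nothing written", code: :edit_failed }
    end
    # Block forms insert new_s verbatim (no backreference interpretation).
    text = edit[:replace_all] ? text.gsub(old_s) { new_s } : text.sub(old_s) { new_s }
  end
  tmp = "#{path}.tmp-#{Process.pid}"
  File.write(tmp, text)
  File.rename(tmp, path) # swap in the new contents in one step
  text
end
```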

`replace_in_files` (exec, gated)

Project-wide find/replace across files matching a glob (default **/*). Literal by default, or regex: true with \1 backreferences in the replacement; ignore_case and dry_run are supported. Binary files and ignored_dirs are skipped, the pattern runs under a ReDoS timeout, and every path is jailed to fs_root.

`create_directory` / `move_file` / `delete_file` (exec, gated)

create_directory does mkdir -p within the jail. move_file renames/moves with both endpoints confined to fs_root and refuses to clobber unless overwrite. delete_file removes a file or empty directory; a non-empty directory needs recursive, and fs_root itself can't be deleted.

`git_status` / `git_diff` / `git_log` / `git_show` / `git_blame` / `git_grep` / `git_branch` (safe)

Read-only views of the repo at fs_root. git_diff takes optional staged, path, and ref; git_log takes count and path; git_show shows a commit or a file at a ref; git_blame shows line-by-line authorship (optional range); git_grep searches tracked content (optional path, ignore_case, fixed), passing the pattern via -e so a dash-leading pattern can't inject a git option; git_branch lists branches with the current one marked (optional all for remotes). Because git can be made to run repo-configured commands during read operations (core.fsmonitor on status, diff.external/textconv on diff/show), these are neutralized so a hostile checkout can't turn a diff into code execution. Refs are validated to block option injection, path arguments are jailed, and the pager and credential prompts are disabled so nothing hangs. Requires git on the host.
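A sketch of how a read-only git invocation can be hardened before it is passed to an array-form spawn. The flags and the `GIT_TERMINAL_PROMPT` variable are real git options, but exactly which ones the gem passes is an assumption; `git_argv`, `git_grep_argv`, `validate_ref!`, and `SAFE_ENV` are hypothetical names.

```ruby
# Environment for the spawn call: fail instead of prompting for credentials.
SAFE_ENV = { "GIT_TERMINAL_PROMPT" => "0" }.freeze

# A ref starting with "-" would be parsed as an option (e.g. "--output=...").
def validate_ref!(ref)
  raise ArgumentError, "invalid ref: #{ref}" if ref.start_with?("-") || ref.match?(/[\s\0]/)
end

def git_argv(subcommand, *args)
  # --no-pager: nothing hangs waiting on a pager; core.fsmonitor=false: status
  # can't launch a repo-configured fsmonitor hook.
  base = ["git", "--no-pager", "-c", "core.fsmonitor=false"]
  # Disable external diff drivers and textconv filters on diff-producing commands.
  extra = %w[diff show log].include?(subcommand) ? ["--no-ext-diff", "--no-textconv"] : []
  base + [subcommand] + extra + args
end

def git_grep_argv(pattern, path = nil)
  argv = git_argv("grep", "-n", "-e", pattern) # -e: a dash-leading pattern stays a pattern
  path ? argv + ["--", path] : argv
end
```

The resulting array would go to something like `Open3.capture3(SAFE_ENV, *argv, chdir: fs_root)`, so no shell and no option parsing ever sees untrusted text in an option position.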

`git_add` / `git_commit` / `git_checkout` / `apply_patch` (exec, gated)

git_add/git_commit/git_checkout stage, commit, and switch branches. apply_patch applies a unified diff via git apply — validated with --check first (nothing is written if it wouldn't apply cleanly), with check: true for a dry run. Path-escaping patches are rejected. Does not push.

`run_tests` / `lint` / `bundle` (exec, gated)

The verify trio, run from fs_root. run_tests auto-detects RSpec (spec/ or .rspec) or Minitest (test/ via rake) and returns output with a pass/fail headline (a failing suite is a result, not a tool error). lint runs RuboCop (or Standard when .standard.yml is present), with optional autocorrect. bundle runs Bundler actions (install, update, outdated, check, lock, add). These tools inherit the full host environment (so bundler, rbenv/rvm, and the dev binaries resolve), use bundle exec when a Gemfile exists, and report :unavailable if the tool isn't installed.
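A sketch of how a pass/fail headline might be derived from runner output. `test_headline` is hypothetical; its regexes target RSpec's `N examples, M failures` summary and Minitest's `N runs, N assertions, M failures, K errors` line. It returns a String either way, matching the rule that a failing suite is a result rather than a tool error.

```ruby
def test_headline(output, exit_status)
  if (m = output.match(/(\d+) examples?, (\d+) failures?/))
    total = m[1].to_i
    failed = m[2].to_i
  elsif (m = output.match(/(\d+) runs, \d+ assertions, (\d+) failures, (\d+) errors/))
    total = m[1].to_i
    failed = m[2].to_i + m[3].to_i # Minitest reports errors separately from failures
  else
    return exit_status.zero? ? "PASS (no summary found)" : "FAIL (exit #{exit_status}, no summary found)"
  end
  status = failed.zero? && exit_status.zero? ? "PASS" : "FAIL"
  "#{status} (#{total} tests, #{failed} failed)"
end
```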

Safety model

The dangerous surface is engineered, not just documented:

Concern	Mitigation
Path traversal / symlink escape	`Safety::PathJail` resolves realpath and confines to `fs_root`
OS command injection	`bash` uses array-form spawn (no shell) + executable allowlist
Env leakage	spawned processes get a scrubbed env (`env_passthrough` only)
Runaway processes	hard wall-clock `command_timeout`, then `SIGKILL`
Untrusted code execution	runs in a pluggable sandbox — Docker (no-network, read-only, cap-dropped) or host-process bubblewrap/sandbox-exec with no network, restricted writes, and rlimit caps
Malicious repo config (RCE)	git tools disable `core.fsmonitor`, external diff drivers, and textconv
Context blowup	every result passes through the token budgeter
ReDoS (user regex)	`grep_files` compiles patterns with a per-match `regex_timeout`
SSRF (web tools)	`Safety::UrlGuard` allows only http/https, blocks private/loopback/link-local/metadata IPs, pins the socket to the vetted IP (closing DNS rebinding), and re-checks every redirect hop
Privilege escalation by the agent	the unsafe override is opt-in per call and requires an operator-set `allow_unsafe`; an agent passing `unsafe: true` on its own gets `:unsafe_denied`

Security override

Sometimes an operator genuinely wants a tool to step outside its guard — read a file outside fs_root, run a non-allowlisted binary, fetch an internal URL. The override is built so the agent can ask but never grant:

A few tools (read_file, write_file, bash, web_fetch, http_request) take an unsafe: true parameter.
That alone does nothing. Unless a human has set RubyLLM::Toolbox.config.allow_unsafe = true, any call requesting it is refused with :unsafe_denied. The model cannot flip that switch.
When both line up, the call bypasses only its own guard (path jail, command allowlist, or SSRF check) — never the deeper invariants (e.g. bash is still argv-only with no shell, and still rejects NUL bytes). Set config.unsafe_logger = ->(tool, detail) { … } to audit every override that fires.

This keeps the default safe, makes escalation a deliberate operator decision, and leaves an audit trail — rather than a single boolean an agent could talk its way into.
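The two-key logic above can be sketched in a few lines. `ToolboxConfig` and `guard_unsafe` are hypothetical stand-ins for the gem's config object and per-tool check; only `allow_unsafe`, `unsafe_logger`, and `:unsafe_denied` come from this README.

```ruby
# Hypothetical stand-in for the two relevant config options.
ToolboxConfig = Struct.new(:allow_unsafe, :unsafe_logger, keyword_init: true)

# Returns :guarded (run the normal guard), :bypass (skip only this tool's own
# guard), or an :unsafe_denied failure when the agent asks without permission.
def guard_unsafe(config, tool_name, requested, detail)
  return :guarded unless requested
  unless config.allow_unsafe
    return { error: "unsafe override not permitted by the operator", code: :unsafe_denied }
  end

  config.unsafe_logger&.call(tool_name, detail) # audit every override that fires
  :bypass
end
```

The agent controls only `requested`; `allow_unsafe` lives in operator config the model has no tool to change.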

Sandbox runtimes

The code-execution tools (run_ruby/run_python/run_rust) run through a pluggable sandbox, chosen by config.sandbox_runtime (default :auto):

Runtime	Platform	How it isolates
`:docker`	any with Docker	Ephemeral container: `--network none`, read-only root + tmpfs `/tmp`, `--cap-drop ALL`, no-new-privileges, non-root user, memory/CPU/pids limits. Only the image is visible — not the host.
`:bubblewrap`	Linux (`bwrap`)	Fresh namespaces via `--unshare-all` (no network), host filesystem bound read-only, writable tmpfs `/tmp`, `--die-with-parent`. Runs host interpreters.
`:sandbox_exec`	macOS	Seatbelt profile: deny-by-default, all network denied, reads allowed, writes only to temp. Runs host interpreters.
`:none`	—	Disables code execution (`:sandbox_unavailable`).

:auto prefers the native lightweight sandbox per platform (bubblewrap on Linux, sandbox-exec on macOS), falling back to Docker, then to :none. The host-process backends apply memory/CPU caps as inherited rlimits (since they don't use cgroups), and can be tuned with config.sandbox_bwrap_extra and config.sandbox_seatbelt_profile.

One tradeoff worth knowing: unlike Docker (which only exposes its image), the host-process backends leave the host filesystem readable (read-only) inside the sandbox. On a host with secrets the model shouldn't read, prefer Docker, or add masks via sandbox_bwrap_extra (e.g. ["--tmpfs", "/home"]).
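The `:auto` fallback order can be sketched as a small resolver. `resolve_sandbox` and `executable?` are hypothetical, and a real check for Docker would also confirm the daemon is reachable rather than only finding the `docker` binary on PATH. Platform and executable lookup are injectable so the logic can be exercised anywhere.

```ruby
require "rbconfig"

def executable?(name)
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? do |dir|
    path = File.join(dir, name)
    File.file?(path) && File.executable?(path)
  end
end

# Native lightweight sandbox first, then Docker, then :none.
def resolve_sandbox(setting = :auto, host_os: RbConfig::CONFIG["host_os"], have: method(:executable?))
  return setting unless setting == :auto

  if host_os.match?(/linux/) && have.call("bwrap") then :bubblewrap
  elsif host_os.match?(/darwin/) && have.call("sandbox-exec") then :sandbox_exec
  elsif have.call("docker") then :docker
  else :none
  end
end
```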

Return contract

Success → a String (or a Hash for structured tools).
Failure → { error: "human-readable message", code: :symbol }. Never an exception.

Failure codes include :exec_disabled, :path_denied, :not_a_file, :too_large, :command_denied, :unsafe_denied, :sandbox_unavailable, :unavailable, and :tool_exception.
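One way to honor the never-raise contract is a wrapper that converts anything escaping a tool body into a `:tool_exception` hash. This is a hypothetical sketch: `NeverRaise`, `ToolBody`, and `failure?` are illustrative names, not the gem's classes.

```ruby
# Prepended so it wraps the tool's own #call.
module NeverRaise
  def call(**args)
    super
  rescue StandardError => e
    { error: "#{e.class}: #{e.message}", code: :tool_exception }
  end
end

class ToolBody
  prepend NeverRaise

  def call(path:)
    raise Errno::ENOENT, path unless path == "ok.txt"
    "contents of #{path}"
  end
end

# Harness-side check: failures are Hashes carrying :error.
def failure?(result)
  result.is_a?(Hash) && result.key?(:error)
end
```

A harness can then branch on `failure?(result)` and pass `result[:error]` back to the model, without wrapping every tool call in `begin`/`rescue`.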

Configuration reference

Option	Default	Purpose
`fs_root`	`Dir.pwd`	Jail root for filesystem tools
`enable_exec_tools`	`false`	Master switch for the dangerous set
`allowed_commands`	`[]`	Executables `bash` and `process_start` may run
`command_timeout`	`30`	Wall-clock limit (seconds) for spawned processes
`max_processes`	`8`	Maximum concurrent background processes (`process_start`)
`env_passthrough`	`%w[PATH LANG LC_ALL HOME]`	Env vars forwarded to subprocesses
`max_output_tokens`	`2000`	Per-result token budget
`tokenizer_model`	`"gpt-4o"`	Model id used to pick a tokenizer
`regex_timeout`	`2`	Per-match timeout (seconds) for `grep_files` patterns
`max_grep_matches`	`200`	Cap on grep matches per call
`search_adapter`	`nil`	Web search backend: `nil`/`:tavily`, `:brave`, `:searxng`, or a custom adapter object
`tavily_api_key`	`ENV["TAVILY_API_KEY"]`	API key for the default (Tavily) `web_search` adapter
`brave_api_key`	`ENV["BRAVE_API_KEY"]`	Subscription token for the `:brave` adapter
`searxng_url`	`ENV["SEARXNG_URL"]`	Base URL of a self-hosted SearXNG instance for the `:searxng` adapter
`web_allowlist` / `web_denylist`	`[]`	Domain allow/deny lists enforced by `UrlGuard`
`max_fetch_bytes` / `max_redirects`	`2_000_000` / `5`	`web_fetch`/`http_request` body cap and redirect limit (the byte cap also bounds `download_file`)
`docker_image` / `python_image` / `rust_image`	`"ruby:3.3-slim"` / `"python:3.12-slim"` / `"rust:1-slim"`	Images for `run_ruby` / `run_python` / `run_rust` (Docker runtime)
`sandbox_runtime`	`:auto`	`:auto`, `:docker`, `:bubblewrap`, `:sandbox_exec`, or `:none`
`sandbox_bwrap_extra`	`[]`	Extra bubblewrap args (e.g. `["--tmpfs", "/home"]`)
`sandbox_seatbelt_profile`	`nil`	Custom macOS Seatbelt SBPL profile (overrides the default)
`allow_unsafe`	`false`	Operator master switch enabling the per-call unsafe override
`unsafe_logger`	`nil`	Callable `->(tool_name, detail)` invoked whenever an override fires
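`sandbox_network` / `sandbox_memory` / `sandbox_cpus` / `sandbox_pids`	`none` / `256m` / `1.0` / `128`	Resource limits for `run_ruby`/`run_python`/`run_rust` (container flags under Docker; the host-process backends apply memory/CPU as rlimits); `sandbox_memory` also sets the `process_start` address-space cap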
`http_timeout`	`10`	Open/read timeout (seconds) for the `gem`, `web_fetch`, `web_search`, and `http_request` tools

Counting Claude models: call RubyLLM::Tokenizer.enable_claude_approximation! once at boot, then set tokenizer_model to your Claude model id.

Roadmap

Locked decisions: single gem, tokenizer-based budgeting, Tavily as the default search provider (behind a swappable adapter, so Brave and SearXNG drop in), and Docker as the original code-execution sandbox (since joined by bubblewrap and sandbox-exec behind sandbox_runtime).

Skeleton + pattern — base class, config, truncator, return contract, RSpec harness, read_file, bash. ✅
Filesystem read set — list_directory, glob, grep_files. ✅
Ruby tools — gem (RubyGems.org metadata, safe) and run_ruby (Docker sandbox, exec). ✅
Filesystem write set — write_file, edit_file, create_directory, move_file, delete_file (exec). ✅
Git — git_status/git_diff/git_log (safe), git_add/git_commit/git_checkout (exec). ✅
Verify loop — run_tests, lint, bundle (exec). ✅
Python — run_python (Docker sandbox) and python_tests (pytest/unittest), exec. ✅
Code intelligence — parse_ruby (Ripper outline/navigation, safe). ✅
Web — web_fetch, web_search (Tavily), http_request + Safety::UrlGuard SSRF protection. ✅
Patch, git history & data — apply_patch, git_show, git_blame, json_query, csv_read/csv_write. ✅
Utilities, Rust & hardening — calculator, date_time, diff, todo_write; run_rust; UrlGuard IP-pinning; operator-controlled unsafe override. ✅
Search, YAML & the Prism backend — git_grep; yaml_query (safe_load) sharing one path engine with json_query; parse_ruby now auto-selects Prism (Ruby 3.3+) with a Ripper fallback and a parity harness. ✅
CI & sandbox runtimes — GitHub Actions (rspec on Ruby 3.3/3.4 × Linux/macOS, parity harness, gem build); pluggable sandbox with bubblewrap (Linux) and sandbox-exec (macOS) backends alongside Docker, selected by sandbox_runtime. ✅
More tools — toml_query (dependency-free TOML parser, completing JSON/YAML/TOML/CSV); replace_in_files (project-wide find/replace); download_file (SSRF-guarded fetch to disk); git_branch. ✅
Editing & navigation ergonomics — multi_edit (atomic batched edits), tree (depth-limited overview); read_file already supports line ranges. ✅
Background processes — process_start (gated), process_output, process_list, process_kill: stateful long-running commands (dev servers, watchers, log tails) with incremental output, bounded buffers, a concurrency cap, and group + /proc-descendant cleanup. ✅
Search isn't single-vendor — two more web_search adapters behind the same seam: :brave (commercial Brave Search API, header-key auth) and :searxng (keyless, self-hosted), selected by search_adapter. ✅
Next — an ecosystem-docs PR against crmne/ruby_llm, and a toolbox-level usage guide (safe→exec model, unsafe override, sandbox + search selection).

Development

bundle install          # installs ruby_llm, ruby_llm-tokenizer, rspec
bundle exec rspec       # run the test suite
bundle exec rake build  # build the gem into pkg/
bundle exec rake install # build + install locally

# verify the parse_ruby backends agree (Prism vs Ripper)
ruby bin/verify_prism_parity

Requires Ruby >= 3.3. The code-execution tools (run_ruby/run_python/run_rust) need an available sandbox runtime (Docker, bubblewrap, or sandbox-exec) to actually execute; without one they return a clean :sandbox_unavailable error, and their specs stub the sandbox.

License

MIT.
