ollama_agent
Version: 1.0.0
Ruby gem that runs a CLI coding agent against a local Ollama model. It exposes tools to list files, read files, search the tree (ripgrep or grep), and apply unified diffs so the model can make small, reviewable edits.
Contents
- Features
- Kernel runtime (deterministic execution) — see also CAPABILITIES, CLI, OPERATIONS, USAGE
- Requirements
- Security and sandbox
- Installation
- Usage
- Skills
- Troubleshooting
- How it works
- Development
- License
Features
- Tool list_files: list project files.
- Tool read_file: read file contents.
- Tool search_code: search code with ripgrep or grep.
- Tool edit_file: apply unified diffs safely.
- Tool list_directory_contents: sandboxed filesystem inspection; see Agentic tool calling.
- Tool calculate: safe arithmetic evaluator (Shunting-yard, no eval); see Agentic tool calling.
- CLI built with Thor, entry point exe/ollama_agent.
- self_review: self-review / improvement with a --mode:
  - analysis (default, alias 1): read-only tools; report only; no writes.
  - interactive (alias 2, fix): full tools on --root; you confirm each patch (like ask); optional -y / --semi.
  - automated (alias 3, sandbox): temp copy, agent edits, bundle exec rspec in the sandbox, optional --apply to merge into your checkout.
- improve: same as self_review --mode automated (you can pass --mode automated explicitly; other modes belong on self_review).
- orchestrate / OLLAMA_AGENT_ORCHESTRATOR=1: optional orchestrator tools to probe and delegate to other local CLI agents (see Orchestrator); agents lists availability.
- Ruby API: embed Runner, Agent, custom tools, hooks, sessions, and (optionally) ToolRuntime; see Library usage (Ruby).
Kernel runtime (deterministic execution)
Documentation (post-kernel): Capability matrix · CLI reference · Operations / incidents · Usage guide.
The runtime kernel is an optional execution layer behind OLLAMA_AGENT_KERNEL. It wraps file mutations in a saga-style finite state machine: intent reservation, atomic writes (CAS + pre-image hashes), ownership checks against compiled rules, SQLite-backed WAL and sagas, isolated post-mutation validation, and compensation on failure. The workspace root remains the trust boundary; the kernel adds structured ownership and fencing so replays and automation stay auditable. When cloud or validator paths fail, circuit-breaker style escalation limits (see the rollout runbook) keep bad states from compounding.
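The "atomic writes (CAS + pre-image hashes)" step can be pictured with a small standalone sketch. This is not the kernel's implementation: cas_write and StaleWriteError are hypothetical names, and the WAL, saga, and ownership layers described above are left out. It also does no locking between the hash check and the rename, which a real pipeline would need.

```ruby
require "digest"
require "tmpdir"

# Hypothetical error class for this sketch; not part of the gem.
class StaleWriteError < StandardError; end

# Compare-and-swap write: replace the file only if its current bytes still
# hash to the pre-image digest recorded when the caller read the file.
def cas_write(path, expected_sha256, new_content)
  current = File.exist?(path) ? Digest::SHA256.hexdigest(File.binread(path)) : nil
  raise StaleWriteError, "pre-image mismatch for #{path}" unless current == expected_sha256

  tmp = "#{path}.tmp-#{Process.pid}"
  File.binwrite(tmp, new_content)
  File.rename(tmp, path) # a rename within one directory is atomic on POSIX filesystems
  Digest::SHA256.hexdigest(new_content)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "example.txt")
  File.binwrite(path, "v1")
  pre_image = Digest::SHA256.hexdigest("v1")

  cas_write(path, pre_image, "v2") # accepted: the file still matches the pre-image

  begin
    cas_write(path, pre_image, "v3") # rejected: the file now holds "v2"
  rescue StaleWriteError => e
    puts "rejected: #{e.message}"
  end
end
```

The returned digest can serve as the pre-image for the caller's next write, which is what lets a replay detect that the file changed underneath it.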
| OLLAMA_AGENT_KERNEL | Behavior |
|---|---|
| unset / false (default) | Legacy tool paths; kernel pipeline is not used for tool routing. |
| shadow | Same routing as true, but the pipeline runs in shadow mode: saga + WAL + observability run, while workspace bytes for certain mutations stay off the “real” path (see runbook). |
| true / 1 | Tool intents for configured mutation tools go through OllamaAgent::Runtime::KernelPipeline. |
Quick start (kernel on):
OLLAMA_AGENT_KERNEL=true bundle exec ollama_agent ask "Your task"
Design notes and roadmap items live in docs/new_features_plan_v2.md. Operational rollout, shadow mode, and rollback expectations are in docs/agile/release_rollout_runbook.md (incident SQL, health JSON, and compaction details are expanded in docs/OPERATIONS.md). For E7 validator activation (Docker-backed isolated checks), see docs/agile/docker_spec_activation.md.
Compaction and disk bounds: long-lived workspaces accumulate kernel SQLite rows and content-addressed blobs. Use OllamaAgent::Runtime::Compactor (logical current_epoch only — no wall clock) to prune sealed sagas, cold-archive old WAL rows into event_store_archive.db, purge expired recovery leases and stale intent reservations, and unlink blob files not referenced by compensations or in-flight mutation WAL payloads. OllamaAgent::Runtime::CompactorRunner wraps the compactor with an epoch interval for daemon loops (opt-in; nothing starts automatically).
Permission unification: when OLLAMA_AGENT_KERNEL is on and config/ollama_agent/owners.yml exists, OllamaAgent::Runtime::PermissionBridge reconciles legacy Runtime::Permissions / Runtime::Policies with Security::OwnershipIndex + CriticalityPolicy before pipeline execution. On divergence the bridge logs and prefers the kernel decision (stricter path wins). OllamaAgent::PermissionConflictError is raised only by the strict #allow_mutation? API for tests and diagnostics. With the kernel off, only the legacy permission path runs (no bridge).
Requirements
- Ruby ≥ 3.2 (enforced in the gemspec as required_ruby_version)
- Runtime kernel (SQLite): the sqlite3 gem is a runtime dependency; kernel storage uses event_store.db and runtime.db under .ollama_agent/kernel/ in the configured project root.
- Local: Ollama running and a capable tool-calling model, or
- Ollama Cloud: API key and a cloud-capable model name (see below)
Prerequisites (external tools)
- patch: required for edit_file (GNU patch on PATH). On Windows, use Git Bash, WSL, GnuWin32, or another environment that provides patch.
- rg (ripgrep) or grep: text mode for search_code needs at least one of these on PATH (ripgrep is preferred when present).
Security and sandbox
- Project root: File tools and search are constrained to the configured workspace (--root / OLLAMA_AGENT_ROOT). Treat that directory as the trust boundary: only aim the agent at trees you are willing to modify.
- list_directory_contents: Paths are resolved with File.expand_path relative to the project root and rejected before the filesystem is touched if they escape that boundary. ../../etc, /etc, and any other traversal attempt are caught by a prefix check, not a regex.
- calculate: Uses a hand-written tokenizer and Shunting-yard evaluator. eval is never called. Only numeric literals, parentheses, and the operators +, -, *, /, ** are accepted; any other character is an error.
- run_shell (optional tool): Commands are parsed into an argument vector (no shell) and must match an allowlist; a denylist blocks obviously dangerous patterns. You can still shoot yourself in the foot with an allowed prefix (for example git with destructive subcommands), so keep profiles and permissions tight in automated setups.
- Timeouts: Text search honors OLLAMA_AGENT_SEARCH_TIMEOUT_SEC (default 120). Shell execution has its own per-invocation timeout.
- Logging: Budget, loop-detection, and list_local_model_names failures go through Ruby’s Logger (stderr by default). Set OLLAMA_AGENT_LOG_LEVEL=debug or OLLAMA_AGENT_DEBUG=1 for more detail.
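The prefix check described for list_directory_contents can be sketched in a few lines. The helper name resolve_in_root is hypothetical and the gem's own resolver may differ; this version also does not resolve symlinks, which a stricter sandbox might want to do.

```ruby
# Hypothetical helper for illustration; not the gem's actual resolver.
def resolve_in_root(root, user_path)
  root_abs = File.expand_path(root)
  candidate = File.expand_path(user_path, root_abs)

  # Compare against "root/" rather than "root" so that a sibling such as
  # /srv/app-evil does not pass as a child of /srv/app.
  inside = candidate == root_abs || candidate.start_with?(root_abs + File::SEPARATOR)
  raise ArgumentError, "path escapes project root: #{user_path}" unless inside

  candidate
end

puts resolve_in_root("/srv/app", "lib/tools.rb")

["../../etc", "/etc", "../app-evil/secrets"].each do |attempt|
  begin
    resolve_in_root("/srv/app", attempt)
  rescue ArgumentError => e
    puts e.message
  end
end
```

Because the check runs on the expanded string, nothing on disk is touched before a traversal attempt is rejected.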
Installation
From RubyGems (when published) or from this repository:
bundle install
Usage
Default: run the gem with no subcommand to open the interactive TUI (same as ask with no query):
ollama_agent
# or from this repo:
bundle exec ruby exe/ollama_agent
Other entry points are opt-in: pass a subcommand (self_review, sessions, …) or ask / orchestrate with a query for a one-shot task, or flags for a plain line REPL (see below).
From the project you want the agent to modify (set the working directory accordingly):
bundle exec ruby exe/ollama_agent ask "Update the README.md with current codebase"
From this repository after bundle install, ruby exe/ollama_agent (without bundle exec) also works: the executable adds lib to the load path and loads bundler/setup when a Gemfile is present.
Apply proposed patches without interactive confirmation:
bundle exec ruby exe/ollama_agent ask -y "Your task"
# Review / audit only (no patches, writes, or delegation)—same as a report-style self_review
bundle exec ruby exe/ollama_agent ask --read-only "Summarize risks in this repo"
Long-running models (slow local inference):
bundle exec ruby exe/ollama_agent ask --timeout 300 "Your task"
Agent budget (steps, tokens, cost)
Each model round-trip that runs during a session counts as one step toward OLLAMA_AGENT_MAX_TURNS (default 64), enforced together with token and optional cost limits in OllamaAgent::Core::Budget. Exploratory tasks that list, read, and search across a large repository can burn through steps quickly; if you see budget exceeded — step limit (64), raise the limit—for example:
export OLLAMA_AGENT_MAX_TURNS=128
bundle exec ruby exe/ollama_agent ask "Your wide-ranging task"
Narrower prompts, --read-only, or a smaller --root also reduce step usage. With OLLAMA_AGENT_DEBUG=1, the agent prints an extra hint when the maximum tool rounds for a run are reached.
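How a numeric limit such as OLLAMA_AGENT_MAX_TURNS might be read, including the strict behavior of OLLAMA_AGENT_STRICT_ENV, can be sketched as follows. The helper env_integer is hypothetical, and ArgumentError stands in for the gem's ConfigurationError; the real parsing may differ.

```ruby
# Hypothetical helper: parse a positive integer from an env-like hash.
# Invalid values fall back to the default unless strict mode is on.
def env_integer(env, name, default:, strict: env["OLLAMA_AGENT_STRICT_ENV"] == "1")
  raw = env[name]
  return default if raw.nil? || raw.strip.empty?

  value = Integer(raw, 10)
  raise ArgumentError, "must be positive" unless value.positive?

  value
rescue ArgumentError
  # ArgumentError is a stand-in for the gem's ConfigurationError.
  raise ArgumentError, "invalid #{name}=#{raw.inspect}" if strict

  default
end

puts env_integer({ "OLLAMA_AGENT_MAX_TURNS" => "128" }, "OLLAMA_AGENT_MAX_TURNS", default: 64)
puts env_integer({ "OLLAMA_AGENT_MAX_TURNS" => "lots" }, "OLLAMA_AGENT_MAX_TURNS", default: 64)
```

Passing a plain hash instead of ENV keeps the helper easy to exercise in specs.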
search_code and regex patterns
In text mode, the tool passes your pattern to ripgrep (or grep). Patterns are regular expressions: literal parentheses, brackets, and unbalanced groups can trigger errors (for example unclosed group). Escape metacharacters or use fixed-string mode when your tool schema exposes it.
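When you build a search pattern programmatically and want a literal match, escaping the metacharacters first avoids the unclosed-group class of errors. The sketch below uses Ruby's Regexp.escape; its rules are close to, but not guaranteed identical to, ripgrep's regex syntax, so treat the result as a starting point rather than a drop-in guarantee.

```ruby
# Escape a literal snippet so parentheses and brackets are not read as groups or classes.
literal = "render(user[:name])"
pattern = Regexp.escape(literal)

puts pattern # prints render\(user\[:name\]\)

# The escaped pattern matches the literal text when used as a Ruby regex.
puts Regexp.new(pattern).match?("x = render(user[:name])")
```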
Plain line REPL (no TUI boxes / Markdown shell): use ask (or orchestrate) with -i and without --tui. Omitting the query opens the TUI by default, so this is how you opt out:
bundle exec ruby exe/ollama_agent ask --interactive
# same idea: explicit -i, no --tui
Self-review modes (default project root is the current working directory unless you set --root or OLLAMA_AGENT_ROOT):
# Mode 1 — analysis only (default)
bundle exec ruby exe/ollama_agent self_review
bundle exec ruby exe/ollama_agent self_review --mode analysis
# Mode 2 — optional fixes in the working tree (confirm each patch, or -y / --semi)
bundle exec ruby exe/ollama_agent self_review --mode interactive
# Mode 3 — sandbox + tests + optional merge back (same as `improve`)
# Without --apply, edits stay in a temp dir only; pass --apply to copy changed files into your checkout.
bundle exec ruby exe/ollama_agent self_review --mode automated
bundle exec ruby exe/ollama_agent self_review --mode automated --apply
bundle exec ruby exe/ollama_agent improve --apply
ruby_mastery (optional): When the ruby_mastery gem is installed (this repo lists it in the Gemfile for development), self_review (all modes) and improve prepend a markdown static-analysis section to the user prompt. Add the same gem to your app’s Gemfile if you want that behavior outside this checkout. Disable with --no-ruby-mastery or OLLAMA_AGENT_RUBY_MASTERY=0. Limit size with OLLAMA_AGENT_RUBY_MASTERY_MAX_CHARS (default 60000).
For mode 3, -y skips all patch prompts; --no-semi prompts for every patch when not using -y.
Reasoning / thinking output
On thinking-capable models, Ollama can return reasoning separately from the final answer (message.thinking vs message.content). The CLI labels them Thinking (dim) and Assistant (green / Markdown).
Enable think on the request
The agent sends Ollama’s think field only when you set it (CLI or env). If you omit it, the server uses its own defaults—and some models then omit or change reasoning in the response.
| You want | CLI | Environment |
|---|---|---|
| Reasoning on (typical Qwen / DeepSeek-style) | --think true | OLLAMA_AGENT_THINK=true or 1 |
| Reasoning off | --think false | OLLAMA_AGENT_THINK=false or 0 |
| GPT-OSS style levels | --think low, medium, or high | OLLAMA_AGENT_THINK=medium (example) |
Examples:
OLLAMA_AGENT_THINK=true bundle exec ruby exe/ollama_agent ask -i
bundle exec ruby exe/ollama_agent ask -i --think true
# GPT-OSS: prefer a level, not only true/false
bundle exec ruby exe/ollama_agent ask --think medium "Your task"
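The mapping in the table above (boolean, level, or omitted) can be sketched as a small normalizer. The names normalize_think and chat_options are hypothetical; the gem's own parsing may accept other spellings.

```ruby
# Hypothetical normalizer for illustration only.
THINK_LEVELS = %w[low medium high].freeze

def normalize_think(raw)
  value = raw.to_s.strip.downcase
  return nil if value.empty?                 # omit the field so the server default applies
  return true if %w[true 1].include?(value)
  return false if %w[false 0].include?(value)
  return value if THINK_LEVELS.include?(value)

  raise ArgumentError, "unsupported think value: #{raw.inspect}"
end

# Only include :think in the request options when the user actually set it.
def chat_options(raw_think)
  think = normalize_think(raw_think)
  think.nil? ? {} : { think: think }
end

p chat_options("medium")
p chat_options("1")
p chat_options(nil)
```

Returning an empty hash for an unset value is what keeps "not set" distinct from "explicitly false".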
Streaming vs one-shot (default)
| Mode | Flags | What you see |
|---|---|---|
| One-shot (default) | neither --stream nor OLLAMA_AGENT_STREAM=1 | Each model round completes over HTTP; Thinking / Assistant are printed from the assembled message (including Gemma-style reasoning tags stripped from content when the API omits thinking). |
| Streaming | --stream or OLLAMA_AGENT_STREAM=1 | Reasoning streams in dim text under one Thinking line, then Assistant and the reply stream, similar to Cursor. Uses hooks[:on_thinking] on the ollama-client chat stream (see OllamaAgent::OllamaChatThinkingStreamPatch). |
OLLAMA_AGENT_THINK=medium OLLAMA_AGENT_STREAM=1 bundle exec ruby exe/ollama_agent ask "Your task"
Note: Subscribing only to on_thinking does not enable the streaming chat path; the agent uses streaming when something listens for on_token (the console streamer registers both). See CHANGELOG 1.0.0 if you embed the library.
Display style (TTY)
By default OLLAMA_AGENT_THINKING_STYLE=compact: one Thinking header per ask run; later reasoning chunks in the same run are separated by blank lines only (including after tool rounds). OLLAMA_AGENT_THINKING_STYLE=framed repeats the full boxed banner per message. Thinking body is plain dim unless OLLAMA_AGENT_THINKING_MARKDOWN=1.
The CLI uses ANSI colors on a TTY (banner, prompt, patch prompts). Assistant replies use Markdown via tty-markdown when stdout is a TTY and NO_COLOR is unset. Disable Markdown with OLLAMA_AGENT_MARKDOWN=0; disable colors with NO_COLOR or OLLAMA_AGENT_COLOR=0.
If you see no Thinking block
- Set think explicitly, especially for GPT-OSS (low / medium / high).
- Confirm the model returns message.thinking (e.g. curl / ollama CLI against /api/chat with the same think value). If the API never sends thinking, the agent has nothing to show.
- Try streaming (--stream or OLLAMA_AGENT_STREAM=1) if you want live reasoning tokens.
- Embedded reasoning in content: Some templates (e.g. Gemma) put tags such as <|channel>thought…<channel|> or <redacted_thinking>…</redacted_thinking> inside content. The agent strips those into Thinking when present (OllamaAgent::GemmaThoughtContentParser). If your model uses different delimiters, reasoning may stay inside the main reply until parsers are extended.
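The idea behind pulling embedded reasoning out of content can be sketched with a single delimiter pair. This is not OllamaAgent::GemmaThoughtContentParser; split_thinking is a hypothetical helper that handles only the redacted_thinking tags mentioned above.

```ruby
# Hypothetical helper: separate tagged reasoning from the visible reply.
THOUGHT_TAG = %r{<redacted_thinking>(.*?)</redacted_thinking>}m

def split_thinking(content)
  thinking = content.scan(THOUGHT_TAG).flatten.map(&:strip).join("\n\n")
  answer = content.gsub(THOUGHT_TAG, "").strip
  { thinking: thinking.empty? ? nil : thinking, content: answer }
end

reply = "<redacted_thinking>List files first.</redacted_thinking>\nThere are 3 scripts."
p split_thinking(reply)
```

A model that uses different delimiters would pass through unchanged, with thinking left as nil.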
Ruby API
OllamaAgent::Runner.build(stream: true, think: "medium").run("Your task")
Custom subscribers can attach to hooks[:on_thinking] and hooks[:on_token] on the same Runner instance (see OllamaAgent::Streaming::Hooks).
Ollama Cloud
Ollama Cloud uses the same HTTP API as the local server, with HTTPS and a Bearer API key. The ollama-client gem sends Authorization: Bearer <api_key> when Ollama::Config#api_key is set (HTTPS is used when the URL scheme is https).
- Create a key at ollama.com/settings/keys.
- Point the agent at the cloud host and pass the key (same env names as ollama-client’s docs):
export OLLAMA_BASE_URL="https://ollama.com"
export OLLAMA_API_KEY="your_key"
export OLLAMA_AGENT_MODEL="gpt-oss:120b-cloud" # example; pick a cloud model from `ollama list` / the catalog
# Reasoning for GPT-OSS: set a level (see "Reasoning / thinking output" above)
export OLLAMA_AGENT_THINK=medium
bundle exec ruby exe/ollama_agent ask "Your task"
Multi-Key Provider Credential Orchestration & Failover
To handle rate limits (RPM/TPM window exhaustion), daily quotas, or network timeouts when using Ollama Cloud (or other providers like OpenAI and Anthropic), you can configure a thread-safe, quota-aware Credential Pool with automatic reactive failover.
The pool can be configured dynamically via the Ruby API or auto-detected from environment variables.
Environment Auto-Detection
If no explicit credentials are passed in, ollama_agent automatically scans the environment for keys indexed 1 to 5 and initializes a CredentialPool:
- Ollama Cloud: Configure keys OLLAMA_API_KEY_1 through OLLAMA_API_KEY_5. When any of these are present, requests are routed to "https://api.ollama.com" (aliased as "ollama_cloud").
- OpenAI: Configure keys OPENAI_API_KEY_1 through OPENAI_API_KEY_5.
- Anthropic: Configure keys ANTHROPIC_API_KEY_1 through ANTHROPIC_API_KEY_5.
Example for Ollama Cloud:
export OLLAMA_API_KEY_1="ollama_key_abc123"
export OLLAMA_API_KEY_2="ollama_key_def456"
bundle exec ruby exe/ollama_agent ask "Your task"
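The scan over indexed variables can be pictured roughly like this. The helper detect_credentials, the generated id format, and the "anthropic" provider string are assumptions for illustration; only the variable names and the 1 to 5 range come from the description above.

```ruby
# Provider name => env var prefix. "anthropic" is an assumed provider string.
PROVIDER_PREFIXES = {
  "ollama_cloud" => "OLLAMA_API_KEY",
  "openai" => "OPENAI_API_KEY",
  "anthropic" => "ANTHROPIC_API_KEY"
}.freeze

# Hypothetical helper: collect credentials from PREFIX_1 .. PREFIX_5.
def detect_credentials(env)
  PROVIDER_PREFIXES.flat_map do |provider, prefix|
    (1..5).filter_map do |index|
      key = env["#{prefix}_#{index}"]
      next if key.nil? || key.strip.empty?

      { id: "#{provider}-#{index}", provider: provider, api_key: key }
    end
  end
end

env = { "OLLAMA_API_KEY_1" => "ollama_key_abc123", "OLLAMA_API_KEY_2" => "ollama_key_def456" }
detect_credentials(env).each { |cred| puts "#{cred[:id]} (#{cred[:provider]})" }
```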
Ruby API Configuration
You can also pass a structured array of credential hashes to OllamaAgent::Runner.build:
runner = OllamaAgent::Runner.build(
  credentials: [
    {
      id: "ollama-cloud-primary",
      provider: "ollama_cloud",
      api_key: "ollama_...",
      weight: 2, # weighted round-robin priority (default: 1)
      limits: { rpm: 10, tpm: 10000, daily_tokens: 1000000 } # automatic quota tracking
    },
    {
      id: "openai-backup",
      provider: "openai",
      api_key: "sk-...",
      weight: 1
    }
  ]
)
runner.run("Your task")
Failover Behavior
- Weighted Round-Robin: Requests are balanced across healthy and available credentials.
- Quota Tracking: Daily token/request and RPM/TPM sliding windows are tracked locally.
- Reactive Failover: If a key encounters a rate limit (HTTP 429), temporary provider error (HTTP 5xx), or quota exhaustion, it is temporarily cooled down and the request is retried with the next available key.
- Permanent Disabling: If a key encounters an authentication failure (HTTP 401 or HTTP 403), it is permanently disabled to prevent dead-key hammering.
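The rotation, cooldown, and disabling rules above can be sketched in a few dozen lines. SketchCredentialPool is a hypothetical class, not the gem's CredentialPool: it is single-threaded, tracks no quotas, and uses an assumed 30-second cooldown.

```ruby
# Illustrative only: a simplified stand-in for a credential pool.
class SketchCredentialPool
  Credential = Struct.new(:id, :weight, :cooldown_until, :disabled, keyword_init: true)

  def initialize(specs, cooldown_seconds: 30)
    creds = specs.map do |spec|
      Credential.new(id: spec[:id], weight: spec.fetch(:weight, 1), cooldown_until: 0.0, disabled: false)
    end
    # Weighted rotation: a credential with weight 2 appears twice in the ring.
    @ring = creds.flat_map { |cred| [cred] * cred.weight }
    @cursor = 0
    @cooldown_seconds = cooldown_seconds
  end

  # Returns the next healthy credential, or nil when none is usable right now.
  def next_credential
    @ring.size.times do
      cred = @ring[@cursor % @ring.size]
      @cursor += 1
      return cred if !cred.disabled && cred.cooldown_until <= now
    end
    nil
  end

  # Yields credentials until the block returns [200, body].
  def with_failover
    while (cred = next_credential)
      status, body = yield(cred)
      case status
      when 200 then return body
      when 401, 403 then cred.disabled = true # dead key: never retried
      when 429, 500..599 then cred.cooldown_until = now + @cooldown_seconds
      else raise "unexpected status #{status}"
      end
    end
    raise "no usable credentials"
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

pool = SketchCredentialPool.new([{ id: "primary", weight: 2 }, { id: "backup" }])
body = pool.with_failover do |cred|
  cred.id == "primary" ? [429, nil] : [200, "answer from #{cred.id}"]
end
puts body # prints answer from backup
```

The block stands in for the HTTP call; in this run the primary key is rate limited once, cooled down, and the request is retried on the backup key.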
Environment
| Variable | Purpose |
|---|---|
| OLLAMA_BASE_URL | Ollama API base URL (default from ollama-client: http://localhost:11434; use https://ollama.com for cloud) |
| OLLAMA_API_KEY | API key for Ollama Cloud (https://ollama.com); optional for local HTTP |
| OLLAMA_AGENT_MODEL | Model name (overrides default from ollama-client) |
| OLLAMA_AGENT_ROOT | Project root for tools (list_files, read_file, etc.). Defaults to current working directory when unset (CLI never falls back to the gem install path). |
| OLLAMA_AGENT_DEBUG | Set to 1 to print validation diagnostics on stderr |
| OLLAMA_AGENT_STRICT_ENV | Set to 1 so invalid numeric env values (e.g. OLLAMA_AGENT_MAX_TURNS) raise ConfigurationError instead of falling back to defaults |
| OLLAMA_AGENT_MAX_TURNS | Max chat rounds with tool calls (default: 64) |
| OLLAMA_AGENT_TIMEOUT | HTTP read/open timeout in seconds for Ollama requests (default 120; use ask --timeout / -t to override per run) |
| OLLAMA_AGENT_PARSE_TOOL_JSON | Set to 1 to run tools parsed from JSON lines in assistant text (fallback when the model does not emit native tool calls) |
| NO_COLOR | Set (any value) to disable ANSI colors (see no-color.org) |
| OLLAMA_AGENT_COLOR | Set to 0 to disable colors even on a TTY |
| OLLAMA_AGENT_MARKDOWN | Set to 0 to disable Markdown formatting of assistant replies (plain text only) |
| OLLAMA_AGENT_THINKING_STYLE | compact (default) = one Thinking label per run, blank lines between later reasoning chunks; framed = repeat full banner/rulers each message |
| OLLAMA_AGENT_THINKING_MARKDOWN | Set to 1 to render thinking text with Markdown (muted); default is plain dim text |
| OLLAMA_AGENT_STREAM | Set to 1 to stream tokens and reasoning to stdout (same as CLI --stream on ask / self_review / improve). |
| OLLAMA_AGENT_THINK | Model thinking mode for compatible models: true / false, or high / medium / low (see ollama-client think:). Empty = omit (server default). GPT-OSS: use low / medium / high. |
| OLLAMA_AGENT_PATCH_RISK_MAX_DIFF_LINES | Max changed-line count before a diff is treated as "large" for semi-auto patch risk (default 80) |
| OLLAMA_AGENT_INDEX_REBUILD | Set to 1 to drop the cached Prism Ruby index before the next symbol search in this process |
| OLLAMA_AGENT_RUBY_INDEX_MAX_FILES | Max .rb files to parse per index build (default 5000) |
| OLLAMA_AGENT_RUBY_INDEX_MAX_FILE_BYTES | Skip Ruby files larger than this many bytes (default 512000) |
| OLLAMA_AGENT_RUBY_INDEX_MAX_LINES | Max result lines for search_code class/module/method modes (default 200) |
| OLLAMA_AGENT_RUBY_INDEX_MAX_CHARS | Max characters of index output per search (default 60000) |
| OLLAMA_AGENT_MAX_READ_FILE_BYTES | Max bytes for a full read_file (no line range); larger files return an error (default 2097152, 2 MiB). Line-range reads stream and are not limited by this cap. |
| OLLAMA_AGENT_RG_PATH | Absolute path to rg for search_code text mode (optional; otherwise first rg on PATH) |
| OLLAMA_AGENT_GREP_PATH | Absolute path to grep fallback (optional; otherwise first grep on PATH) |
| OLLAMA_AGENT_INDEX_REBUILD | The Prism index is rebuilt when this env value changes (e.g. unset → 1); it is not rebuilt on every tool call while it stays 1. |
| OLLAMA_AGENT_SKILLS | 1/on/0/off: include bundled prompt skills (default on). Same as --no-skills on the CLI when off. |
| OLLAMA_AGENT_SKILLS_INCLUDE | Comma-separated manifest ids to load (omit = all bundled). Example: ruby_style,rubocop,code_review. |
| OLLAMA_AGENT_SKILLS_EXCLUDE | Comma-separated ids to skip from the bundled set. |
| OLLAMA_AGENT_SKILL_PATHS | Extra .md files or directories, colon-separated (Unix PATH style). Directory entries load all *.md in sorted order. Merged with --skill-paths. |
| OLLAMA_AGENT_EXTERNAL_SKILLS | 1/0: include content from OLLAMA_AGENT_SKILL_PATHS (default on). Set 0 to use bundled-only without unsetting paths. |
Prompt skills (bundled + optional paths)
The system prompt is the base agent instructions (AgentPrompt) plus optional Markdown sections. Bundled files live under lib/ollama_agent/prompt_skills/ and are listed in manifest.yml. Each file may use Cursor-style YAML frontmatter (--- … ---); the loader strips frontmatter before sending text to the model.
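Stripping frontmatter before the text reaches the model can be approximated with one anchored regex. strip_frontmatter is a hypothetical helper, not the gem's loader, and it only handles a non-empty block that starts on the very first line.

```ruby
# Matches a leading "---" line, the YAML body, and the closing "---" line.
FRONTMATTER = /\A---[ \t]*\n.*?\n---[ \t]*\n/m

# Hypothetical helper: return the Markdown body without its YAML frontmatter.
def strip_frontmatter(markdown)
  markdown.sub(FRONTMATTER, "")
end

skill = "---\nname: ruby_style\ndescription: Style rules\n---\n# Ruby style\nPrefer small methods.\n"
puts strip_frontmatter(skill)
```

A file without frontmatter passes through unchanged, since the pattern is anchored to the start of the string.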
Manifest ids (in load order): clean_ruby, ruby_style, rubocop, solid, solid_ruby, design_patterns, rspec, rails_style, rails_best_practices, code_review, ollama_agent_patterns.
Bundled bodies were copied from Cursor SKILL.md files under ~/.cursor/skills/ (and ollama_agent_patterns from this repo’s .cursor/skills/ollama-agent-patterns). Re-copy when you update those skills upstream.
Many full skills can be large; use OLLAMA_AGENT_SKILLS_INCLUDE to trim for small-context models.
CLI flags (also available on ask, self_review, improve): --no-skills, --skill-paths 'path1:path2/dir'.
To run self_review / ask against the installed gem’s source (e.g. to hack on ollama_agent itself), pass an explicit root, for example --root "$(bundle show ollama_agent)" or a path to a git clone.
Orchestrator (external CLI agents)
Use the orchestrate command (or OLLAMA_AGENT_ORCHESTRATOR=1 with ask) to expose tools list_external_agents and delegate_to_agent. The Ollama model should gather context with read_file / search_code, list installed CLIs, then delegate a short task + context to an external agent (Claude Code, Gemini CLI, Codex, Cursor CLI, etc.). Definitions live in lib/ollama_agent/external_agents/default_agents.yml; override or extend via ~/.config/ollama_agent/agents.yml or OLLAMA_AGENT_EXTERNAL_AGENTS_CONFIG.
- ollama_agent agents: print a table of configured agents and whether each binary is on PATH.
- ollama_agent doctor: alias for agents.
- delegate_to_agent runs a fixed argv (no shell) with cwd = project root; output is capped (OLLAMA_AGENT_DELEGATE_MAX_OUTPUT_BYTES, default 100k). Confirm each run unless -y.
- Delegation audit logs: set OLLAMA_AGENT_DELEGATE_LOG=1 (or OLLAMA_AGENT_DEBUG=1) to emit a structured stderr line with agent id, argv, env keys (names only), exit code, and duration.
- Adjust argv / version_argv in YAML to match your real CLI (vendor flags differ). If a tool has no stable non-interactive mode, do not expose it in the registry.
- Tool contract version: OllamaAgent::ORCHESTRATOR_TOOLS_SCHEMA_VERSION.
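Running "a fixed argv (no shell)" with a capped output can be sketched with Open3 from the standard library. run_fixed_argv is a hypothetical helper, and 100_000 is an assumed reading of the "100k" default; this version also truncates after capture, whereas a production implementation would more likely cap while streaming.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run argv without a shell and truncate the captured output.
def run_fixed_argv(argv, cwd:, max_output_bytes: 100_000)
  program, *args = argv
  # The [program, program] form tells Ruby never to route the command through a shell.
  output, status = Open3.capture2e([program, program], *args, chdir: cwd)
  { exit_code: status.exitstatus, output: output.byteslice(0, max_output_bytes) }
end

# Shell metacharacters in an argument arrive as literal text.
result = run_fixed_argv([RbConfig.ruby, "-e", "print ARGV.first", "$(whoami); ls"], cwd: Dir.pwd)
puts result[:output] # prints $(whoami); ls
```

Passing arguments as separate array elements is what removes the need to quote or sanitize the delegated task text.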
Agentic tool calling (local environment tools)
Two built-in tools let the model observe its environment and compute precisely — the pattern described in Easy Agentic Tool Calling with Gemma 4 — adapted here for the Ruby runtime.
list_directory_contents
Lists files and subdirectories inside the current workspace. The model decides when to inspect the environment rather than guessing at what exists.
bundle exec ruby exe/ollama_agent ask \
"What scripts are in the current folder, and which one looks like it handles CSV processing?"
All paths are sandboxed to OLLAMA_AGENT_ROOT. Traversal attempts (../../etc, absolute paths) are rejected before the filesystem is touched.
calculate
Evaluates an arithmetic expression using a Shunting-yard parser. The model offloads precision arithmetic instead of computing in its weights.
bundle exec ruby exe/ollama_agent ask \
"What is the standard deviation of 12, 18, 23, 29, 31, 35, 44, 47 — compute it step by step using the formula."
Supports +, -, *, /, ** (right-associative), parentheses, and unary +/-. No eval.
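For readers unfamiliar with the technique, here is a compact Shunting-yard evaluator that covers the same operator set. It is a sketch written for this README, not the gem's calculate tool: the names OPS, apply_operator, and calculate are local to the example, every number is treated as a Float, and the precedence of unary minus below ** is a choice made here to mirror Ruby.

```ruby
# Precedence and associativity; "u-" and "u+" are the unary forms.
OPS = {
  "+" => [1, :left], "-" => [1, :left],
  "*" => [2, :left], "/" => [2, :left],
  "u-" => [3, :right], "u+" => [3, :right],
  "**" => [4, :right]
}.freeze
NUMBER = /\A\d+(?:\.\d+)?\z/

def apply_operator(op, values)
  if op.start_with?("u")
    raise ArgumentError, "missing operand" if values.empty?

    operand = values.pop
    values << (op == "u-" ? -operand : operand)
  else
    raise ArgumentError, "missing operand" if values.size < 2

    right = values.pop
    left = values.pop
    values << left.public_send(op, right) # op is always one of the allowlisted OPS keys
  end
end

def calculate(expression)
  values = []
  ops = []
  prev = nil # previous token, used to tell unary +/- from binary
  expression.scan(/\d+(?:\.\d+)?|\*\*|\S/).each do |token|
    if token.match?(NUMBER)
      values << token.to_f
    elsif token == "("
      ops << token
    elsif token == ")"
      apply_operator(ops.pop, values) while ops.any? && ops.last != "("
      raise ArgumentError, "unbalanced parentheses" if ops.empty?

      ops.pop
    else
      unary = %w[+ -].include?(token) && (prev.nil? || prev == "(" || OPS.key?(prev))
      token = "u#{token}" if unary
      raise ArgumentError, "unexpected token #{token.inspect}" unless OPS.key?(token)

      precedence, assoc = OPS[token]
      # Binary operators first flush anything on the stack that binds at least as tightly.
      until unary || ops.empty? || ops.last == "("
        top = OPS[ops.last][0]
        break unless top > precedence || (top == precedence && assoc == :left)

        apply_operator(ops.pop, values)
      end
      ops << token
    end
    prev = token
  end

  while (op = ops.pop)
    raise ArgumentError, "unbalanced parentheses" if op == "("

    apply_operator(op, values)
  end
  raise ArgumentError, "malformed expression" unless values.size == 1

  values.first
end

puts calculate("2 ** 3 ** 2")                                      # prints 512.0
puts calculate("(412 + 1834 + 10786 + 88 + 2210) / 1024").round(2) # prints 14.97
```

Anything outside the token set, such as identifiers or quotes, raises ArgumentError before any arithmetic runs.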
Combining both tools
The model can chain the two tools in a single request — inspect the workspace first, then compute something about what it found:
bundle exec ruby exe/ollama_agent ask \
"Look at the files in the current folder and tell me the total size in kilobytes, rounded to two decimal places."
Internally: the model calls list_directory_contents to get byte sizes, then calls calculate with the sum and division by 1024.
Ruby API
require "ollama_agent"
runner = OllamaAgent::Runner.build(root: Dir.pwd)
# Filesystem inspection
puts runner.run("What files are in lib/ollama_agent/tools/?")
# Arithmetic
puts runner.run("What is (412 + 1834 + 10786 + 88 + 2210) / 1024, rounded to 2 decimal places?")
See examples/agentic_tool_calling.rb for a runnable end-to-end demo.
Library usage (Ruby)
Most of this README is CLI-first (commands and environment variables above). The same capabilities exist as Ruby APIs—the Features list (file tools, self_review / improve, orchestrator, skills, etc.) is implemented under lib/ollama_agent/. For a layer diagram (agent → tools → hooks → session), see docs/ARCHITECTURE.md.
Coding agent — Runner (facade) — Stable entry for apps: OllamaAgent::Runner.build(root:, model:, stream:, session_id:, resume:, read_only:, orchestrator:, skills_enabled:, skill_paths:, audit:, max_tokens:, context_summarize:, stdin:, stdout:, ...) then #run(query). Optional stdin / stdout (default TTY) feed patch/write/delegate confirmations—use StringIO in tests or automation to avoid blocking on $stdin.gets. Exposes #hooks (Streaming::Hooks) for :on_token, :on_thinking (streamed reasoning when stream: true and the model supports it), :on_tool_call, :on_tool_result, :on_complete. Full keyword list: lib/ollama_agent/runner.rb.
Coding agent — Agent (direct) — OllamaAgent::Agent.new(client:, root:, ...) when you inject an Ollama::Client (or test double), tweak options the CLI does not expose, or skip Runner.
Custom tools (coding agent) — OllamaAgent::Tools.register("tool_name", schema: { ... }) { |args, root:, read_only:| ... } merges extra function definitions into the chat tool list; handlers run in the same sandbox as built-in tools.
Resilience and observability — Default client path uses Resilience::RetryMiddleware. Structured step logging: enable audit: true on Runner.build or OLLAMA_AGENT_AUDIT=1 (see Environment table). Context trimming: max_tokens / context_summarize on Runner.build.
Sessions — Pass session_id and optional resume: true on Runner.build to persist messages under .ollama_agent/sessions/ (Session::Store).
Self-improvement (sandbox) — CLI commands improve / self_review --mode automated wrap OllamaAgent::SelfImprovement (sandbox copy, tests, optional merge). Use the CLI for the full flow; the module is available for advanced integration.
ToolRuntime (alternate loop, optional) — Not used by the CLI. For non–file-edit agents (e.g. another gem that defines its own tools), a small JSON plan loop: the model returns one object per step {"tool":"name","args":{...}}, ToolRuntime::Registry resolves it, Executor runs your Tool subclasses, Memory holds short-term history. Use a swappable planner (anything implementing next_step(context:, memory:, registry:)) such as OllamaJsonPlanner (Ollama::Client#chat + JSON extraction). Step-by-step guide: docs/TOOL_RUNTIME.md.
- Termination: a tool may return { "status" => "done" } to stop. Unknown tool names → OllamaAgent::ToolRuntime::InvalidPlanError; too many steps → MaxStepsExceeded. Loop#run returns the last tool result (same value as the final Executor#execute return).
- Runnable examples: spec/ollama_agent/tool_runtime/.
Model and server: OllamaJsonPlanner uses the same default as the coding agent: OLLAMA_AGENT_MODEL if set, otherwise Ollama::Config.new.model (from ollama-client). The model must exist on whatever host you use. Use the same client setup as the CLI: OllamaAgent::OllamaConnection.apply_env_to_config copies OLLAMA_BASE_URL and OLLAMA_API_KEY into Ollama::Config. If you only run Ollama::Client.new(config: Ollama::Config.new) in irb, you stay on localhost while OLLAMA_AGENT_MODEL may still name a cloud model from the README cloud example → 404. Either apply apply_env_to_config (below) or unset the cloud model / pass model: "llama3.2".
require "ollama_agent"
require "ollama_client"
class EchoTool < OllamaAgent::ToolRuntime::Tool
  def name = "echo"
  def description = "Echo args"
  def schema = { "type" => "object", "properties" => { "msg" => { "type" => "string" } } }

  # Returning "status" => "done" tells the loop to stop.
  def call(args)
    return { "status" => "done", "echo" => args["msg"] } if args["msg"] == "bye"

    { "status" => "ok", "echo" => args["msg"] }
  end
end

registry = OllamaAgent::ToolRuntime::Registry.new([EchoTool.new])
memory = OllamaAgent::ToolRuntime::Memory.new
config = Ollama::Config.new
# Copy OLLAMA_BASE_URL / OLLAMA_API_KEY so the client targets the same host as the CLI.
OllamaAgent::OllamaConnection.apply_env_to_config(config)
client = Ollama::Client.new(config: config)
planner = OllamaAgent::ToolRuntime::OllamaJsonPlanner.new(client: client)
last = OllamaAgent::ToolRuntime::Loop.new(
  planner: planner,
  registry: registry,
  executor: OllamaAgent::ToolRuntime::Executor.new,
  memory: memory,
  max_steps: 10
).run(context: "Say hello then echo bye to finish.")
# last => e.g. { "status" => "done", "echo" => "bye" }
Skills (deterministic JSON-contract pipelines)
Skills are single-purpose generators that bypass the tool-calling agent loop and return strict JSON validated against a schema. They are meant for pipelines that need predictable, parseable output — code review, refactoring suggestions, performance audits, debugging triage — without the unpredictability of free-form LLM prose.
Built-in skills:
- architecture_refactor: restructure code without changing behavior
- performance_optimizer: identify bottlenecks and emit optimized code
- debug_engineer: root-cause a bug and propose a fix
- feature_builder: design and implement a production-ready feature
Each skill:
- Renders a deterministic prompt (LLM temperature: 0 by default).
- Extracts the first balanced JSON object from the response (tolerates surrounding prose and Markdown json code fences).
- Validates against the skill's SCHEMA and raises ContractError on mismatch.
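The extract-then-validate steps above can be sketched without the gem. first_json_object and validate_contract! are hypothetical helpers; the required-keys check is a simplification of schema validation, and ArgumentError stands in for ContractError.

```ruby
require "json"

# Hypothetical helper: return the first balanced {...} span that parses as JSON.
def first_json_object(text)
  start = text.index("{")
  while start
    depth = 0
    in_string = false
    escaped = false
    text[start..].each_char.with_index do |char, offset|
      if in_string
        # Inside a JSON string braces do not count; track escapes and the closing quote.
        if escaped
          escaped = false
        elsif char == "\\"
          escaped = true
        elsif char == '"'
          in_string = false
        end
        next
      end

      case char
      when '"' then in_string = true
      when "{" then depth += 1
      when "}"
        depth -= 1
        next unless depth.zero?

        begin
          return JSON.parse(text[start, offset + 1])
        rescue JSON::ParserError
          break # balanced but not valid JSON; try the next "{"
        end
      end
    end
    start = text.index("{", start + 1)
  end
  nil
end

# Simplified contract check: only verifies that the required keys are present.
def validate_contract!(payload, required_keys)
  missing = required_keys - payload.keys
  raise ArgumentError, "missing keys: #{missing.join(', ')}" unless missing.empty?

  payload
end

fence = "`" * 3
reply = "Here you go:\n#{fence}json\n{\"bottlenecks\": [], \"note\": \"mind the } brace\"}\n#{fence}\nDone."
p validate_contract!(first_json_object(reply), ["bottlenecks"])
```

Tracking string state matters: without it, the closing brace inside the note value would end the object early.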
CLI
# list registered skills
ollama_agent skill list
# run a single skill
ollama_agent skill run architecture_refactor --code-file lib/orders/manager.rb
# compose a pipeline; later skills receive earlier outputs merged in
ollama_agent skill pipeline architecture_refactor performance_optimizer \
--code-file lib/exit_management.rb
Override the model with --model, OLLAMA_AGENT_SKILL_MODEL, or
OLLAMA_AGENT_MODEL.
Ruby
result = OllamaAgent::Skills::ArchitectureRefactorer.new.call(
  code: File.read("lib/orders/manager.rb")
)
# => { folder_structure: [...], architecture_notes: "...", refactored_code: "..." }
OllamaAgent::Skills::Runner.new(
  [:architecture_refactor, :performance_optimizer]
).call(code: File.read("lib/exit_management.rb"))
Inject your own LLM client (anything responding to #generate(prompt) → String)
in tests:
class FakeLlm
  def generate(_prompt)
    '{"bottlenecks": [], "optimizations": [], "optimized_code": "x"}'
  end
end
OllamaAgent::Skills::PerformanceOptimizer.new(llm: FakeLlm.new).call(code: "...")
By default skills run against the local Ollama provider (local-first, auditable).
They go through OllamaAgent::Providers::Registry, so any registered provider
(OpenAI, Anthropic, custom) is usable by passing your own LlmClient.
Troubleshooting
- Use a tool-capable model: Set OLLAMA_AGENT_MODEL to a model that supports function/tool calling (e.g. a recent coder-tuned variant). If the model only prints {"name": "read_file", ...} in plain text, tools never run unless you enable OLLAMA_AGENT_PARSE_TOOL_JSON=1.
- Malformed diffs: Headers must look like git diff: --- a/file then +++ b/file then a unified hunk line starting with @@ (not legacy --- N,M ----). Do not put commas after path tokens. The gem normalizes some mistakes and runs patch --dry-run before applying.
- Request timeouts: The agent defaults to a 120s HTTP timeout (longer than ollama-client’s 30s). If you still hit Ollama::TimeoutError, raise it with OLLAMA_AGENT_TIMEOUT=300, bundle exec ruby exe/ollama_agent ask --timeout 300 "...", or -t 300. Ensure the variable name is exactly OLLAMA_AGENT_TIMEOUT (a leading typo such as vOLLAMA_AGENT_TIMEOUT is ignored).
How it works
- The CLI starts OllamaAgent::Agent, which loops on Ollama::Client#chat with tool definitions.
- Tools are executed in-process under a path sandbox (OLLAMA_AGENT_ROOT).
- search_code defaults to ripgrep/grep (mode omitted or text). For Ruby, use mode method, class, module, or constant to query a Prism parse index (built lazily on first use).
- read_file accepts optional start_line / end_line (1-based, inclusive) to read only part of a file.
- Patches are validated and checked with patch --dry-run before you confirm (unless -y).
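The 1-based, inclusive line range for read_file can be illustrated with a streaming read that stops as soon as the range is exhausted. read_line_range is a hypothetical helper written for this example, not the tool's implementation.

```ruby
require "tempfile"

# Hypothetical helper: return lines start_line..end_line (1-based, inclusive).
def read_line_range(path, start_line:, end_line:)
  raise ArgumentError, "start_line must be >= 1" if start_line < 1
  raise ArgumentError, "end_line must be >= start_line" if end_line < start_line

  selected = []
  File.foreach(path).with_index(1) do |line, number|
    next if number < start_line
    break if number > end_line # stop reading once past the requested range

    selected << line
  end
  selected.join
end

Tempfile.create("sample") do |file|
  file.write("one\ntwo\nthree\nfour\n")
  file.flush
  print read_line_range(file.path, start_line: 2, end_line: 3) # prints "two" then "three"
end
```

Because the file is read line by line, a ranged read never needs the whole file in memory, which fits the note that line-range reads are not bound by the full-read byte cap.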
Development
bundle exec rspec
bundle exec rubocop
Ongoing refactors (contributors): the Agent class is a thin façade over TurnLoop, ChatCoordinator, session/client wiring, and Tools::BuiltInSchemas so new behavior should land in those collaborators instead of growing monolithic methods. See CONTRIBUTING.md.
CI and RubyGems release
- CI: .github/workflows/main.yml runs RSpec and RuboCop on pushes to main / master and on pull requests (Ruby 3.3.4 and 3.2.0).
- Release: .github/workflows/release.yml runs on tags v*. It checks that the tag matches OllamaAgent::VERSION in lib/ollama_agent/version.rb, builds with gem build ollama_agent.gemspec, and pushes to RubyGems.
Repository secrets (Settings → Secrets and variables → Actions):
| Secret | Purpose |
|---|---|
| RUBYGEMS_API_KEY | RubyGems API key with push scope |
| RUBYGEMS_OTP_SECRET | Base32 secret for TOTP (RubyGems MFA); the workflow uses rotp to generate a one-time code for gem push |
Release steps:
- Bump OllamaAgent::VERSION in lib/ollama_agent/version.rb and commit to main.
- Tag: git tag v1.0.0 (must match the version string) and git push origin v1.0.0.
License
MIT. See LICENSE.txt.