ollama_agent

Version: 1.0.0

Ruby gem that runs a CLI coding agent against a local Ollama model. It exposes tools to list files, read files, search the tree (ripgrep or grep), and apply unified diffs so the model can make small, reviewable edits.

Features

  • Tool list_files – list project files.
  • Tool read_file – read file contents.
  • Tool search_code – search code with ripgrep or grep.
  • Tool edit_file – apply unified diffs safely.
  • Tool list_directory_contents – sandboxed filesystem inspection; see Agentic tool calling.
  • Tool calculate – safe arithmetic evaluator (Shunting-yard, no eval); see Agentic tool calling.
  • CLI built with Thor, entry point exe/ollama_agent.
  • self_review – self-review / improvement with a --mode:
    • analysis (default, alias 1) — read-only tools; report only; no writes.
    • interactive (alias 2, fix) — full tools on --root; you confirm each patch (like ask); optional -y / --semi.
    • automated (alias 3, sandbox) — temp copy, agent edits, bundle exec rspec in the sandbox, optional --apply to merge into your checkout.
  • improve — same as self_review --mode automated (you can pass --mode automated explicitly; other modes belong on self_review).
  • orchestrate / OLLAMA_AGENT_ORCHESTRATOR=1 — optional orchestrator tools to probe and delegate to other local CLI agents (see Orchestrator); agents lists availability.
  • Ruby API — embed Runner, Agent, custom tools, hooks, sessions, and (optionally) ToolRuntime; see Library usage (Ruby).

Kernel runtime (deterministic execution)

Documentation (post-kernel): Capability matrix · CLI reference · Operations / incidents · Usage guide.

The runtime kernel is an optional execution layer behind OLLAMA_AGENT_KERNEL. It wraps file mutations in a saga-style finite state machine: intent reservation, atomic writes (CAS + pre-image hashes), ownership checks against compiled rules, SQLite-backed WAL and sagas, isolated post-mutation validation, and compensation on failure. The workspace root remains the trust boundary; the kernel adds structured ownership and fencing so replays and automation stay auditable. When cloud or validator paths fail, circuit-breaker style escalation limits (see the rollout runbook) keep bad states from compounding.
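The "atomic writes (CAS + pre-image hashes)" step can be pictured with a minimal sketch. It assumes a SHA-256 pre-image hash and a temp-file-plus-rename write. The helper name cas_write and its return values are hypothetical and are not the kernel's API; the real pipeline also adds intent reservation, WAL records, and compensation, which this sketch omits.

```ruby
require "digest"
require "tmpdir"

# Hypothetical helper (not part of ollama_agent): write new_content to path
# only if the file on disk still matches the expected pre-image hash.
def cas_write(path, expected_sha256:, new_content:)
  current = File.exist?(path) ? Digest::SHA256.hexdigest(File.binread(path)) : nil
  return :conflict unless current == expected_sha256

  # Write a sibling temp file, then rename it over the target. A rename
  # within one filesystem is atomic on POSIX systems.
  tmp = "#{path}.tmp-#{Process.pid}"
  File.binwrite(tmp, new_content)
  File.rename(tmp, path)
  :written
end

Dir.mktmpdir do |dir|
  file = File.join(dir, "example.txt")
  File.write(file, "v1\n")
  pre_image = Digest::SHA256.hexdigest("v1\n")

  puts cas_write(file, expected_sha256: pre_image, new_content: "v2\n") # prints "written"
  # A second writer still holding the stale hash is rejected.
  puts cas_write(file, expected_sha256: pre_image, new_content: "v3\n") # prints "conflict"
end
```

The check and the rename are not a single atomic step here, so a real implementation would still need a lock or reservation around them.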

| OLLAMA_AGENT_KERNEL | Behavior |
| --- | --- |
| unset / false (default) | Legacy tool paths; kernel pipeline is not used for tool routing. |
| shadow | Same routing as true, but the pipeline runs in shadow mode: saga + WAL + observability run, while workspace bytes for certain mutations stay off the “real” path (see runbook). |
| true / 1 | Tool intents for configured mutation tools go through OllamaAgent::Runtime::KernelPipeline. |

Quick start (kernel on):

OLLAMA_AGENT_KERNEL=true bundle exec ollama_agent ask "Your task"

Design notes and roadmap items live in docs/new_features_plan_v2.md. Operational rollout, shadow mode, and rollback expectations are in docs/agile/release_rollout_runbook.md (incident SQL, health JSON, and compaction details are expanded in docs/OPERATIONS.md). For E7 validator activation (Docker-backed isolated checks), see docs/agile/docker_spec_activation.md.

Compaction and disk bounds: long-lived workspaces accumulate kernel SQLite rows and content-addressed blobs. Use OllamaAgent::Runtime::Compactor (logical current_epoch only — no wall clock) to prune sealed sagas, cold-archive old WAL rows into event_store_archive.db, purge expired recovery leases and stale intent reservations, and unlink blob files not referenced by compensations or in-flight mutation WAL payloads. OllamaAgent::Runtime::CompactorRunner wraps the compactor with an epoch interval for daemon loops (opt-in; nothing starts automatically).

Permission unification: when OLLAMA_AGENT_KERNEL is on and config/ollama_agent/owners.yml exists, OllamaAgent::Runtime::PermissionBridge reconciles legacy Runtime::Permissions / Runtime::Policies with Security::OwnershipIndex + CriticalityPolicy before pipeline execution. On divergence the bridge logs and prefers the kernel decision (stricter path wins). OllamaAgent::PermissionConflictError is raised only by the strict #allow_mutation? API for tests and diagnostics. With the kernel off, only the legacy permission path runs (no bridge).

Requirements

  • Ruby ≥ 3.2 (enforced in the gemspec as required_ruby_version)
  • Runtime kernel (SQLite): the sqlite3 gem is a runtime dependency; kernel storage uses event_store.db and runtime.db under .ollama_agent/kernel/ in the configured project root.
  • Local: Ollama running and a capable tool-calling model, or
  • Ollama Cloud: API key and a cloud-capable model name (see below)

Prerequisites (external tools)

  • patch — required for edit_file (GNU patch on PATH). On Windows, use Git Bash, WSL, GnuWin32, or another environment that provides patch.
  • rg (ripgrep) or grep — text mode for search_code needs at least one of these on PATH (ripgrep is preferred when present).

Security and sandbox

  • Project root — File tools and search are constrained to the configured workspace (--root / OLLAMA_AGENT_ROOT). Treat that directory as the trust boundary: only aim the agent at trees you are willing to modify.
  • list_directory_contents — Paths are resolved with File.expand_path relative to the project root and rejected before the filesystem is touched if they escape that boundary. ../../etc, /etc, and any other traversal are caught by a prefix check, not a regex.
  • calculate — Uses a hand-written tokenizer and Shunting-yard evaluator. eval is never called. Only numeric literals and the operators +, -, *, /, ** are accepted; any other character is an error.
  • run_shell (optional tool) — Commands are parsed into an argument vector (no shell) and must match an allowlist; a denylist blocks obviously dangerous patterns. You can still shoot yourself in the foot with an allowed prefix (for example git with destructive subcommands), so keep profiles and permissions tight in automated setups.
  • Timeouts — Text search honors OLLAMA_AGENT_SEARCH_TIMEOUT_SEC (default 120). Shell execution has its own per-invocation timeout.
  • Logging — Budget, loop-detection, and list_local_model_names failures go through Ruby’s Logger (stderr by default). Set OLLAMA_AGENT_LOG_LEVEL=debug or OLLAMA_AGENT_DEBUG=1 for more detail.
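The run_shell bullet above describes parsing a command into an argument vector and checking it against an allowlist and a denylist. A minimal sketch of that idea, using Ruby's standard Shellwords library, might look like the following. The constants ALLOWED_COMMANDS and DENIED_PATTERNS and the method parse_command are hypothetical; the gem's actual lists and method names are not shown in this README.

```ruby
require "shellwords"

ALLOWED_COMMANDS = %w[git ls rg].freeze        # hypothetical allowlist
DENIED_PATTERNS = [/\Arm\z/, /--force/].freeze # hypothetical denylist

# Split a command line into argv without invoking a shell, then vet it.
def parse_command(line)
  argv = Shellwords.split(line)
  raise ArgumentError, "empty command" if argv.empty?
  raise ArgumentError, "not allowlisted: #{argv.first}" unless ALLOWED_COMMANDS.include?(argv.first)

  denied = argv.any? { |arg| DENIED_PATTERNS.any? { |pattern| pattern.match?(arg) } }
  raise ArgumentError, "denied argument in: #{line}" if denied

  argv
end

p parse_command("git log --oneline")
```

Because the line is never handed to a shell, characters such as ; or | stay inside ordinary arguments instead of chaining commands. The caveat from the bullet still applies: an allowed program such as git can be destructive through its own subcommands.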

Installation

From RubyGems (when published) or from this repository:

bundle install

Usage

Default: run the gem with no subcommand to open the interactive TUI (same as ask with no query):

ollama_agent
# or from this repo:
bundle exec ruby exe/ollama_agent

Other entry points are opt-in: pass a subcommand (self_review, sessions, …) or ask / orchestrate with a query for a one-shot task, or flags for a plain line REPL (see below).

From the project you want the agent to modify (set the working directory accordingly):

bundle exec ruby exe/ollama_agent ask "Update the README.md with current codebase"

From this repository after bundle install, ruby exe/ollama_agent (without bundle exec) also works: the executable adds lib to the load path and loads bundler/setup when a Gemfile is present.

Apply proposed patches without interactive confirmation:

bundle exec ruby exe/ollama_agent ask -y "Your task"

# Review / audit only (no patches, writes, or delegation)—same as a report-style self_review
bundle exec ruby exe/ollama_agent ask --read-only "Summarize risks in this repo"

Long-running models (slow local inference):

bundle exec ruby exe/ollama_agent ask --timeout 300 "Your task"

Agent budget (steps, tokens, cost)

Each model round-trip that runs during a session counts as one step toward OLLAMA_AGENT_MAX_TURNS (default 64), enforced together with token and optional cost limits in OllamaAgent::Core::Budget. Exploratory tasks that list, read, and search across a large repository can burn through steps quickly; if you see budget exceeded — step limit (64), raise the limit—for example:

export OLLAMA_AGENT_MAX_TURNS=128
bundle exec ruby exe/ollama_agent ask "Your wide-ranging task"

Narrower prompts, --read-only, or a smaller --root also reduce step usage. With OLLAMA_AGENT_DEBUG=1, the agent prints an extra hint when the maximum tool rounds for a run are reached.
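To make the step accounting concrete, here is a minimal sketch of a step counter that raises once a limit is passed. StepBudget and its error message are hypothetical stand-ins, not the real OllamaAgent::Core::Budget, which also tracks tokens and optional cost.

```ruby
# Hypothetical stand-in for a step budget; names are illustrative only.
class StepBudget
  Exceeded = Class.new(StandardError)

  attr_reader :steps

  def initialize(max_steps:)
    @max_steps = max_steps
    @steps = 0
  end

  # Call once per model round-trip; raises when the limit is passed.
  def record_step!
    @steps += 1
    raise Exceeded, "budget exceeded: step limit (#{@max_steps})" if @steps > @max_steps
  end
end

budget = StepBudget.new(max_steps: 2)
begin
  3.times { budget.record_step! }
rescue StepBudget::Exceeded => e
  puts e.message # prints "budget exceeded: step limit (2)"
end
```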

search_code and regex patterns

In text mode, the tool passes your pattern to ripgrep (or grep). Patterns are regular expressions: literal parentheses, brackets, and unbalanced groups can trigger errors (for example unclosed group). Escape metacharacters or use fixed-string mode when your tool schema exposes it.

Plain line REPL (no TUI boxes / markdown shell): run ask (or orchestrate) with -i and without --tui. Omitting the query on its own opens the default TUI, so -i is how you opt out of it:

bundle exec ruby exe/ollama_agent ask --interactive
# same idea: explicit -i, no --tui

Self-review modes (default project root is the current working directory unless you set --root or OLLAMA_AGENT_ROOT):

# Mode 1 — analysis only (default)
bundle exec ruby exe/ollama_agent self_review
bundle exec ruby exe/ollama_agent self_review --mode analysis

# Mode 2 — optional fixes in the working tree (confirm each patch, or -y / --semi)
bundle exec ruby exe/ollama_agent self_review --mode interactive

# Mode 3 — sandbox + tests + optional merge back (same as `improve`)
# Without --apply, edits stay in a temp dir only; pass --apply to copy changed files into your checkout.
bundle exec ruby exe/ollama_agent self_review --mode automated
bundle exec ruby exe/ollama_agent self_review --mode automated --apply
bundle exec ruby exe/ollama_agent improve --apply

ruby_mastery (optional): When the ruby_mastery gem is installed (this repo lists it in the Gemfile for development), self_review (all modes) and improve prepend a markdown static-analysis section to the user prompt. Add the same gem to your app’s Gemfile if you want that behavior outside this checkout. Disable with --no-ruby-mastery or OLLAMA_AGENT_RUBY_MASTERY=0. Limit size with OLLAMA_AGENT_RUBY_MASTERY_MAX_CHARS (default 60000).

For mode 3, -y skips all patch prompts; --no-semi prompts for every patch when not using -y.

Reasoning / thinking output

On thinking-capable models, Ollama can return reasoning separately from the final answer (message.thinking vs message.content). The CLI labels them Thinking (dim) and Assistant (green / Markdown).

Enable think on the request

The agent sends Ollama’s think field only when you set it (CLI or env). If you omit it, the server uses its own defaults—and some models then omit or change reasoning in the response.

| You want | CLI | Environment |
| --- | --- | --- |
| Reasoning on (typical Qwen / DeepSeek-style) | --think true | OLLAMA_AGENT_THINK=true or 1 |
| Reasoning off | --think false | OLLAMA_AGENT_THINK=false or 0 |
| GPT-OSS style levels | --think low, medium, or high | OLLAMA_AGENT_THINK=medium (example) |
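The mapping in the table above can be sketched as a small parser. This is an illustration only: parse_think is a hypothetical name, and raising on unknown values is an assumption rather than documented behavior.

```ruby
# Hypothetical parser mirroring the table: nil means "omit the think field".
def parse_think(raw)
  value = raw.to_s.strip.downcase
  return nil if value.empty?                  # server default applies
  return true if %w[true 1].include?(value)
  return false if %w[false 0].include?(value)
  return value if %w[low medium high].include?(value)

  raise ArgumentError, "unsupported think value: #{raw.inspect}"
end

p parse_think("1")      # prints true
p parse_think("Medium") # prints "medium"
p parse_think(nil)      # prints nil
```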

Examples:

OLLAMA_AGENT_THINK=true bundle exec ruby exe/ollama_agent ask -i
bundle exec ruby exe/ollama_agent ask -i --think true
# GPT-OSS: prefer a level, not only true/false
bundle exec ruby exe/ollama_agent ask --think medium "Your task"

Streaming vs one-shot (default)

| Mode | Flags | What you see |
| --- | --- | --- |
| One-shot (default) | neither --stream nor OLLAMA_AGENT_STREAM=1 | Each model round completes over HTTP; Thinking / Assistant are printed from the assembled message (including Gemma-style reasoning tags stripped from content when the API omits thinking). |
| Streaming | --stream or OLLAMA_AGENT_STREAM=1 | Reasoning streams in dim text under one Thinking line, then Assistant and the reply stream, similar to Cursor. Uses hooks[:on_thinking] on the ollama-client chat stream (see OllamaAgent::OllamaChatThinkingStreamPatch). |

OLLAMA_AGENT_THINK=medium OLLAMA_AGENT_STREAM=1 bundle exec ruby exe/ollama_agent ask "Your task"

Note: Subscribing only to on_thinking does not enable the streaming chat path; the agent uses streaming when something listens for on_token (the console streamer registers both). See CHANGELOG 1.0.0 if you embed the library.

Display style (TTY)

By default OLLAMA_AGENT_THINKING_STYLE=compact: one Thinking header per ask run; later reasoning chunks in the same run are separated by blank lines only (including after tool rounds). OLLAMA_AGENT_THINKING_STYLE=framed repeats the full boxed banner per message. Thinking body is plain dim unless OLLAMA_AGENT_THINKING_MARKDOWN=1.

The CLI uses ANSI colors on a TTY (banner, prompt, patch prompts). Assistant replies use Markdown via tty-markdown when stdout is a TTY and NO_COLOR is unset. Disable Markdown with OLLAMA_AGENT_MARKDOWN=0; disable colors with NO_COLOR or OLLAMA_AGENT_COLOR=0.

If you see no Thinking block

  1. Set think explicitly—especially for GPT-OSS (low / medium / high).
  2. Confirm the model returns message.thinking (e.g. curl / ollama CLI against /api/chat with the same think value). If the API never sends thinking, the agent has nothing to show.
  3. Try streaming (--stream or OLLAMA_AGENT_STREAM=1) if you want live reasoning tokens.
  4. Embedded reasoning in content: Some templates (e.g. Gemma) put tags such as <|channel>thought<channel|> or <redacted_thinking></redacted_thinking> inside content. The agent strips those into Thinking when present (OllamaAgent::GemmaThoughtContentParser). If your model uses different delimiters, reasoning may stay inside the main reply until parsers are extended.

Ruby API

OllamaAgent::Runner.build(stream: true, think: "medium").run("Your task")

Custom subscribers can attach to hooks[:on_thinking] and hooks[:on_token] on the same Runner instance (see OllamaAgent::Streaming::Hooks).

Ollama Cloud

Ollama Cloud uses the same HTTP API as the local server, with HTTPS and a Bearer API key. The ollama-client gem sends Authorization: Bearer <api_key> when Ollama::Config#api_key is set (HTTPS is used when the URL scheme is https).

  1. Create a key at ollama.com/settings/keys.
  2. Point the agent at the cloud host and pass the key (same env names as ollama-client’s docs):
export OLLAMA_BASE_URL="https://ollama.com"
export OLLAMA_API_KEY="your_key"
export OLLAMA_AGENT_MODEL="gpt-oss:120b-cloud"   # example; pick a cloud model from `ollama list` / the catalog
# Reasoning for GPT-OSS: set a level (see "Reasoning / thinking output" above)
export OLLAMA_AGENT_THINK=medium
bundle exec ruby exe/ollama_agent ask "Your task"

Multi-Key Provider Credential Orchestration & Failover

To handle rate limits (RPM/TPM window exhaustion), daily quotas, or network timeouts when using Ollama Cloud (or other providers like OpenAI and Anthropic), you can configure a thread-safe, quota-aware Credential Pool with automatic reactive failover.

The pool can be configured dynamically via the Ruby API or auto-detected from environment variables.

Environment Auto-Detection

If no explicit credentials are passed in, ollama_agent automatically scans the environment for keys indexed 1 to 5 and initializes a CredentialPool:

  • Ollama Cloud: Configure keys OLLAMA_API_KEY_1 through OLLAMA_API_KEY_5. When any of these are present, requests are routed to "https://api.ollama.com" (aliased as "ollama_cloud").
  • OpenAI: Configure keys OPENAI_API_KEY_1 through OPENAI_API_KEY_5.
  • Anthropic: Configure keys ANTHROPIC_API_KEY_1 through ANTHROPIC_API_KEY_5.

Example for Ollama Cloud:

export OLLAMA_API_KEY_1="ollama_key_abc123"
export OLLAMA_API_KEY_2="ollama_key_def456"
bundle exec ruby exe/ollama_agent ask "Your task"

Ruby API Configuration

You can also pass a structured array of credential hashes to OllamaAgent::Runner.build:

runner = OllamaAgent::Runner.build(
  credentials: [
    {
      id: "ollama-cloud-primary",
      provider: "ollama_cloud",
      api_key: "ollama_...",
      weight: 2, # weighted round-robin priority (default: 1)
      limits: { rpm: 10, tpm: 10000, daily_tokens: 1000000 } # automatic quota tracking
    },
    {
      id: "openai-backup",
      provider: "openai",
      api_key: "sk-...",
      weight: 1
    }
  ]
)
runner.run("Your task")

Failover Behavior

  1. Weighted Round-Robin: Requests are balanced across healthy and available credentials.
  2. Quota Tracking: Daily token/request and RPM/TPM sliding windows are tracked locally.
  3. Reactive Failover: If a key encounters a rate limit (HTTP 429), temporary provider error (HTTP 5xx), or quota exhaustion, it is temporarily cooled down and the request is retried with the next available key.
  4. Permanent Disabling: If a key encounters an authentication failure (HTTP 401 or HTTP 403), it is permanently disabled to prevent dead-key hammering.
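The four rules above can be sketched in a few dozen lines. This is not the gem's CredentialPool: the class name CredentialPoolSketch, the 30-second cooldown, and the injectable clock are all assumptions made for illustration, and quota tracking (rule 2) is left out.

```ruby
# Illustrative only; names and the cooldown length are assumptions.
class CredentialPoolSketch
  Credential = Struct.new(:id, :weight, :cooldown_until, :disabled)

  def initialize(credentials, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @credentials = credentials.map { |c| Credential.new(c[:id], c.fetch(:weight, 1), 0.0, false) }
    @clock = clock
    @cursor = 0
    @mutex = Mutex.new
  end

  # Weighted round-robin: each credential appears `weight` times in the rotation.
  # Returns the id of the next usable credential, or nil when none is usable.
  def next_credential
    @mutex.synchronize do
      rotation = @credentials.flat_map { |c| [c] * c.weight }
      rotation.size.times do
        candidate = rotation[@cursor % rotation.size]
        @cursor += 1
        return candidate.id if usable?(candidate)
      end
      nil
    end
  end

  # 401/403 disable the key for good; 429 and 5xx cool it down temporarily.
  def report_failure(id, status, cooldown: 30)
    @mutex.synchronize do
      cred = @credentials.find { |c| c.id == id }
      return if cred.nil?

      case status
      when 401, 403 then cred.disabled = true
      when 429, 500..599 then cred.cooldown_until = @clock.call + cooldown
      end
    end
  end

  private

  def usable?(cred)
    !cred.disabled && cred.cooldown_until <= @clock.call
  end
end

pool = CredentialPoolSketch.new([{ id: "primary", weight: 2 }, { id: "backup" }])
puts 3.times.map { pool.next_credential }.join(", ") # prints "primary, primary, backup"
pool.report_failure("primary", 429)
puts pool.next_credential # prints "backup" while primary cools down
```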

Environment

| Variable | Purpose |
| --- | --- |
| OLLAMA_BASE_URL | Ollama API base URL (default from ollama-client: http://localhost:11434; use https://ollama.com for cloud) |
| OLLAMA_API_KEY | API key for Ollama Cloud (https://ollama.com); optional for local HTTP |
| OLLAMA_AGENT_MODEL | Model name (overrides default from ollama-client) |
| OLLAMA_AGENT_ROOT | Project root for tools (list_files, read_file, etc.). Defaults to current working directory when unset (CLI never falls back to the gem install path). |
| OLLAMA_AGENT_DEBUG | Set to 1 to print validation diagnostics on stderr |
| OLLAMA_AGENT_STRICT_ENV | Set to 1 so invalid numeric env values (e.g. OLLAMA_AGENT_MAX_TURNS) raise ConfigurationError instead of falling back to defaults |
| OLLAMA_AGENT_MAX_TURNS | Max chat rounds with tool calls (default: 64) |
| OLLAMA_AGENT_TIMEOUT | HTTP read/open timeout in seconds for Ollama requests (default 120; use ask --timeout / -t to override per run) |
| OLLAMA_AGENT_PARSE_TOOL_JSON | Set to 1 to run tools parsed from JSON lines in assistant text (fallback when the model does not emit native tool calls) |
| NO_COLOR | Set (any value) to disable ANSI colors (see no-color.org) |
| OLLAMA_AGENT_COLOR | Set to 0 to disable colors even on a TTY |
| OLLAMA_AGENT_MARKDOWN | Set to 0 to disable Markdown formatting of assistant replies (plain text only) |
| OLLAMA_AGENT_THINKING_STYLE | compact (default) = one Thinking label per run, blank lines between later reasoning chunks; framed = repeat full banner/rulers each message |
| OLLAMA_AGENT_THINKING_MARKDOWN | Set to 1 to render thinking text with Markdown (muted); default is plain dim text |
| OLLAMA_AGENT_STREAM | Set to 1 to stream tokens and reasoning to stdout (same as CLI --stream on ask / self_review / improve). |
| OLLAMA_AGENT_THINK | Model thinking mode for compatible models: true / false, or high / medium / low (see ollama-client think:). Empty = omit (server default). GPT-OSS: use low / medium / high. |
| OLLAMA_AGENT_PATCH_RISK_MAX_DIFF_LINES | Max changed-line count before a diff is treated as "large" for semi-auto patch risk (default 80) |
| OLLAMA_AGENT_INDEX_REBUILD | Set to 1 to drop the cached Prism Ruby index before the next symbol search in this process. The index is rebuilt when this value changes (e.g. unset → 1); it is not rebuilt on every tool call while it stays 1. |
| OLLAMA_AGENT_RUBY_INDEX_MAX_FILES | Max .rb files to parse per index build (default 5000) |
| OLLAMA_AGENT_RUBY_INDEX_MAX_FILE_BYTES | Skip Ruby files larger than this many bytes (default 512000) |
| OLLAMA_AGENT_RUBY_INDEX_MAX_LINES | Max result lines for search_code class/module/method modes (default 200) |
| OLLAMA_AGENT_RUBY_INDEX_MAX_CHARS | Max characters of index output per search (default 60000) |
| OLLAMA_AGENT_MAX_READ_FILE_BYTES | Max bytes for a full read_file (no line range); larger files return an error (default 2097152, 2 MiB). Line-range reads stream and are not limited by this cap. |
| OLLAMA_AGENT_RG_PATH | Absolute path to rg for search_code text mode (optional; otherwise first rg on PATH) |
| OLLAMA_AGENT_GREP_PATH | Absolute path to grep fallback (optional; otherwise first grep on PATH) |
| OLLAMA_AGENT_SKILLS | 1/on/0/off: include bundled prompt skills (default on). Same as --no-skills on the CLI when off. |
| OLLAMA_AGENT_SKILLS_INCLUDE | Comma-separated manifest ids to load (omit = all bundled). Example: ruby_style,rubocop,code_review. |
| OLLAMA_AGENT_SKILLS_EXCLUDE | Comma-separated ids to skip from the bundled set. |
| OLLAMA_AGENT_SKILL_PATHS | Extra .md files or directories, colon-separated (Unix PATH style). Directory entries load all *.md in sorted order. Merged with --skill-paths. |
| OLLAMA_AGENT_EXTERNAL_SKILLS | 1/0: include content from OLLAMA_AGENT_SKILL_PATHS (default on). Set 0 to use bundled-only without unsetting paths. |
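The OLLAMA_AGENT_STRICT_ENV row describes two behaviors for invalid numeric values: fall back to the default, or raise. A minimal sketch of that switch follows. The helper env_integer and the local ConfigurationError class are hypothetical stand-ins for whatever the gem uses internally.

```ruby
ConfigurationError = Class.new(StandardError) # stand-in for the gem's error class

# Read an integer setting from an env-like hash. Invalid values fall back to
# the default unless strict mode is on, in which case they raise.
def env_integer(env, name, default:, strict: env["OLLAMA_AGENT_STRICT_ENV"] == "1")
  raw = env[name]
  return default if raw.nil? || raw.strip.empty?

  Integer(raw, 10)
rescue ArgumentError
  raise ConfigurationError, "#{name} must be an integer, got #{raw.inspect}" if strict

  default
end

puts env_integer({ "OLLAMA_AGENT_MAX_TURNS" => "128" }, "OLLAMA_AGENT_MAX_TURNS", default: 64)  # prints 128
puts env_integer({ "OLLAMA_AGENT_MAX_TURNS" => "lots" }, "OLLAMA_AGENT_MAX_TURNS", default: 64) # prints 64
```

Passing the real ENV object instead of a hash works the same way, since both respond to [].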

Prompt skills (bundled + optional paths)

The system prompt is the base agent instructions (AgentPrompt) plus optional Markdown sections. Bundled files live under lib/ollama_agent/prompt_skills/ and are listed in manifest.yml. Each file may use Cursor-style YAML frontmatter (a block delimited by --- lines); the loader strips frontmatter before sending text to the model.
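Frontmatter stripping can be sketched with one regular expression. This assumes the frontmatter is a leading block delimited by lines of three dashes; strip_frontmatter is a hypothetical name, and the gem's loader may handle more edge cases.

```ruby
# Matches a leading block delimited by --- lines (assumed frontmatter shape).
FRONTMATTER = /\A---\s*\n.*?\n---\s*\n/m

def strip_frontmatter(markdown)
  markdown.sub(FRONTMATTER, "")
end

skill = "---\nname: ruby_style\n---\n# Ruby style\nPrefer small methods.\n"
puts strip_frontmatter(skill)
```

Files without a leading --- block pass through unchanged, because the pattern is anchored to the start of the text.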

Manifest ids (in load order): clean_ruby, ruby_style, rubocop, solid, solid_ruby, design_patterns, rspec, rails_style, rails_best_practices, code_review, ollama_agent_patterns.

Bundled bodies were copied from Cursor SKILL.md files under ~/.cursor/skills/ (and ollama_agent_patterns from this repo’s .cursor/skills/ollama-agent-patterns). Re-copy when you update those skills upstream.

Many full skills can be large; use OLLAMA_AGENT_SKILLS_INCLUDE to trim for small-context models.

CLI flags (also available on ask, self_review, improve): --no-skills, --skill-paths 'path1:path2/dir'.

To run self_review / ask against the installed gem’s source (e.g. to hack on ollama_agent itself), pass an explicit root, for example --root "$(bundle show ollama_agent)" or a path to a git clone.

Orchestrator (external CLI agents)

Use the orchestrate command (or OLLAMA_AGENT_ORCHESTRATOR=1 with ask) to expose tools list_external_agents and delegate_to_agent. The Ollama model should gather context with read_file / search_code, list installed CLIs, then delegate a short task + context to an external agent (Claude Code, Gemini CLI, Codex, Cursor CLI, etc.). Definitions live in lib/ollama_agent/external_agents/default_agents.yml; override or extend via ~/.config/ollama_agent/agents.yml or OLLAMA_AGENT_EXTERNAL_AGENTS_CONFIG.

  • ollama_agent agents — print a table of configured agents and whether each binary is on PATH.
  • ollama_agent doctor — alias for agents.
  • delegate_to_agent runs a fixed argv (no shell) with cwd = project root; output is capped (OLLAMA_AGENT_DELEGATE_MAX_OUTPUT_BYTES, default 100k). Each run asks for confirmation unless you pass -y.
  • Delegation audit logs: set OLLAMA_AGENT_DELEGATE_LOG=1 (or OLLAMA_AGENT_DEBUG=1) to emit a structured stderr line with agent id, argv, env keys (names only), exit code, and duration.
  • Adjust argv / version_argv in YAML to match your real CLI (vendor flags differ). If a tool has no stable non-interactive mode, do not expose it in the registry.
  • Tool contract version: OllamaAgent::ORCHESTRATOR_TOOLS_SCHEMA_VERSION.

Agentic tool calling (local environment tools)

Two built-in tools let the model observe its environment and compute precisely — the pattern described in Easy Agentic Tool Calling with Gemma 4 — adapted here for the Ruby runtime.

list_directory_contents

Lists files and subdirectories inside the current workspace. The model decides when to inspect the environment rather than guessing at what exists.

bundle exec ruby exe/ollama_agent ask \
  "What scripts are in the current folder, and which one looks like it handles CSV processing?"

All paths are sandboxed to OLLAMA_AGENT_ROOT. Traversal attempts (../../etc, absolute paths) are rejected before the filesystem is touched.
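The prefix check behind this sandbox can be sketched with File.expand_path alone. The method name resolve_in_root is hypothetical, and this sketch does not resolve symlinks, which a hardened implementation would also need to consider.

```ruby
# Hypothetical helper: resolve a requested path against the root and reject
# anything that lands outside it, before touching the filesystem.
def resolve_in_root(root, requested)
  root_abs = File.expand_path(root)
  target = File.expand_path(requested, root_abs)
  inside = target == root_abs || target.start_with?(root_abs + File::SEPARATOR)
  raise ArgumentError, "path escapes project root: #{requested}" unless inside

  target
end

puts resolve_in_root("/srv/project", "lib/../README.md") # prints "/srv/project/README.md"

begin
  resolve_in_root("/srv/project", "../../etc")
rescue ArgumentError => e
  puts e.message # prints "path escapes project root: ../../etc"
end
```

Appending the separator before comparing matters: without it, a sibling directory such as /srv/project-evil would pass a plain prefix test.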

calculate

Evaluates an arithmetic expression using a Shunting-yard parser. The model offloads precision arithmetic instead of computing in its weights.

bundle exec ruby exe/ollama_agent ask \
  "What is the standard deviation of 12, 18, 23, 29, 31, 35, 44, 47 — compute it step by step using the formula."

Supports +, -, *, /, ** (right-associative), parentheses, and unary +/-. No eval.
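For readers unfamiliar with the technique, the following is a compact Shunting-yard evaluator covering the same operator set. It is a sketch, not the gem's implementation: the names OPS, tokenize, apply_operator, and calculate are illustrative, numbers are treated as Floats, and unary minus is assumed to bind more loosely than ** (as in Ruby itself).

```ruby
require "strscan"

OPS = {
  "+" => { prec: 1, right: false, arity: 2, fn: ->(a, b) { a + b } },
  "-" => { prec: 1, right: false, arity: 2, fn: ->(a, b) { a - b } },
  "*" => { prec: 2, right: false, arity: 2, fn: ->(a, b) { a * b } },
  "/" => { prec: 2, right: false, arity: 2, fn: ->(a, b) { a / b } },
  "u-" => { prec: 3, right: true, arity: 1, fn: ->(a) { -a } },
  "u+" => { prec: 3, right: true, arity: 1, fn: ->(a) { a } },
  "**" => { prec: 4, right: true, arity: 2, fn: ->(a, b) { a**b } }
}.freeze
TOKEN = %r{\s*(\d+(?:\.\d+)?|\*\*|[-+*/()])}

# Hand-written tokenizer: anything outside numbers, operators, parens is an error.
def tokenize(expr)
  scanner = StringScanner.new(expr.strip)
  tokens = []
  until scanner.eos?
    raise ArgumentError, "unexpected input at offset #{scanner.pos}" unless scanner.scan(TOKEN)

    tokens << scanner[1]
  end
  tokens
end

def apply_operator(op, values)
  spec = OPS.fetch(op)
  args = values.pop(spec[:arity])
  raise ArgumentError, "malformed expression" if args.size < spec[:arity]

  values << spec[:fn].call(*args)
end

def calculate(expr)
  values = []
  ops = []
  prev = nil # previous raw token, used to tell unary from binary +/-
  tokenize(expr).each do |tok|
    case tok
    when /\A\d/ then values << tok.to_f
    when "(" then ops << tok
    when ")"
      apply_operator(ops.pop, values) while ops.any? && ops.last != "("
      raise ArgumentError, "unbalanced parentheses" if ops.empty?

      ops.pop
    else
      if %w[+ -].include?(tok) && (prev.nil? || prev == "(" || OPS.key?(prev))
        ops << "u#{tok}" # prefix operator: push without popping
      else
        cur = OPS.fetch(tok)
        while ops.any? && ops.last != "(" &&
              (OPS[ops.last][:prec] > cur[:prec] ||
               (OPS[ops.last][:prec] == cur[:prec] && !cur[:right]))
          apply_operator(ops.pop, values)
        end
        ops << tok
      end
    end
    prev = tok
  end
  until ops.empty?
    op = ops.pop
    raise ArgumentError, "unbalanced parentheses" if op == "("

    apply_operator(op, values)
  end
  raise ArgumentError, "malformed expression" unless values.size == 1

  values.first
end

puts calculate("2 ** 3 ** 2")  # prints 512.0 (right-associative)
puts calculate("(1 + 2) * -3") # prints -9.0
puts calculate("(412 + 1834 + 10786 + 88 + 2210) / 1024").round(2) # prints 14.97
```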

Combining both tools

The model can chain the two tools in a single request — inspect the workspace first, then compute something about what it found:

bundle exec ruby exe/ollama_agent ask \
  "Look at the files in the current folder and tell me the total size in kilobytes, rounded to two decimal places."

Internally: the model calls list_directory_contents to get byte sizes, then calls calculate with the sum and division by 1024.

Ruby API

require "ollama_agent"

runner = OllamaAgent::Runner.build(root: Dir.pwd)

# Filesystem inspection
puts runner.run("What files are in lib/ollama_agent/tools/?")

# Arithmetic
puts runner.run("What is (412 + 1834 + 10786 + 88 + 2210) / 1024, rounded to 2 decimal places?")

See examples/agentic_tool_calling.rb for a runnable end-to-end demo.

Library usage (Ruby)

Most of this README is CLI-first (commands and environment variables above). The same capabilities exist as Ruby APIs—the Features list (file tools, self_review / improve, orchestrator, skills, etc.) is implemented under lib/ollama_agent/. For a layer diagram (agent → tools → hooks → session), see docs/ARCHITECTURE.md.

Coding agent — Runner (facade) — Stable entry for apps: OllamaAgent::Runner.build(root:, model:, stream:, session_id:, resume:, read_only:, orchestrator:, skills_enabled:, skill_paths:, audit:, max_tokens:, context_summarize:, stdin:, stdout:, ...) then #run(query). Optional stdin / stdout (default TTY) feed patch/write/delegate confirmations—use StringIO in tests or automation to avoid blocking on $stdin.gets. Exposes #hooks (Streaming::Hooks) for :on_token, :on_thinking (streamed reasoning when stream: true and the model supports it), :on_tool_call, :on_tool_result, :on_complete. Full keyword list: lib/ollama_agent/runner.rb.

Coding agent — Agent (direct): OllamaAgent::Agent.new(client:, root:, ...) when you inject an Ollama::Client (or test double), tweak options the CLI does not expose, or skip Runner.

Custom tools (coding agent): OllamaAgent::Tools.register("tool_name", schema: { ... }) { |args, root:, read_only:| ... } merges extra function definitions into the chat tool list; handlers run in the same sandbox as built-in tools.

Resilience and observability — Default client path uses Resilience::RetryMiddleware. Structured step logging: enable audit: true on Runner.build or OLLAMA_AGENT_AUDIT=1 (see Environment table). Context trimming: max_tokens / context_summarize on Runner.build.

Sessions — Pass session_id and optional resume: true on Runner.build to persist messages under .ollama_agent/sessions/ (Session::Store).

Self-improvement (sandbox) — CLI commands improve / self_review --mode automated wrap OllamaAgent::SelfImprovement (sandbox copy, tests, optional merge). Use the CLI for the full flow; the module is available for advanced integration.

ToolRuntime (alternate loop, optional) — Not used by the CLI. For non–file-edit agents (e.g. another gem that defines its own tools), a small JSON plan loop: the model returns one object per step {"tool":"name","args":{...}}, ToolRuntime::Registry resolves it, Executor runs your Tool subclasses, Memory holds short-term history. Use a swappable planner (anything implementing next_step(context:, memory:, registry:)) such as OllamaJsonPlanner (Ollama::Client#chat + JSON extraction). Step-by-step guide: docs/TOOL_RUNTIME.md.

  • Termination: a tool may return { "status" => "done" } to stop. Unknown tool names → OllamaAgent::ToolRuntime::InvalidPlanError; too many steps → MaxStepsExceeded. Loop#run returns the last tool result (same value as the final Executor#execute return).
  • Runnable examples: spec/ollama_agent/tool_runtime/.

Model and server: OllamaJsonPlanner uses the same default as the coding agent: OLLAMA_AGENT_MODEL if set, otherwise Ollama::Config.new.model (from ollama-client). The model must exist on whatever host you use. Use the same client setup as the CLI: OllamaAgent::OllamaConnection.apply_env_to_config copies OLLAMA_BASE_URL and OLLAMA_API_KEY into Ollama::Config. If you only run Ollama::Client.new(config: Ollama::Config.new) in irb, the client stays on localhost while OLLAMA_AGENT_MODEL may still name a cloud model from the Ollama Cloud example, and the request fails with a 404. Either call apply_env_to_config (as in the example below), unset the cloud model, or pass a local model such as model: "llama3.2".

require "ollama_agent"
require "ollama_client"

class EchoTool < OllamaAgent::ToolRuntime::Tool
  def name = "echo"

  def description = "Echo args"

  def schema = { "type" => "object", "properties" => { "msg" => { "type" => "string" } } }

  def call(args)
    return { "status" => "done", "echo" => args["msg"] } if args["msg"] == "bye"

    { "status" => "ok", "echo" => args["msg"] }
  end
end

registry = OllamaAgent::ToolRuntime::Registry.new([EchoTool.new])
memory = OllamaAgent::ToolRuntime::Memory.new
config = Ollama::Config.new
OllamaAgent::OllamaConnection.apply_env_to_config(config)
client = Ollama::Client.new(config: config)
planner = OllamaAgent::ToolRuntime::OllamaJsonPlanner.new(client: client)

last = OllamaAgent::ToolRuntime::Loop.new(
  planner: planner,
  registry: registry,
  executor: OllamaAgent::ToolRuntime::Executor.new,
  memory: memory,
  max_steps: 10
).run(context: "Say hello then echo bye to finish.")
# last => e.g. { "status" => "done", "echo" => "bye" }

Skills (deterministic JSON-contract pipelines)

Skills are single-purpose generators that bypass the tool-calling agent loop and return strict JSON validated against a schema. They are meant for pipelines that need predictable, parseable output — code review, refactoring suggestions, performance audits, debugging triage — without the unpredictability of free-form LLM prose.

Built-in skills:

  • architecture_refactor — restructure code without changing behavior
  • performance_optimizer — identify bottlenecks and emit optimized code
  • debug_engineer — root-cause a bug and propose a fix
  • feature_builder — design and implement a production-ready feature

Each skill:

  1. Renders a deterministic prompt (LLM temperature: 0 by default).
  2. Extracts the first balanced JSON object from the response (tolerates prose and ```json fences).
  3. Validates against the skill's SCHEMA and raises ContractError on mismatch.
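Step 2 above (extracting the first balanced JSON object) can be sketched with a brace counter that ignores braces inside string literals. The method name extract_first_json_object is hypothetical, and schema validation (step 3) is not shown.

```ruby
require "json"

# Scan from each "{" and track nesting depth, skipping braces inside JSON
# strings. Returns the first candidate that parses, or nil if none does.
def extract_first_json_object(text)
  start = text.index("{")
  while start
    depth = 0
    in_string = false
    escaped = false
    text[start..].each_char.with_index(start) do |char, index|
      if in_string
        if escaped
          escaped = false
        elsif char == "\\"
          escaped = true
        elsif char == '"'
          in_string = false
        end
      elsif char == '"'
        in_string = true
      elsif char == "{"
        depth += 1
      elsif char == "}"
        depth -= 1
        next unless depth.zero?

        begin
          return JSON.parse(text[start..index])
        rescue JSON::ParserError
          break # balanced but not JSON; try the next "{"
        end
      end
    end
    start = text.index("{", start + 1)
  end
  nil
end

reply = %(Here is the result: {"bottlenecks": [], "optimized_code": "x = {}"} Hope that helps.)
result = extract_first_json_object(reply)
puts result["optimized_code"] # prints "x = {}"
```

Surrounding prose and code fences are skipped naturally, because scanning only starts at an opening brace and stops at its matching close.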

CLI

# list registered skills
ollama_agent skill list

# run a single skill
ollama_agent skill run architecture_refactor --code-file lib/orders/manager.rb

# compose a pipeline; later skills receive earlier outputs merged in
ollama_agent skill pipeline architecture_refactor performance_optimizer \
  --code-file lib/exit_management.rb

Override the model with --model, OLLAMA_AGENT_SKILL_MODEL, or OLLAMA_AGENT_MODEL.

Ruby

result = OllamaAgent::Skills::ArchitectureRefactorer.new.call(
  code: File.read("lib/orders/manager.rb")
)
# => { folder_structure: [...], architecture_notes: "...", refactored_code: "..." }

OllamaAgent::Skills::Runner.new(
  [:architecture_refactor, :performance_optimizer]
).call(code: File.read("lib/exit_management.rb"))

Inject your own LLM client (anything responding to #generate(prompt) → String) in tests:

class FakeLlm
  def generate(_prompt)
    '{"bottlenecks": [], "optimizations": [], "optimized_code": "x"}'
  end
end

OllamaAgent::Skills::PerformanceOptimizer.new(llm: FakeLlm.new).call(code: "...")

By default skills run against the local Ollama provider (local-first, auditable). They go through OllamaAgent::Providers::Registry, so any registered provider (OpenAI, Anthropic, custom) is usable by passing your own LlmClient.

Troubleshooting

  • Use a tool-capable model — Set OLLAMA_AGENT_MODEL to a model that supports function/tool calling (e.g. a recent coder-tuned variant). If the model only prints {"name": "read_file", ...} in plain text, tools never run unless you enable OLLAMA_AGENT_PARSE_TOOL_JSON=1.
  • Malformed diffs — Headers must look like git diff: --- a/file then +++ b/file then a unified hunk line starting with @@ (not legacy --- N,M ----). Do not put commas after path tokens. The gem normalizes some mistakes and runs patch --dry-run before applying.
  • Request timeouts — The agent defaults to a 120s HTTP timeout (longer than ollama-client’s 30s). If you still hit Ollama::TimeoutError, raise it with OLLAMA_AGENT_TIMEOUT=300, bundle exec ruby exe/ollama_agent ask --timeout 300 "...", or -t 300. Ensure the variable name is exactly OLLAMA_AGENT_TIMEOUT (a leading typo such as vOLLAMA_AGENT_TIMEOUT is ignored).
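The header shape described under "Malformed diffs" can be checked before a patch is even attempted. The following pre-check is only a sketch under that description: diff_header_problems and HUNK_HEADER are hypothetical names, and the gem's own normalization plus patch --dry-run remain the real gate.

```ruby
HUNK_HEADER = /\A@@ -\d+(,\d+)? \+\d+(,\d+)? @@/

# Return a list of human-readable problems; an empty list means the headers
# look like a unified diff.
def diff_header_problems(diff)
  lines = diff.lines.map(&:chomp)
  i = lines.index { |line| line.start_with?("--- ") }
  return ["no '--- a/file' header found"] unless i

  problems = []
  problems << "expected '+++ b/file' right after the '---' line" unless lines[i + 1].to_s.start_with?("+++ ")
  problems << "expected a '@@ -N,M +N,M @@' hunk header" unless lines[i + 2].to_s.match?(HUNK_HEADER)
  problems << "path token ends with a comma" if lines[i, 2].any? { |line| line.split[1].to_s.end_with?(",") }
  problems
end

good = "--- a/lib/foo.rb\n+++ b/lib/foo.rb\n@@ -1,2 +1,2 @@\n-old\n+new\n keep\n"
puts diff_header_problems(good).empty? # prints true
```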

How it works

  1. The CLI starts OllamaAgent::Agent, which loops on Ollama::Client#chat with tool definitions.
  2. Tools are executed in-process under a path sandbox (OLLAMA_AGENT_ROOT).
  3. search_code defaults to ripgrep/grep (mode omitted or text). For Ruby, use mode method, class, module, or constant to query a Prism parse index (built lazily on first use). read_file accepts optional start_line / end_line (1-based, inclusive) to read only part of a file.
  4. Patches are validated and checked with patch --dry-run before you confirm (unless -y).
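The start_line / end_line behavior in step 3 (1-based, inclusive) can be sketched with File.foreach, which reads lazily and lets the loop stop as soon as the range is covered. read_line_range is a hypothetical name, and the argument checks are assumptions rather than the tool's documented errors.

```ruby
require "tempfile"

# Stream a file and keep only lines start_line..end_line (1-based, inclusive).
def read_line_range(path, start_line:, end_line:)
  raise ArgumentError, "start_line must be >= 1" if start_line < 1
  raise ArgumentError, "end_line must be >= start_line" if end_line < start_line

  selected = []
  File.foreach(path).with_index(1) do |line, number|
    next if number < start_line
    break if number > end_line # stop reading once past the range

    selected << line
  end
  selected.join
end

Tempfile.create("sample") do |file|
  file.write("one\ntwo\nthree\nfour\n")
  file.flush
  print read_line_range(file.path, start_line: 2, end_line: 3) # prints "two" and "three"
end
```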

Development

bundle exec rspec
bundle exec rubocop

Ongoing refactors (contributors): the Agent class is a thin façade over TurnLoop, ChatCoordinator, session/client wiring, and Tools::BuiltInSchemas, so new behavior should land in those collaborators instead of growing monolithic methods. See CONTRIBUTING.md.

CI and RubyGems release

Repository secrets (Settings → Secrets and variables → Actions):

| Secret | Purpose |
| --- | --- |
| RUBYGEMS_API_KEY | RubyGems API key with push scope |
| RUBYGEMS_OTP_SECRET | Base32 secret for TOTP (RubyGems MFA); the workflow uses rotp to generate a one-time code for gem push |

Release steps:

  1. Bump OllamaAgent::VERSION in lib/ollama_agent/version.rb and commit to main.
  2. Tag: git tag v1.0.0 (must match the version string) and git push origin v1.0.0.

License

MIT. See LICENSE.txt.