About
llm.rb is Ruby's most capable AI runtime.
It runs on Ruby's standard library by default, loads optional pieces only when needed, and offers a single runtime for providers, agents, tools, skills, MCP, streaming, files, and persisted state. As a bonus, llm.rb is also available for mruby.
It supports OpenAI, OpenAI-compatible endpoints, Anthropic, Google Gemini, DeepSeek, xAI, Z.ai, AWS Bedrock, Ollama, and llama.cpp. It also includes built-in ActiveRecord and Sequel support, plus concurrent tool execution through threads, tasks (via async gem), fibers, ractors, and fork (via xchan.rb gem).
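Every provider is constructed through the same top-level interface. A minimal sketch, where the LLM.openai call matches the rest of this document and the commented alternatives are assumed to mirror it:
require "llm"
# Each constructor returns a provider object that LLM::Context and
# LLM::Agent accept as their first argument.
llm = LLM.openai(key: ENV["OPENAI_SECRET"])
# llm = LLM.anthropic(key: ENV["ANTHROPIC_SECRET"])
# llm = LLM.ollama(key: nil)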
Quick start
LLM::Context
The LLM::Context object is at the heart of the runtime. Almost all other features build on top of it. It is a low-level interface to a model and requires tool execution to be managed manually. The LLM::Agent class is almost the same as LLM::Context, but it manages tool execution for you - we'll cover agents next:
require "llm"
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: $stdout)
ctx.talk "Hello world"
LLM::Agent
The LLM::Agent object is implemented on top of LLM::Context. It provides the same interface, but manages tool execution for you. It also has built-in safeguards: a loop guard that detects repeated tool call patterns, and another that detects infinite tool call loops. Both guards advise the model to change course rather than raise an error:
require "llm"
llm = LLM.openai(key: ENV["KEY"])
agent = LLM::Agent.new(llm, stream: $stdout)
agent.talk "Hello world"
Tools
The LLM::Tool class can be subclassed to implement your own tools that can extend the abilities of a model:
class ReadFile < LLM::Tool
  name "read-file"
  description "Read a file"
  parameter :path, String, "The filename or path"
  required %i[path]

  def call(path:)
    {contents: File.read(path)}
  end
end
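To use the tool, pass it to a context through tools: and drive the tool loop yourself. A short sketch built from the same calls the MCP examples below use - ctx.wait(:call) resolves pending tool calls and ctx.functions? reports whether any remain:
require "llm"
llm = LLM.openai(key: ENV["KEY"])
# Keep feeding resolved tool calls back to the model until none remain.
ctx = LLM::Context.new(llm, stream: $stdout, tools: [ReadFile])
ctx.talk "Read README.md and summarize it."
ctx.talk(ctx.wait(:call)) while ctx.functions?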
MCP
The LLM::MCP object lets llm.rb use tools provided by an MCP server. Those tools are exposed through the same runtime as local tools, so you can pass them to either LLM::Context or LLM::Agent. In this example, the MCP server runs over stdio and LLM::Context uses the same tool loop as local tools:
require "llm"
llm = LLM.openai(key: ENV["KEY"])
mcp = LLM::MCP.stdio(argv: ["ruby", "server.rb"])
mcp.run do
  ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
  ctx.talk "Use the available tools to inspect the environment."
  ctx.talk(ctx.wait(:call)) while ctx.functions?
end
Skills
Skills are reusable instructions loaded from a directory that contains a SKILL.md file. They let you package behavior and tool access together, and they plug into the same runtime as tools, agents, and MCP. When a skill runs, llm.rb spawns a subagent with the skill's instructions, access to only the tools the skill lists, and recent conversation context:
---
name: release
description: Prepare a release
tools: ["search-docs", "git"]
---
## Task
Review the release state, summarize what changed, and prepare the release.
require "llm"
class ReleaseAgent < LLM::Agent
  model "gpt-5.4-mini"
  skills "./skills/release"
end
llm = LLM.openai(key: ENV["KEY"])
ReleaseAgent.new(llm, stream: $stdout).talk("Prepare the next release.")
LLM::Stream
The LLM::Stream object lets you observe output and runtime events as they happen. You can subclass it to handle streamed content in your own application:
require "llm"
class Stream < LLM::Stream
  def on_content(content)
    $stdout << content
  end
end
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: Stream.new)
ctx.talk "Write a haiku about Ruby."
LLM::Stream (advanced)
The LLM::Stream object can also resolve tool calls while output is still streaming. In on_tool_call, you can spawn the tool, push the work onto the stream queue, and later drain it with wait:
require "llm"
class Stream < LLM::Stream
  def on_content(content)
    $stdout << content
  end

  def on_tool_call(tool, error)
    return queue << error if error
    queue << ctx.spawn(tool, :thread)
  end
end
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: Stream.new, tools: [ReadFile])
ctx.talk "Read README.md and summarize the quick start."
ctx.talk(ctx.wait) while ctx.functions?
Concurrency
llm.rb can run tool work concurrently. This is useful when a model calls multiple tools and you want to resolve them in parallel instead of one at a time. On LLM::Agent, you can enable this with concurrency. Common options are :call for sequential execution, :thread or :task for concurrent IO-bound work, and :ractor or :fork for more isolated CPU-bound work:
require "llm"
class Agent < LLM::Agent
  model "gpt-5.4-mini"
  tools ReadFile
  concurrency :thread
end
llm = LLM.openai(key: ENV["KEY"])
agent = Agent.new(llm, stream: $stdout)
agent.talk "Read README.md and CHANGELOG.md and compare them."
Serialization
The LLM::Context object can be serialized to JSON, which makes it suitable for storing in a file, a database column, or a Redis queue. The built-in ActiveRecord and Sequel plugins are built on top of this feature:
require "llm"
llm = LLM.openai(key: ENV["KEY"])
# Serialize a context
ctx1 = LLM::Context.new(llm)
ctx1.talk "Remember that my favorite language is Ruby"
string = ctx1.to_json
# Restore a context (from JSON)
ctx2 = LLM::Context.new(llm, stream: $stdout)
ctx2.restore(string:)
ctx2.talk "What is my favorite language?"
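The same two calls work with any storage backend. A minimal sketch that persists a context to disk between runs (the filename is illustrative):
# Save the serialized context to a file, then restore it later -
# potentially from a different process.
File.write("context.json", ctx1.to_json)
ctx3 = LLM::Context.new(llm, stream: $stdout)
ctx3.restore(string: File.read("context.json"))
ctx3.talk "What is my favorite language?"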
Installation
gem install llm.rb
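Or, with Bundler, add the gem to your Gemfile:
gem "llm.rb"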
Examples
REPL
This example uses LLM::Context directly for an interactive REPL.
See the deepdive (web) or deepdive (markdown) for more examples.
require "llm"
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: $stdout)
loop do
  print "> "
  ctx.talk(STDIN.gets || break)
  puts
end
Multimodal: Local Files
In llm.rb, a prompt can be a string, an LLM::Prompt, or an array. When you use an array, each element can be plain text or a tagged object such as ctx.image_url(...), ctx.local_file(...), or ctx.remote_file(...). Those tagged objects carry the metadata the provider adapter needs to turn one Ruby prompt into the provider-specific multimodal request schema.
ctx.local_file(path) wraps LLM.File(path) in an object tagged :local_file. If the model understands that file type, you can include it directly in the prompt array instead of uploading it first through a provider Files API:
require "llm"
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm)
ctx.talk ["Summarize this document.", ctx.local_file("README.md")]
Context Compaction
This example uses LLM::Context, LLM::Compactor, and LLM::Stream together so long-lived contexts can summarize older history and expose the lifecycle through stream hooks. This approach is inspired by General Intelligence Systems. The compactor can also use its own model if you want summarization to run on a different model than the main context. token_threshold: accepts either a fixed token count or a percentage string like "90%", which resolves against the active model's context window and triggers compaction once total token usage goes over that percentage. See the deepdive (web) or deepdive (markdown) for more examples.
require "llm"
class Stream < LLM::Stream
  def on_compaction(ctx, compactor)
    puts "Compacting #{ctx.messages.size} messages..."
  end

  def on_compaction_finish(ctx, compactor)
    puts "Compacted to #{ctx.messages.size} messages."
  end
end
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(
  llm,
  stream: Stream.new,
  compactor: {
    token_threshold: "90%",
    retention_window: 8,
    model: "gpt-5.4-mini"
  }
)
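With the compactor configured, the context is used as normal - compaction is driven by token usage rather than an explicit call:
# Talk as usual. Once total token usage crosses 90% of the model's
# context window, older history is summarized and the hooks above fire.
ctx.talk "Tell me about Ruby's object model."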
Reasoning
This example uses LLM::Stream with the OpenAI Responses API so reasoning output is streamed separately from visible assistant output. See the deepdive (web) or deepdive (markdown) for more examples.
require "llm"
class Stream < LLM::Stream
  def on_content(content)
    $stdout << content
  end

  def on_reasoning_content(content)
    $stderr << content
  end
end
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(
  llm,
  model: "gpt-5.4-mini",
  mode: :responses,
  reasoning: {effort: "medium"},
  stream: Stream.new
)
ctx.talk("Solve 17 * 19 and show your work.")
Request Cancellation
Need to cancel a stream? llm.rb has you covered through LLM::Context#interrupt!.
See the deepdive (web) or deepdive (markdown) for more examples.
require "llm"
require "io/console"
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: $stdout)
worker = Thread.new do
  ctx.talk("Write a very long essay about network protocols.")
rescue LLM::Interrupt
  puts "Request was interrupted!"
end
STDIN.getch
ctx.interrupt!
worker.join
Sequel (ORM)
The plugin :llm integration wraps LLM::Context on a Sequel::Model and keeps tool execution explicit. Like the ActiveRecord wrappers, its built-in persistence contract is the serialized data column, while provider: resolves a real LLM::Provider instance and context: injects defaults such as model:.
See the deepdive (web) or deepdive (markdown) for more examples.
require "llm"
require "net/http/persistent"
require "sequel"
require "sequel/plugins/llm"
class Context < Sequel::Model
  plugin :llm, provider: :set_provider, context: :set_context

  private

  def set_provider
    LLM.openai(key: ENV["OPENAI_SECRET"], persistent: true)
  end

  def set_context
    {model: "gpt-5.4-mini", mode: :responses, store: false}
  end
end
ctx = Context.create
ctx.talk("Remember that my favorite language is Ruby")
puts ctx.talk("What is my favorite language?").content
ActiveRecord (ORM): acts_as_llm
The acts_as_llm method wraps LLM::Context and
provides full control over tool execution. Its built-in persistence contract is
one serialized data column. If your app has provider, model, or usage
columns, provide them to llm.rb through provider: and context: instead of
relying on reserved wrapper columns.
See the deepdive (web) or deepdive (markdown) for more examples.
require "llm"
require "active_record"
require "llm/active_record"
class Context < ApplicationRecord
  acts_as_llm provider: :set_provider, context: :set_context

  private

  def set_provider
    LLM.openai(key: ENV["OPENAI_SECRET"])
  end

  def set_context
    {model: "gpt-5.4-mini", mode: :responses, store: false}
  end
end
ctx = Context.create!
ctx.talk("Remember that my favorite language is Ruby")
puts ctx.talk("What is my favorite language?").content
require "llm"
require "active_record"
require "llm/active_record"
class Context < ApplicationRecord
acts_as_llm provider: :set_provider, context: :set_context
# Optional application columns can still provide the provider and context.
# For example, `provider_name` and `model_name` can be normal columns.
private
def set_provider
LLM.public_send(provider_name, key: provider_key)
end
def set_context
{model: model_name, mode: :responses, store: false}
end
end
ActiveRecord (ORM): acts_as_agent
The acts_as_agent method wraps LLM::Agent and
manages tool execution for you. Like acts_as_llm, its built-in persistence
contract is one serialized data column. If your app has provider or model
columns, provide them to llm.rb through your hooks and agent DSL.
See the deepdive (web) or deepdive (markdown) for more examples.
require "llm"
require "active_record"
require "llm/active_record"
class Ticket < ApplicationRecord
  acts_as_agent provider: :set_provider, context: :set_context
  model "gpt-5.4-mini"
  instructions "You are a concise support assistant."
  tools SearchDocs, Escalate
  concurrency :thread

  private

  def set_provider
    LLM.openai(key: ENV["OPENAI_SECRET"])
  end

  def set_context
    {mode: :responses, store: false}
  end
end
ticket = Ticket.create!
puts ticket.talk("How do I rotate my API key?").content
require "llm"
require "active_record"
require "llm/active_record"
class Ticket < ApplicationRecord
acts_as_agent provider: :set_provider, context: :set_context
model "gpt-5.4-mini"
instructions "You are a concise support assistant."
private
def set_provider
LLM.public_send(provider_name, key: provider_key)
end
def set_context
{mode: :responses, store: false}
end
end
MCP
This example uses LLM::MCP over HTTP so remote GitHub MCP tools run through the same LLM::Context tool path as local tools. It expects a GitHub token in ENV["GITHUB_PAT"]. See the deepdive (web) or deepdive (markdown) for more examples.
require "llm"
require "net/http/persistent"
llm = LLM.openai(key: ENV["KEY"], persistent: true)
mcp = LLM::MCP.http(
  url: "https://api.githubcopilot.com/mcp/",
  headers: {"Authorization" => "Bearer #{ENV["GITHUB_PAT"]}"},
  persistent: true
)
mcp.start
ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
ctx.talk("Pull information about my GitHub account.")
ctx.talk(ctx.wait(:call)) while ctx.functions?
mcp.stop
For scoped work, mcp.run do ... end is shorter and handles cleanup for you:
mcp = LLM::MCP.http(
  url: "https://api.githubcopilot.com/mcp/",
  headers: {"Authorization" => "Bearer #{ENV["GITHUB_PAT"]}"},
  persistent: true
)
mcp.run do
  ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
  ctx.talk("Pull information about my GitHub account.")
  ctx.talk(ctx.wait(:call)) while ctx.functions?
end
Resources
- deepdive (web) and deepdive (markdown) are the examples guide.
- relay shows a real application built on top of llm.rb.
- doc site has the API docs.
