llmdb

A Ruby gem for talking to your PostgreSQL, MySQL, or Oracle database in plain language. The agent discovers the schema on demand, writes its own SQL, and executes it safely under a configurable permission model. Works with local LLMs (LM Studio, Ollama) and cloud providers (Anthropic, OpenAI, Gemini, etc.) through ruby_llm.

LLMDB.configure do |c|
  c.adapter  = :postgresql
  c.database = "shop"
  c.username = "ro_user"
  c.password = ENV["PG_PASS"]

  c.llm_provider = :anthropic
  c.llm_model    = "claude-opus-4-7"
  c.llm_api_key  = ENV["ANTHROPIC_API_KEY"]
end

session = LLMDB::Session.new
response = session.ask("How many orders did our top 5 customers place last month?")
puts response.content

Why this gem

The Ruby ecosystem already has Boxcars and Langchain.rb, both of which can talk to a database. llmdb is narrower and makes different trade-offs:

  • No ActiveRecord required. Schema and foreign-key relationships are read directly from the database via Sequel — you can point it at any database, including ones owned by other teams or other languages.
  • Schema is discovered, not declared. The agent uses tool calls (list_tables, describe_table, execute_query) to learn the schema as needed. No glossary, no preloaded prompt, no manual mapping. What it learns persists in the conversation context.
  • Multi-layered query safety. Three permission modes with real teeth — :read_only is enforced at the database engine level (SET TRANSACTION READ ONLY / PRAGMA query_only), not just by checking the first SQL token.
  • Provider-agnostic LLM. Drop-in compatible with anything ruby_llm supports: local models via Ollama or LM Studio, or cloud providers like Anthropic, OpenAI, Gemini.

If you want a full agent framework with vector search, RAG, and multi-step planning — use Boxcars or Langchain.rb. If you just want a focused, local-first natural-language SQL agent for one of three databases, this gem fits.

Installation

Add to your Gemfile:

gem "llmdb"

…and add the database driver you actually use (none are required by default):

gem "pg",        ">= 1.0"  # PostgreSQL
gem "mysql2",    ">= 0.5"  # MySQL
gem "ruby-oci8", ">= 2.2"  # Oracle (also requires Oracle Instant Client)

Then bundle install.

Quick start

require "llmdb"

LLMDB.configure do |c|
  # Database
  c.adapter  = :postgresql           # :postgresql | :mysql | :oracle
  c.host     = "localhost"
  c.port     = 5432
  c.database = "shop"
  c.username = "ro_user"
  c.password = ENV["PG_PASS"]

  # LLM
  c.llm_provider = :ollama           # :ollama, :openai, :anthropic, :gemini, ...
  c.llm_model    = "llama3.2"
  c.llm_api_base = "http://localhost:11434"
  c.llm_assume_model_exists = true   # local/custom models not in RubyLLM's registry
end

session = LLMDB::Session.new
session.ask("Which products had the highest revenue in Q1?")

LLMDB::Session.new accepts:

  • nothing — uses the global LLMDB.configuration
  • a Hash of options — built into a fresh Configuration
  • an LLMDB::Configuration instance — used directly
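Concretely, the three forms look like this (a sketch reusing the options shown above):

```ruby
# 1. No arguments: uses the global LLMDB.configuration
session = LLMDB::Session.new

# 2. A Hash of options, built into a fresh Configuration
session = LLMDB::Session.new(adapter: :postgresql, database: "shop",
                             username: "ro_user",
                             llm_provider: :ollama, llm_model: "llama3.2")

# 3. An explicit Configuration instance, used directly
config  = LLMDB::Configuration.new(adapter: :postgresql, database: "shop",
                                   llm_provider: :ollama, llm_model: "llama3.2")
session = LLMDB::Session.new(config)
```

The Hash form is handy for one-off sessions against a second database without touching the global configuration.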

LLM provider examples

LM Studio (OpenAI-compatible local server)

LLMDB.configure do |c|
  # ...DB config...
  c.llm_provider = :openai
  c.llm_api_base = "http://localhost:1234/v1"
  c.llm_api_key  = "lm-studio"          # any non-empty string; LM Studio ignores it
  c.llm_model    = "qwen2.5-coder-32b-instruct"
  c.llm_assume_model_exists = true
end

Ollama

LLMDB.configure do |c|
  # ...DB config...
  c.llm_provider = :ollama
  c.llm_api_base = "http://localhost:11434"
  c.llm_model    = "llama3.2"
  c.llm_assume_model_exists = true
end

Anthropic

LLMDB.configure do |c|
  # ...DB config...
  c.llm_provider = :anthropic
  c.llm_api_key  = ENV["ANTHROPIC_API_KEY"]
  c.llm_model    = "claude-opus-4-7"
end

OpenAI

LLMDB.configure do |c|
  # ...DB config...
  c.llm_provider = :openai
  c.llm_api_key  = ENV["OPENAI_API_KEY"]
  c.llm_model    = "gpt-5"
end

If you'd rather configure RubyLLM yourself (Rails initializer, etc.) and skip the provider plumbing, omit llm_api_base and llm_api_key from the gem config — the gem will not touch RubyLLM.configure and will defer to whatever you set globally.
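For example, a Rails app might configure RubyLLM once in an initializer and keep only database settings in the gem config (a sketch; the key name follows RubyLLM's own configure API):

```ruby
# config/initializers/ruby_llm.rb — configure RubyLLM globally.
# llmdb defers to this because llm_api_base and llm_api_key stay unset.
RubyLLM.configure do |config|
  config.anthropic_api_key = ENV["ANTHROPIC_API_KEY"]
end

LLMDB.configure do |c|
  c.adapter      = :postgresql
  c.database     = "shop"
  c.llm_provider = :anthropic
  c.llm_model    = "claude-opus-4-7"
end
```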

Permission modes

Three modes for query execution safety:

Mode                   SELECT/WITH/EXPLAIN   INSERT/UPDATE/DELETE/DDL
:read_only (default)   run                   blocked
:ask                   run                   confirmation required
:full                  run                   run

LLMDB.configure { |c| c.permission_mode = :ask }

:read_only — defense in depth

Two independent layers protect against writes:

  1. First-token whitelist. Anything that doesn't start with SELECT, WITH, or EXPLAIN fails fast with a clear SafetyError before ever reaching the database.
  2. DB-engine enforcement. Even if the token check is bypassed (e.g. a WITH x AS (DELETE FROM ...) SELECT ... CTE-DML in PostgreSQL, or a side-effect function call), the query runs inside a SET TRANSACTION READ ONLY transaction (PG/MySQL/Oracle) or a PRAGMA query_only = ON connection (SQLite). The DB engine itself refuses any write.

This means :read_only is genuinely safe — not just heuristically safe.
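Layer 1 amounts to a one-line regex check (a simplified sketch, not the gem's exact internals):

```ruby
# Layer 1 only: accept statements beginning with SELECT, WITH, or EXPLAIN
# (case-insensitive, ignoring leading whitespace). Layer 2 — the DB engine's
# read-only transaction — catches everything this check lets through.
READ_ONLY_PREFIX = /\A\s*(SELECT|WITH|EXPLAIN)\b/i

def read_only_prefix?(sql)
  READ_ONLY_PREFIX.match?(sql)
end

read_only_prefix?("SELECT * FROM orders")                       # passes layer 1
read_only_prefix?("WITH x AS (DELETE FROM t) SELECT * FROM x")  # also passes —
                                                                # layer 2 blocks it
```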

For belt-and-suspenders security, also use a read-only database role at the credential level. The gem can't undo what your DB user is allowed to do.

:ask — LLM-judged confirmation

Each query is classified by a stateless LLM judge (LLMDB::WriteClassifier) running in a fresh chat context with one job: decide whether the SQL writes. This catches CTE-wrapped DML, side-effect functions, and other constructs a first-token check would miss, without the cost of a database round-trip per query.

If the judge says "write", the configured confirm_callback is invoked. The default is an interactive TTY prompt:

[LLMDB] Pending non-read-only query:
  | UPDATE products SET price = price * 1.1 WHERE category = 'electronics'
[LLMDB] Execute? [y/N]

Override the confirmation flow for non-interactive contexts (web app, job queue, Slack bot, etc.):

LLMDB.configure do |c|
  c.permission_mode  = :ask
  c.confirm_callback = ->(sql) { MyApprovalService.ask(sql, user: current_user) }
end
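In a context with no human in the loop, the simplest safe policy is to log and deny (a hypothetical callback, for illustration only):

```ruby
# Hypothetical deny-by-default callback for non-interactive contexts:
# log the attempted SQL for later review, never execute it.
deny_and_log = lambda do |sql|
  warn "[llmdb] blocked non-read-only query: #{sql}"
  false # returning false means "do not execute"
end
```

Wire it up with `c.confirm_callback = deny_and_log` inside the configure block.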

The judge defaults to the same model as the agent — fine for local LLMs where calls are essentially free, but you may prefer a smaller dedicated classifier when using a large cloud model:

LLMDB.configure do |c|
  # ...main config using claude-opus-4-7...

  classifier_config = LLMDB::Configuration.new(
    adapter: c.adapter, database: c.database, username: c.username,
    llm_provider: :anthropic, llm_model: "claude-haiku-4-5"  # smaller, cheaper, faster
  )
  c.write_classifier = LLMDB::WriteClassifier.new(classifier_config)
end

Or supply any callable that takes a SQL string and returns a boolean — useful for plugging in a deterministic SQL parser like pg_query:

require "pg_query"
c.write_classifier = lambda do |sql|
  PgQuery.parse(sql).tree.stmts.any? { |s| s.stmt.node != :select_stmt }
end

The classifier fails closed: if the LLM call errors or times out, write? returns true so the user is asked rather than the query running silently.
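The fail-closed behavior can be sketched as a wrapper around any classifier callable (illustrative, not the gem's exact code):

```ruby
# If the underlying classifier raises (network error, timeout, bad response),
# report "write" so :ask mode prompts the user instead of running the SQL.
def write_with_fail_closed?(classifier, sql)
  !!classifier.call(sql)
rescue StandardError
  true
end

flaky = ->(_sql) { raise "LLM endpoint unreachable" }
write_with_fail_closed?(flaky, "SELECT 1")  # => true: fails closed
```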

:full — no restrictions

Anything goes. Use at your own risk, ideally with a database user whose grants already constrain what the agent can do.

System prompts

Two slots, layered:

LLMDB.configure do |c|
  # Optional — replaces the gem's built-in tool/schema-discovery instructions.
  # Most users never set this.
  c.default_system_prompt = nil

  # The prompt you'll typically set: project-specific context appended on top
  # of the default. Domain glossary, table conventions, language preference.
  c.system_prompt = <<~PROMPT
    Business glossary:
    - "customer" in business sense = `accounts` table (not `users`)
    - tables prefixed `tmp_` are ETL buffers — ignore them
    - always filter `accounts.deleted_at IS NULL` (soft delete)

    Respond concisely in English.
  PROMPT
end

Both are passed to RubyLLM as separate system messages (the second via with_instructions(prompt, append: true)), so the model sees them as distinct layers.
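In RubyLLM terms, the layering described above amounts to roughly this (a sketch; the prompt variables are placeholders):

```ruby
chat = RubyLLM.chat(model: "llama3.2")
chat.with_instructions(default_prompt)                # built-in, or default_system_prompt
chat.with_instructions(project_prompt, append: true)  # your system_prompt, if set
```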

How it works

┌──────────┐  natural-language question   ┌─────────────────┐
│   user   │─────────────────────────────▶│ LLMDB::Session  │
└──────────┘                              └────────┬────────┘
                                                   │
                                   ┌───────────────▼───────────────┐
                                   │ RubyLLM::Chat                 │
                                   │  + system prompts             │
                                   │  + tools: list_tables,        │
                                   │    describe_table,            │
                                   │    execute_query              │
                                   └───────────────┬───────────────┘
                                                   │ tool calls
                                   ┌───────────────▼───────────────┐
                                   │ LLMDB::Connection             │
                                   │  → permission_mode dispatch   │
                                   │  → confirm_callback           │
                                   │  → write_classifier (in :ask) │
                                   └───────────────┬───────────────┘
                                                   │
                                   ┌───────────────▼───────────────┐
                                   │ LLMDB::Adapters::*            │
                                   │  (Sequel-backed: PG/MySQL/    │
                                   │   Oracle)                     │
                                   └───────────────────────────────┘

The agent has no advance knowledge of the database. On each new question:

  1. It calls list_tables to see what's available.
  2. It calls describe_table for tables it intends to query, getting columns, types, and foreign-key relationships.
  3. It writes an SQL query and submits it to execute_query.
  4. It reasons about the result and answers the user.

Within a single conversation, schema knowledge accumulates in the chat history, so the agent doesn't re-discover tables it has already seen.
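Because of this accumulation, follow-up questions in the same session are cheap — the agent typically skips straight to step 3:

```ruby
session = LLMDB::Session.new

# First question: triggers list_tables / describe_table discovery.
session.ask("How many orders were placed last month?")

# Follow-up: the schema is already in the chat history, so the agent
# usually writes SQL immediately without re-describing the tables.
session.ask("Break that down by product category.")
```

Starting a fresh Session discards that history, so discovery begins again.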

Configuration reference

LLMDB.configure do |c|
  # --- Database ---
  c.adapter  = :postgresql            # :postgresql | :mysql | :oracle
  c.host     = "localhost"
  c.port     = 5432                   # default depends on adapter
  c.database = "mydb"
  c.username = "user"
  c.password = "secret"               # nil for trust auth or socket connections

  # --- LLM (provider-agnostic) ---
  c.llm_provider = :openai            # any provider RubyLLM supports
  c.llm_model    = "gpt-5"
  c.llm_api_base = nil                # custom endpoint (LM Studio, vLLM, LiteLLM, ...)
  c.llm_api_key  = ENV["OPENAI_API_KEY"]
  c.llm_assume_model_exists = false   # set true for models outside RubyLLM's registry

  # --- Safety ---
  c.permission_mode  = :read_only     # :read_only | :ask | :full
  c.confirm_callback = nil            # callable(sql) -> bool, used in :ask mode
                                      # default: interactive TTY prompt
  c.write_classifier = nil            # callable(sql) -> bool, used in :ask mode
                                      # default: WriteClassifier (LLM-judge)
  c.max_rows         = 500            # cap on rows returned to the LLM per query

  # --- Prompts ---
  c.default_system_prompt = nil       # replaces built-in agent instructions
  c.system_prompt         = nil       # appended on top of the default
end

Architecture notes

  • Top-level constant is LLMDB (treating "DB" as an initialism). The gem registers a Zeitwerk inflector to make this work with the llmdb.rb filename.
  • Errors are loaded eagerly. lib/llmdb/errors.rb defines several constants (Error, ConfigurationError, SafetyError, QueryError, ConnectionError) directly under LLMDB, which Zeitwerk can't autoload by convention — the file is loader.ignored and required up front in lib/llmdb.rb.
  • No mutation of RubyLLM global config unless you ask for it. apply_llm_credentials is skipped entirely when both llm_api_base and llm_api_key are nil — useful when you configure RubyLLM elsewhere (e.g. a Rails initializer).
  • Tools are registered as instances, not classes. Each tool takes the Connection as a constructor argument; Session wires them up via chat.with_tools(...).

Limitations

  • The DB-level read-only enforcement is per-connection (SQLite) or per-transaction (PG/MySQL/Oracle). Other backends fall back to the first-token whitelist with a STDERR warning so the gap is visible in CI.
  • The LLM-judge classifier in :ask mode is non-deterministic. It's meaningfully more robust than a token check (it understands CTE-wrapped DML, side-effect functions, etc.), but ~99% accuracy on a small local model is still ~99%, not 100%. For zero-tolerance environments, use :read_only plus a read-only database role.
  • mysql2 does not currently compile on Ruby 4.0 (as of writing). Use Ruby 3.2–3.3 for MySQL until upstream catches up.
  • Oracle support requires ruby-oci8 and the Oracle Instant Client to be installed on the host.

Development

bundle install
bundle exec rspec               # 80+ examples, no DB required for unit tests
bundle exec rubocop

The test suite uses an in-memory SQLite database to exercise the adapter base class. Real PostgreSQL/MySQL/Oracle integration tests are not part of the default suite.

License

MIT.