sql-chatbot-rails

AI chatbot for any Rails app — auto-discovers schema, indexes code, executes SQL, streams answers via chat widget.

Zero configuration required. Drop the gem in, mount the engine, embed the widget script tag. The chatbot reads your PostgreSQL schema, indexes app/ and lib/, detects routes, and answers natural-language questions about your data.


Quick Start

# Gemfile
gem 'sql-chatbot-rails', '~> 1.0', require: 'sql_chatbot_rails'
bundle install
rails generate sql_chatbot:install

The generator writes config/initializers/sql_chatbot.rb and adds mount SqlChatbot::Engine, at: "/chatbot" to your routes.

Set your API key, then add the widget script tag to app/views/layouts/application.html.erb:

<%= javascript_include_tag "/chatbot/widget.js" %>

Boot the app — a chat bubble appears in the bottom-right corner.


Configuration

config/initializers/sql_chatbot.rb:

SqlChatbot.configure do |c|
  # === LLM ===
  c.llm_provider = "openai"                  # or "groq", "ollama"
  c.llm_api_key  = ENV["OPENAI_API_KEY"]
  # c.llm_model    = "gpt-4o-mini"
  # c.llm_base_url = "https://api.openai.com/v1"

  # === Auth ===
  c.secret = ENV["CHATBOT_SECRET"]           # cookie + Bearer auth

  # === Code indexing ===
  c.code_paths = ["./app", "./lib"]

  # === Domain hints ===
  # c.custom_context = "status=3 means Deleted; always exclude deleted rows in lists"

  # === Row-level filters injected into every grammar SELECT ===
  # c.default_filters = { "*.status" => "!= 3" }

  # === Custom entity aliases ===
  # c.aliases = { "customer" => "account_user" }

  # === Cross-origin (separate frontend repo) ===
  # c.allowed_origins = ["https://app.example.com"]
  # c.token_lifetime  = 900   # JWT TTL in seconds (default: 15 min)
  # c.token_secret    = ENV["CHATBOT_TOKEN_SECRET"]   # auto-generated if nil

  # === Grammar engine (deterministic SQL path) ===
  # c.grammar_enabled              = true
  # c.grammar_confidence_threshold = 0.7
end

All configuration keys

Key Type Default Description
llm_provider String "openai" "openai", "groq", "ollama"
llm_api_key String nil Falls back to LLM_API_KEY / OPENAI_API_KEY / GROQ_API_KEY env
llm_model String provider preset gpt-4o-mini, llama-3.3-70b-versatile, llama3.1:8b
llm_base_url String provider preset OpenAI-compatible base URL
secret String nil Bearer-token / cookie auth secret
code_paths Array<String> ["./app"] Directories to index
custom_context String nil Free-form domain notes injected into the SQL prompt
default_filters Hash {} { "table.column" => "SQL fragment" } injected into every grammar SELECT
aliases Hash {} { "user-word" => "entity-name" } overrides auto-detected aliases
allowed_origins Array<String> nil Cross-origin CORS allowlist (returned + JWT-protected)
token_lifetime Integer 900 JWT TTL in seconds for cross-origin
token_secret String random JWT signing secret (set explicitly across replicas)
grammar_enabled Bool true Toggle the deterministic-SQL fast path
grammar_confidence_threshold Float 0.7 Minimum grammar match score before falling back to LLM

Environment variables

Var Purpose
OPENAI_API_KEY / GROQ_API_KEY / LLM_API_KEY API key
LLM_BASE_URL / LLM_MODEL Provider overrides
CHATBOT_SECRET Auth secret
CHATBOT_TOKEN_SECRET JWT signing secret for cross-origin

Setup by architecture

A. Rails monolith (Rails serves the HTML)

The default. Follow Quick Start above. The widget loads from the same origin as the page, cookie auth works without extra config.

B. Rails API + JS SPA on a different origin

Use cross-origin mode. The widget exchanges a same-origin token from your SPA's API gateway for a short-lived JWT, no third-party cookies needed.

SqlChatbot.configure do |c|
  c.llm_api_key     = ENV["OPENAI_API_KEY"]
  c.secret          = ENV["CHATBOT_SECRET"]
  c.allowed_origins = ["https://app.example.com"]
  c.token_lifetime  = 900
  c.code_paths      = ["./app", "./lib"]
end

In the SPA, load the widget from the API origin:

<script src="https://api.example.com/chatbot/widget.js"></script>

C. Microservices (multiple Rails APIs, one DB each)

Mount the engine in each service. Each instance introspects its own database; the widget script tag in any front-end can target whichever API it asks questions of. Use the same token_secret if you want cross-replica JWT validation.


Domain hints (when auto-detection isn't enough)

custom_context — free-form natural language

c.custom_context = <<~CTX
  status=3 means Deleted, always exclude in lists.
  is_admin true means staff, false means customer.
  invoices.amount is in cents, divide by 100 for dollars.
CTX

The string is injected verbatim into the SQL-generation prompt. Use this for quirks that the schema can't express on its own.

default_filters — structured row-level convention

Applies to every grammar-generated SELECT:

c.default_filters = {
  "*.status"            => "!= 3",       # any table with a status column
  "users.deleted_at"    => "IS NULL",    # specific column
  "invoices.cancelled_at" => "IS NULL",
}

Keys are "table.column" or "*.column". Values are SQL fragments placed after the qualified column reference.

aliases — map user words to entity names

When users say a word your tables don't use:

c.aliases = {
  "customer"  => "account_user",   # Saleor calls them account_user
  "customers" => "account_user",
  "agent"     => "user",           # Chatwoot agents are users with a role
  "agents"    => "user",
}

Always wins over auto-detected aliases on conflict.


Code indexing

Powers route detection (for navigation answers) and enum/constant discovery (Rails enum, Active Storage variants, scopes, business-rule constants).

Field Default
code_paths ["./app"]

Recommended for Rails: ["./app", "./lib", "./config"].

The indexer skips vendor, node_modules, tmp, log, .git. Maximum 2000 files.

Re-index after deploys: POST /chatbot/api/refresh.


Schema introspection

Auto-detects on boot:

  • Soft-delete columnsdeleted_at, discarded_at, archived_at, removed_at (Paranoia, Discard)
  • Polymorphic associations*_type VARCHAR + *_id BIGINT pairs
  • Lookup tables — small (<50 row) tables referenced by FK
  • PG enum types — native CREATE TYPE … AS ENUM
  • CHECK-constraint enumsCHECK(col IN (…)) patterns

Rails-level enum declarations are picked up by the code indexer.


Authentication

When c.secret is set:

  1. The widget bundle (/chatbot/widget.js) sets a chatbot_token cookie on load.
  2. API endpoints (/chatbot/api/ask, /chatbot/api/refresh) require the cookie or Authorization: Bearer <secret>.
  3. /chatbot/api/health stays open.

Without c.secret, the chatbot API is open to anyone — fine for development, not for production.


API endpoints

When mounted at /chatbot:

Endpoint Method Description
/chatbot/widget.js GET Serves the widget bundle
/chatbot/api/ask POST SSE-streamed chat endpoint
/chatbot/api/health GET { status, tables, code_files }
/chatbot/api/refresh POST Re-introspect schema + re-index code

Security

  • Every query runs inside BEGIN; SET TRANSACTION READ ONLY; … — writes are physically impossible.
  • DROP, DELETE, TRUNCATE, ALTER, INSERT, UPDATE, GRANT, CREATE, COMMENT blocked at parse time.
  • pg_catalog and information_schema blocked from generated queries.
  • Sensitive columns (password, secret, api_key, ssn, credit_card, etc.) filtered out before the schema is sent to the LLM.

Requirements

  • Rails >= 6.0
  • Ruby >= 2.7
  • PostgreSQL
  • An LLM provider: OpenAI, Groq, or Ollama (local)

License

MIT