Savvy OpenRouter
Ruby client for OpenRouter — unified access to chat models, embeddings, reranking, speech, transcription, video generation, OAuth, API keys, guardrails, workspaces, and related REST endpoints.
Installation
Add to your Gemfile:
gem "savvy_openrouter"
Or install the gem directly:
gem install savvy_openrouter
Configuration
Precedence is keyword arguments to SavvyOpenrouter::Client (highest), then YAML config, then environment variables (lowest).
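For example, an explicitly passed key wins over both the YAML file and the environment (STAGING_OPENROUTER_KEY below is just an illustrative variable name):

```ruby
require "savvy_openrouter"

# Keyword arguments take precedence over config/savvy_openrouter.yml and OPENROUTER_* env vars.
client = SavvyOpenrouter::Client.new(
  api_key: ENV.fetch("STAGING_OPENROUTER_KEY") # beats OPENROUTER_API_KEY and any YAML api_key
)
```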
Environment variables
| Variable | Purpose |
|---|---|
| OPENROUTER_API_KEY | Bearer token (required for requests) |
| OPENROUTER_BASE_URL | API base (default https://openrouter.ai/api/v1) |
| OPENROUTER_DEFAULT_MODEL | Default model when omitted in request bodies |
| OPENROUTER_HTTP_REFERER | HTTP-Referer header (app attribution) |
| OPENROUTER_APP_TITLE | X-Title header |
YAML config (optional)
If config/savvy_openrouter.yml or .savvy_openrouter.yml exists in the working directory, it is loaded automatically. Example:
api_key: "sk-or-v1-..."
default_model: "openai/gpt-4o-mini"
defaults:
  temperature: 0.7
  max_tokens: 4096
video_defaults:
  aspect_ratio: "16:9"
  resolution: "720p"
http_referer: "https://your-app.example.com"
app_title: "Your App"
# Responses API only (POST /responses) — plugins, tools, max_output_tokens, x_search_filter, etc.
# Use this instead of putting `plugins` under global `defaults` (which would also merge into chat/embeddings).
# See https://openrouter.ai/docs/api/reference/responses/web-search
# responses_defaults:
#   plugins:
#     - id: web
#       max_results: 5
#   max_output_tokens: 4096
API call logging (api_call_log)
Optional persistence of every outbound OpenRouter HTTP request made through this gem (JSON clients, raw/binary downloads, and streaming chat). Configure api_call_log in YAML or pass api_call_log: when building SavvyOpenrouter::Client.
It depends on Active Record (or any Ruby class you configure) exposing create!(attributes) — the usual Rails pattern. Define a migration for whatever columns you map (strings / integers / booleans / text); avoid indexing huge raw payloads on Postgres without care.
# Optional — persist each outbound HTTP exchange for debugging (Faraday JSON + raw + SSE streams)
api_call_log:
  model: OpenRouterApiCallLog
  max_body_bytes: 65536
  columns:
    method: http_method          # GET, POST, …
    path: request_url            # full URL including query string when present
    status: response_status      # integer HTTP status (nil on transport failure before response)
    duration_ms: duration_ms     # float milliseconds
    request_body: request_body   # JSON-ish text; secrets redacted; truncated to max_body_bytes
    response_body: response_body # same treatment
    error_class: error_class     # nil when Faraday returned a response
    error_message: error_message # transport errors or truncated exception message
    streaming: streaming         # true for chat SSE streams
Canonical keys on the left (method, path, …) are fixed by the gem; right-hand names are your database columns. Omit mappings you do not need. Set api_call_log: false (or omit model / columns) to disable.
Logging failures never raise into your app code. Large bodies are truncated; Authorization / sk-or-v1-* patterns in serialized bodies are redacted (still treat logs as sensitive).
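The gem only calls create!(attributes) on the class named under model, so a plain Active Record model with the mapped columns is enough. A minimal sketch matching the mapping above (table name, class name, and migration version are your choices, not fixed by the gem):

```ruby
# db/migrate/*_create_open_router_api_call_logs.rb (sketch)
class CreateOpenRouterApiCallLogs < ActiveRecord::Migration[7.1]
  def change
    create_table :open_router_api_call_logs do |t|
      t.string  :http_method
      t.text    :request_url
      t.integer :response_status   # nil when no response was received
      t.float   :duration_ms
      t.text    :request_body      # already redacted and truncated by the gem
      t.text    :response_body
      t.string  :error_class
      t.text    :error_message
      t.boolean :streaming, default: false
      t.timestamps
    end
  end
end

# app/models/open_router_api_call_log.rb
class OpenRouterApiCallLog < ApplicationRecord
end
```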
Chat completion retries (chat_retries)
For client.chat.completions only (not streaming), you can retry when OpenRouter returns a successful HTTP 200 but the payload looks broken (usage.completion_tokens == 0 or an empty assistant content, both common on free tiers), and on selected HTTP errors (429, 502, 500/501, 503).
Configure chat_retries in YAML or pass chat_retries: / completion_retries: to SavvyOpenrouter::Client. Retries are off unless max_attempts is greater than 1.
chat_retries:
  max_attempts: 4
  base_delay_ms: 400
  max_delay_ms: 8000
  exponential_backoff: true # default true; set false for fixed delay
  jitter_ratio: 0.15        # 0–1, fraction of delay added randomly
  on:                       # optional overrides (default true for each unless set false)
    zero_completion_tokens: true
    empty_assistant_content: true
    rate_limit: true
    bad_gateway: true
    internal_server_error: true
    service_unavailable: true
After the last attempt, the gem returns the final response body (for 200s) or re-raises the last API error. completions_stream does not use this policy—handle streaming retries in your own code if needed.
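The same policy can also be set in code. A minimal sketch (the hash mirrors the YAML above; symbol keys are assumed to be accepted alongside the string keys YAML produces):

```ruby
require "savvy_openrouter"

client = SavvyOpenrouter::Client.new(
  api_key: ENV.fetch("OPENROUTER_API_KEY"),
  chat_retries: {
    max_attempts: 3,
    base_delay_ms: 250,
    on: { rate_limit: true, empty_assistant_content: true }
  }
)

# Retries happen inside this call; you still get the final body or the last error.
response = client.chat.completions(messages: [{ role: "user", content: "Hello!" }])
```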
Install templates
Rails
rails generate savvy_openrouter:install
Plain Ruby / scripts
bundle exec savvy_openrouter install
Creates config/savvy_openrouter.yml from the bundled template.
Responses API: web search and plugins
The Responses API accepts parameters such as plugins (legacy web plugin), tools (recommended openrouter:web_search server tool for Chat Completions and Responses), max_output_tokens, and x_search_filter (xAI).
Do not put those keys in the global defaults hash, because defaults is merged into chat completions, embeddings, rerank, audio, etc. Instead use responses_defaults in YAML (or responses_defaults: when constructing SavvyOpenrouter::Client). Those keys are merged only into client.responses.create(...).
Example:
responses_defaults:
  plugins:
    - id: web
      max_results: 5
  max_output_tokens: 9000
Per-request arguments override the same keys from responses_defaults.
OpenRouter documents the legacy plugins: [{ id: "web" }] approach as deprecated in favor of the openrouter:web_search server tool via the tools array; you can still pass either shape through this gem as passthrough JSON.
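A sketch of the defaults plus a per-request override, using the legacy plugins shape shown above (for the newer openrouter:web_search server tool, pass the tools array in whatever shape OpenRouter's reference documents; the gem forwards it unchanged):

```ruby
require "savvy_openrouter"

client = SavvyOpenrouter::Client.new(
  api_key: ENV.fetch("OPENROUTER_API_KEY"),
  responses_defaults: {
    plugins: [{ id: "web", max_results: 5 }],
    max_output_tokens: 4096
  }
)

# Per-request keys win over responses_defaults for the same key.
client.responses.create(
  model: "openai/gpt-4o",
  input: "Summarize this week's Ruby news.",
  max_output_tokens: 9000
)
```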
Models API (GET /models)
GET /api/v1/models supports optional query parameters, including category (for example programming, roleplay, science), output_modalities (for example text), and supported_parameters. Pass them as keywords to client.models.list:
client.models.list(category: "programming", output_modalities: "text")
OpenRouter returns models in curated rank order within each filtered result set—that order is not “highest context_length wins.” To approximate the top free text model for a category, call first_ranked_free_text_model, which keeps API order and returns the first model whose pricing.prompt and pricing.completion both parse to zero:
client.models.first_ranked_free_text_model(category: "programming")
client.models.first_ranked_free_text_model(category: "roleplay")
Use a stored model id for normal traffic. Resolving the free model calls GET /models (one request). Each chat turn is POST /chat/completions (another). If you call first_ranked_free_text_model (or list and pick) on every user message, you pay two API calls per turn—list plus chat. Instead, resolve once (at deploy, in a Rake task, or on a long TTL), remember the returned id (environment variable, database, cache, or default_model in savvy_openrouter.yml), and pass that string to client.chat.completions(model: ...) for ongoing requests. Refresh the stored id when you want to pick up a new “top free” model after OpenRouter changes their list.
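One way to keep that to a single /models request, sketched with Rails.cache (the cache store, key, and TTL are illustrative choices, not part of the gem):

```ruby
require "savvy_openrouter"

client = SavvyOpenrouter::Client.new(api_key: ENV.fetch("OPENROUTER_API_KEY"))

# Resolve rarely (here: at most once a day) and reuse the stored id for every chat turn.
model_id = Rails.cache.fetch("openrouter/top_free_programming_model", expires_in: 24.hours) do
  client.models.first_ranked_free_text_model(category: "programming")[:id]
end

client.chat.completions(
  model: model_id,
  messages: [{ role: "user", content: "Explain memoization in Ruby." }]
)
```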
That heuristic stays aligned with OpenRouter’s listing only as long as their ranking and pricing rows stay as they are; it is not a benchmark score (the /models JSON has no per-model rating field). For chat-specific knobs such as tools, tool_choice, and response_format (including JSON schema), pass them on client.chat.completions; global YAML defaults also merge into embeddings and other resources, so prefer per-call arguments for tool and schema defaults unless you only use chat endpoints.
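For instance, a structured-output request passed straight through to OpenRouter (the response_format shape below follows OpenRouter's JSON-schema form; this gem does not validate it):

```ruby
response = client.chat.completions(
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Name a city and its country." }],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "city",
      strict: true,
      schema: {
        type: "object",
        properties: { city: { type: "string" }, country: { type: "string" } },
        required: ["city", "country"],
        additionalProperties: false
      }
    }
  }
)
puts response[:choices].first[:message][:content] # JSON text matching the schema
```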
Usage
require "savvy_openrouter"
client = SavvyOpenrouter::Client.new(api_key: ENV.fetch("OPENROUTER_API_KEY"))
# Chat completion (defaults from YAML/env merged into body)
response = client.chat.completions(
  messages: [{ role: "user", content: "Hello!" }]
)
puts response[:choices].first[:message][:content]
# Streaming (SSE); each yielded string is a JSON `data:` payload from OpenRouter
client.chat.completions_stream(messages: [{ role: "user", content: "Hi" }]) do |data_json|
  chunk = JSON.parse(data_json)
  print chunk.dig("choices", 0, "delta", "content")
end
# Embeddings
client.embeddings.create(model: "openai/text-embedding-3-small", input: "Hello world")
# Video: create (HTTP 202), poll, download binary
job = client.videos.create(model: "google/veo-3.1", prompt: "A calm ocean at dawn")
status = client.videos.poll_until(job[:id])
video_bytes = client.videos.download(job[:id]) if status[:status].to_s == "completed"
# Binary speech (TTS)
audio_bytes = client.audio.speech(model: "elevenlabs/...", input: "Hello")
# Responses API (beta)
client.responses.create(model: "openai/gpt-4o", input: "Hello")
# Discover top free model for a category (GET /models) — run rarely; persist the id, do not call per user message
top_free = client.models.first_ranked_free_text_model(category: "programming")
# e.g. save top_free[:id] to ENV or default_model, then:
client.chat.completions(model: top_free[:id], messages: [{ role: "user", content: "Hello!" }])
# Management APIs (typically require a management key)
client.api_keys.list
client.guardrails.list
client.workspaces.list
See the OpenRouter API reference for request/response shapes.
Errors
API failures raise subclasses of SavvyOpenrouter::ApiError (for example AuthenticationError, PaymentRequiredError, RateLimitError, BadGatewayError). Every error exposes #status_code and #response_body when available.
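Since every error carries those two readers, handlers can branch on the subclass; a minimal sketch:

```ruby
begin
  client.chat.completions(messages: [{ role: "user", content: "Hello!" }])
rescue SavvyOpenrouter::RateLimitError => e
  # 429: back off and retry later (or enable chat_retries above)
  warn "Rate limited (#{e.status_code}): #{e.response_body}"
rescue SavvyOpenrouter::ApiError => e
  # Any other API failure, e.g. AuthenticationError or BadGatewayError
  warn "OpenRouter error #{e.status_code}: #{e.response_body}"
end
```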
Development
bin/setup
bundle exec rake # RSpec + RuboCop
Integration tests live in spec/integration/ and are tagged :integration. When OPENROUTER_API_KEY is set, they call the live API (WebMock allows net connect only for those examples). Smoke chat examples use the free model inclusionai/ring-2.6-1t:free; spec/integration/free_model_rank_spec.rb asserts curated “first free” model ids for programming and roleplay against the live models list (these examples can fail if OpenRouter changes ordering or pricing).
Contributing
Bug reports and pull requests are welcome on GitHub.
License
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in this project's repositories is expected to follow the code of conduct.