# Savvy OpenRouter

Ruby client for OpenRouter — unified access to chat models, embeddings, reranking, speech, transcription, video generation, OAuth, API keys, guardrails, workspaces, and related REST endpoints.
## Installation

Add to your Gemfile:

```ruby
gem "savvy_openrouter"
```

Or install the gem directly:

```shell
gem install savvy_openrouter
```
## Configuration

Precedence is keyword arguments to `SavvyOpenrouter::Client` (highest), then YAML config, then environment variables (lowest).
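A minimal sketch of how this precedence might resolve (illustrative only; `resolve_option` is a hypothetical helper, not part of the gem's API):

```ruby
# Keyword arguments win over YAML config, which wins over ENV.
# `resolve_option` is a hypothetical helper for illustration.
def resolve_option(key, kwargs:, yaml:, env:)
  return kwargs[key] if kwargs.key?(key)
  return yaml[key]   if yaml.key?(key)

  env["OPENROUTER_#{key.to_s.upcase}"]
end

kwargs = { api_key: "sk-from-kwargs" }
yaml   = { default_model: "openai/gpt-4o-mini" }
env    = { "OPENROUTER_API_KEY" => "sk-from-env",
           "OPENROUTER_BASE_URL" => "https://openrouter.ai/api/v1" }

resolve_option(:api_key, kwargs: kwargs, yaml: yaml, env: env)       # kwargs win
resolve_option(:default_model, kwargs: kwargs, yaml: yaml, env: env) # falls back to YAML
resolve_option(:base_url, kwargs: kwargs, yaml: yaml, env: env)      # falls back to ENV
```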
### Environment variables

| Variable | Purpose |
|---|---|
| `OPENROUTER_API_KEY` | Bearer token (required for requests) |
| `OPENROUTER_BASE_URL` | API base (default `https://openrouter.ai/api/v1`) |
| `OPENROUTER_DEFAULT_MODEL` | Default model when omitted in request bodies |
| `OPENROUTER_HTTP_REFERER` | `HTTP-Referer` header (app attribution) |
| `OPENROUTER_APP_TITLE` | `X-Title` header |
### YAML config (optional)

If `config/savvy_openrouter.yml` or `.savvy_openrouter.yml` exists in the working directory, it is loaded automatically. Example:

```yaml
api_key: "sk-or-v1-..."
default_model: "openai/gpt-4o-mini"
defaults:
  temperature: 0.7
  max_tokens: 4096
video_defaults:
  aspect_ratio: "16:9"
  resolution: "720p"
http_referer: "https://your-app.example.com"
app_title: "Your App"
# Responses API only (POST /responses) — plugins, tools, max_output_tokens, x_search_filter, etc.
# Use this instead of putting `plugins` under global `defaults` (which would also merge into chat/embeddings).
# See https://openrouter.ai/docs/api/reference/responses/web-search
# responses_defaults:
#   plugins:
#     - id: web
#       max_results: 5
#   max_output_tokens: 4096
```
### API call logging (`api_call_log`)

Optionally persist every outbound OpenRouter HTTP request made through this gem (JSON clients, raw/binary downloads, and streaming chat). Configure `api_call_log` in YAML or pass `api_call_log:` when building `SavvyOpenrouter::Client`.

Logging depends on an Active Record model (or any Ruby class you configure) exposing `create!(attributes)` — the usual Rails pattern. Define a migration for whatever columns you map (strings / integers / booleans / text / jsonb); avoid indexing huge raw payloads on Postgres without care.

Each entry under `columns` is `<source_key>: <your_column>`. If the logged payload includes `source_key`, it is copied into the row (after coercion). Reserved keys (`model`, `columns`, `max_body_bytes`, `chat_attempts`, `responses_attempts`) are never read from logged attrs. Any other `source_key` you whitelist can carry app-specific context (for example `bill_forward_event_id`), provided your code merges it via `connection.with_call_context`.
```yaml
api_call_log:
  model: OpenRouterApiCallLog
  max_body_bytes: 65536
  # With chat_retries, log only the final HTTP attempt (one row per logical completion).
  # Use "all" (or omit) to persist every retry attempt.
  chat_attempts: final
  columns:
    method: http_method
    path: request_url
    status: response_status        # same as http_status when Faraday returned a response
    http_status: http_status
    success: success               # boolean: 2xx HTTP
    duration_ms: duration_ms
    request_body: request_body
    response_body: response_body
    request_json: request_json     # structured body; coercion JSON-serializes for json/text columns
    response_json: response_json
    usage: usage                   # usage hash from JSON responses when present
    cost: cost                     # BigDecimal parsed from usage when present
    generation_id: generation_id   # from x-generation-id header or response id
    logical_model: logical_model   # model string from request body when present
    endpoint: endpoint             # resource endpoint label when set via call context
    error_class: error_class
    error_message: error_message
    streaming: streaming
    bill_forward_event_id: bill_forward_event_id # example passthrough key
```
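Outside Rails, anything responding to `create!(attributes)` satisfies the model contract. A minimal in-memory sink, for example (illustrative; `InMemoryApiCallLog` is not part of the gem):

```ruby
# Any object responding to create!(attributes) can serve as the log
# model. This in-memory sink is a sketch for scripts or tests, not
# the gem's own code.
class InMemoryApiCallLog
  @rows = []

  class << self
    attr_reader :rows

    def create!(attributes)
      rows << attributes
      attributes
    end
  end
end

InMemoryApiCallLog.create!(http_method: "POST",
                           request_url: "/chat/completions",
                           success: true)
```

You would then point the `model:` setting at this class (how the class is referenced — constant vs. name string — depends on how you configure it).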
Call context: wrap outbound calls to attach keys that are merged into every log row for that scope:

```ruby
client.connection.with_call_context(bill_forward_event_id: event.id) do
  client.chat.completions(...)
end
```
Set `suppress_api_call_log: true` in the context hash to skip persistence for that block (for example when the app performs its own higher-level logging / retries).

For logical failures after HTTP 200 (invalid structured JSON, validation), use `client.record_api_call(attrs)` (which delegates to `connection.record_manual_api_call`). Attributes are merged with the current `with_call_context` stack.

Canonical source keys the gem may set include: `method`, `path`, `status`, `http_status`, `duration_ms`, `request_body`, `response_body`, `request_json`, `response_json`, `usage`, `cost`, `generation_id`, `logical_model`, `endpoint`, `success`, `error_class`, `error_message` (for failed JSON HTTP responses, taken from the API `error.message` when present), `streaming`, plus any keys you pass through `with_call_context`. Omit mappings you do not need. Set `api_call_log: false` (or omit `model` / `columns`) to disable.

Logging failures never raise into your app code. Large bodies are truncated; `Authorization` / `sk-or-v1-*` patterns in serialized bodies are redacted (still treat logs as sensitive).
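To illustrate the kind of redaction involved (a sketch, not the gem's exact patterns or implementation):

```ruby
# Illustrative redaction of serialized bodies before persistence.
# The exact patterns the gem uses may differ.
KEY_PATTERN  = /sk-or-v1-[A-Za-z0-9]+/
AUTH_PATTERN = /(Authorization['":\s]+)Bearer [^\s'"]+/

def redact(text)
  out = text.gsub(KEY_PATTERN, "[REDACTED]")
  out.gsub(AUTH_PATTERN) { "#{Regexp.last_match(1)}Bearer [REDACTED]" }
end

redact(%({"Authorization":"Bearer sk-or-v1-abc123"}))
# => {"Authorization":"Bearer [REDACTED]"}
```

Even with redaction in place, treat persisted request and response bodies as sensitive data.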
### Chat completion retries (`chat_retries`)

For `client.chat.completions` only (not streaming), you can retry when OpenRouter returns a successful HTTP 200 but the payload looks broken — common with free tiers (`usage.completion_tokens == 0`) or an empty assistant content — and on selected HTTP errors (429, 500, 501, 502, 503).

Configure `chat_retries` in YAML or pass `chat_retries:` / `completion_retries:` to `SavvyOpenrouter::Client`. Retries are off unless `max_attempts` is greater than 1.
```yaml
chat_retries:
  max_attempts: 4
  base_delay_ms: 400
  max_delay_ms: 8000
  exponential_backoff: true # default true; set false for fixed delay
  jitter_ratio: 0.15        # 0–1, fraction of delay added randomly
  on: # optional overrides (default true for each unless set false)
    zero_completion_tokens: true
    empty_assistant_content: true
    rate_limit: true
    bad_gateway: true
    internal_server_error: true
    service_unavailable: true
```
After the last attempt, the gem returns the final response body (for 200s) or re-raises the last API error. `completions_stream` does not use this policy; handle streaming retries in your own code if needed.
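The delay schedule implied by those settings can be sketched as plain arithmetic (illustrative, not the gem's internal code):

```ruby
# Exponential backoff with a cap and proportional jitter, mirroring
# the chat_retries settings above. Pass jitter_ratio: 0 (or a seeded
# rng) for deterministic delays.
def retry_delay_ms(attempt, base: 400, max: 8000, exponential: true,
                   jitter_ratio: 0.15, rng: Random.new)
  delay = exponential ? base * (2**(attempt - 1)) : base
  delay = [delay, max].min
  delay + (delay * jitter_ratio * rng.rand).round
end

retry_delay_ms(1, jitter_ratio: 0)  # => 400
retry_delay_ms(3, jitter_ratio: 0)  # => 1600
retry_delay_ms(10, jitter_ratio: 0) # => 8000 (capped at max_delay_ms)
```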
### Responses API retries (`responses_retries`)

For `client.responses.create` only, configure `responses_retries` in YAML or pass `responses_retries:` to `SavvyOpenrouter::Client`. It accepts the same timing options as `chat_retries` (`max_attempts`, `base_delay_ms`, …). When `on.zero_output_tokens` is true (the default), the gem retries on successful HTTP 200 responses where `usage.output_tokens` is zero and `status` indicates an incomplete or empty generation (see `ResponsesRetryPolicy`).

With `api_call_log`, set `responses_attempts: final` to persist one row for the last HTTP attempt of a retry loop; `all` (the default) logs each attempt.
## Structured output validation (`savvy_openrouter/patterns`)

Optional pure-Ruby checks that assistant / Responses text is non-empty, parseable JSON when the request asked for `json_object` or `json_schema`:

```ruby
require "savvy_openrouter/patterns"

SavvyOpenrouter::Patterns.validate_after_success!(
  endpoint: "chat_completions",
  request: request_body_hash,
  response: parsed_response_hash
)
# raises SavvyOpenrouter::StructuredOutputError (reason: :empty_content, :invalid_json, :no_choices)
```

Helpers: `chat_structured_requested?`, `responses_structured_requested?`, `extract_chat_assistant_text`, `extract_responses_output_text`, `assert_parseable_json!`.
## Request plugins (`savvy_openrouter/request_plugins`)

Optional injection of OpenRouter plugins before `chat.completions` / `responses.create`:

```ruby
require "savvy_openrouter/request_plugins"

SavvyOpenrouter::RequestPlugins.prepare_chat_body!(body) # PDF engine from config / default cloudflare-ai
SavvyOpenrouter::RequestPlugins.prepare_chat_body!(body, pdf_engine: "native") # e.g. Gemini native PDF
SavvyOpenrouter::RequestPlugins.prepare_responses_body!(body) # response-healing for structured Responses
```

Set `file_parser_pdf_engine` in YAML (or `OPENROUTER_FILE_PARSER_PDF_ENGINE`) to `native`, `cloudflare-ai`, `mistral-ocr`, etc. Apps using `OpenRouterService` pass this through from the loaded `SavvyOpenrouter::Client` config automatically.
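Conceptually, this kind of preparation fills in a plugin entry only when the caller has not set one. A sketch (the plugin hash shape and default engine here are assumptions, not the gem's actual code):

```ruby
# Sketch of the kind of mutation a prepare-body helper performs:
# inject a file-parser plugin with the configured PDF engine unless
# the caller already supplied plugins. The hash shape is assumed.
def inject_pdf_plugin(body, pdf_engine: "cloudflare-ai")
  body[:plugins] ||= [{ id: "file-parser", pdf: { engine: pdf_engine } }]
  body
end

body = { model: "openai/gpt-4o", messages: [] }
inject_pdf_plugin(body)
body[:plugins].first[:pdf][:engine] # => "cloudflare-ai"
```

Because the injection is a `||=`, a caller-provided `plugins` array is always left untouched.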
## Install templates

Rails:

```shell
rails generate savvy_openrouter:install
```

Plain Ruby / scripts:

```shell
bundle exec savvy_openrouter install
```

Both create `config/savvy_openrouter.yml` from the bundled template.
## Responses API: web search and plugins

The Responses API accepts parameters such as `plugins` (legacy web plugin), `tools` (the recommended `openrouter:web_search` server tool for Chat Completions and Responses), `max_output_tokens`, and `x_search_filter` (xAI).

Do not put those keys in the global `defaults` hash, because `defaults` is merged into chat completions, embeddings, rerank, audio, etc. Instead use `responses_defaults` in YAML (or `responses_defaults:` when constructing `SavvyOpenrouter::Client`). Those keys are merged only into `client.responses.create(...)`.

Example:

```yaml
responses_defaults:
  plugins:
    - id: web
      max_results: 5
  max_output_tokens: 9000
```

Per-request arguments override the same keys from `responses_defaults`.

OpenRouter documents the legacy `plugins: [{ id: "web" }]` approach as deprecated in favor of the `openrouter:web_search` server tool via the `tools` array; you can still pass either shape through this gem as passthrough JSON.
## Models API (GET /models)

`GET /api/v1/models` supports optional query parameters, including `category` (for example `programming`, `roleplay`, `science`), `output_modalities` (for example `text`), and `supported_parameters`. Pass them as keywords to `client.models.list`:

```ruby
client.models.list(category: "programming", output_modalities: "text")
```

OpenRouter returns models in curated rank order within each filtered result set; that order is not "highest context_length wins". To approximate the top free text model for a category, call `first_ranked_free_text_model`, which keeps API order and returns the first model whose `pricing.prompt` and `pricing.completion` both parse to zero:

```ruby
client.models.first_ranked_free_text_model(category: "programming")
client.models.first_ranked_free_text_model(category: "roleplay")
```

Use a stored model id for normal traffic. Resolving the free model calls `GET /models` (one request); each chat turn is `POST /chat/completions` (another). If you call `first_ranked_free_text_model` (or `list` and pick) on every user message, you pay two API calls per turn: list plus chat. Instead, resolve once (at deploy, in a Rake task, or on a long TTL), remember the returned id (environment variable, database, cache, or `default_model` in `savvy_openrouter.yml`), and pass that string to `client.chat.completions(model: ...)` for ongoing requests. Refresh the stored id when you want to pick up a new "top free" model after OpenRouter changes their list.

That heuristic stays aligned with OpenRouter's listing only as long as their ranking and pricing rows stay as they are; it is not a separate benchmark score from the JSON (there is no rating field on each model). For chat-specific knobs such as `tools`, `tool_choice`, and `response_format` (including JSON schema), pass them on `client.chat.completions`; global YAML `defaults` also merge into embeddings and other resources, so prefer per-call args for tool and schema defaults unless you only use chat endpoints.
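The resolve-once pattern can be sketched with a small TTL cache (illustrative; the block below stands in for the actual `GET /models` call):

```ruby
# Resolve the "top free" model rarely and reuse the cached id.
# The resolver block stands in for a call such as
# client.models.first_ranked_free_text_model(category: "programming").
class CachedModelId
  def initialize(ttl_seconds: 24 * 3600, clock: -> { Time.now.to_i }, &resolver)
    @ttl = ttl_seconds
    @clock = clock
    @resolver = resolver
  end

  def id
    now = @clock.call
    if @id.nil? || now - @fetched_at > @ttl
      @id = @resolver.call # one GET /models here, not per chat turn
      @fetched_at = now
    end
    @id
  end
end

calls = 0
cache = CachedModelId.new { calls += 1; "some/free-model" }
3.times { cache.id } # resolver runs only once within the TTL
```

The cached string is then what you pass as `model:` on each `client.chat.completions` call.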
## Usage

```ruby
require "savvy_openrouter"

client = SavvyOpenrouter::Client.new(api_key: ENV.fetch("OPENROUTER_API_KEY"))

# Chat completion (defaults from YAML/env merged into body)
response = client.chat.completions(
  messages: [{ role: "user", content: "Hello!" }]
)
puts response[:choices].first[:message][:content]

# Streaming (SSE); each yielded string is a JSON `data:` payload from OpenRouter
client.chat.completions_stream(messages: [{ role: "user", content: "Hi" }]) do |data_json|
  chunk = JSON.parse(data_json)
  print chunk.dig("choices", 0, "delta", "content")
end

# Embeddings
client.embeddings.create(model: "openai/text-embedding-3-small", input: "Hello world")

# Video: create (HTTP 202), poll, download binary
job = client.videos.create(model: "google/veo-3.1", prompt: "A calm ocean at dawn")
status = client.videos.poll_until(job[:id])
video_bytes = client.videos.download(job[:id]) if status[:status].to_s == "completed"

# Binary speech (TTS)
audio_bytes = client.audio.speech(model: "elevenlabs/...", input: "Hello")

# Responses API (beta)
client.responses.create(model: "openai/gpt-4o", input: "Hello")

# Discover the top free model for a category (GET /models) — run rarely;
# persist the id, do not call per user message
top_free = client.models.first_ranked_free_text_model(category: "programming")
# e.g. save top_free[:id] to ENV or default_model, then:
client.chat.completions(model: top_free[:id], messages: [{ role: "user", content: "Hello!" }])

# Management APIs (typically require a management key)
client.api_keys.list
client.guardrails.list
client.workspaces.list
```
See the OpenRouter API reference for request/response shapes.
## Errors

API failures raise subclasses of `SavvyOpenrouter::ApiError` (for example `AuthenticationError`, `PaymentRequiredError`, `RateLimitError`, `BadGatewayError`). Every error exposes `#status_code` and `#response_body` when available.
## Development

```shell
bin/setup
bundle exec rake # RSpec + RuboCop
```

Integration tests live in `spec/integration/` and are tagged `:integration`. When `OPENROUTER_API_KEY` is set, they call the live API (WebMock allows net connect only for those examples). Smoke chat examples use the free model `inclusionai/ring-2.6-1t:free`; `spec/integration/free_model_rank_spec.rb` asserts curated "first free" model ids for `programming` and `roleplay` against the live models list (these examples can fail if OpenRouter changes ordering or pricing).
## Contributing

Bug reports and pull requests are welcome on GitHub.

## License

The gem is available as open source under the terms of the MIT License.

## Code of Conduct

Everyone interacting in this project's repositories is expected to follow the code of conduct.