LlmCostTracker
Self-hosted LLM API cost tracking for Ruby and Rails apps.
Track token usage and estimated costs for OpenAI, Anthropic, and Google Gemini calls from Faraday-based Ruby clients. Store the data in your own database, tag calls by user or feature, and get budget alerts without adding an external SaaS or proxy.
Why?
Every Rails app integrating LLMs faces the same problem: you don't know how much AI is costing you until the invoice arrives. Full observability platforms like Langfuse and Helicone are powerful, but sometimes you just need a small Rails-native cost ledger that lives in your app database.
llm_cost_tracker takes a different approach:
- 🔌 Faraday-native — intercepts LLM HTTP responses without changing the response
- 🏠 Self-hosted — your data stays in your database
- 🧩 Client-light — works with raw Faraday and LLM gems that expose their Faraday connection
- 🏷️ Attribution-first — tag spend by feature, tenant, user, job, or environment
- 💸 Budget-aware — emit notifications and callbacks before spend surprises you
This gem is intentionally not a tracing platform, prompt CMS, eval system, or gateway. It focuses on the boring but valuable question: "What did this app spend on LLM APIs, and where did that spend come from?"
Installation
Add to your Gemfile:
gem "llm_cost_tracker"
For ActiveRecord storage (recommended for production):
bin/rails generate llm_cost_tracker:install
bin/rails db:migrate
Quick Start
Option 1: Faraday Middleware
If your LLM client uses Faraday, add the middleware to that connection:
conn = Faraday.new(url: "https://api.openai.com") do |f|
  f.use :llm_cost_tracker, tags: { feature: "chat", user_id: current_user.id }
  f.request :json
  f.response :json
  f.adapter Faraday.default_adapter
end

# Every supported LLM request through this connection is tracked
response = conn.post("/v1/responses", {
  model: "gpt-5-mini",
  input: "Hello!"
})
Option 2: Patch an existing client
Some LLM gems expose their Faraday connection. For example, with ruby-openai:
# config/initializers/openai.rb
OpenAI.configure do |config|
  config.access_token = ENV["OPENAI_API_KEY"]
  config.faraday do |f|
    f.use :llm_cost_tracker, tags: { feature: "openai_default" }
  end
end
If a client does not expose its HTTP connection, use manual tracking or register a custom parser around the HTTP layer you control.
Option 3: Manual tracking
For non-Faraday clients, track manually:
LlmCostTracker.track(
  provider: :anthropic,
  model: "claude-sonnet-4-6",
  input_tokens: 1500,
  output_tokens: 320,
  cache_read_input_tokens: 1200,
  feature: "summarizer",
  user_id: current_user.id
)
Configuration
# config/initializers/llm_cost_tracker.rb
LlmCostTracker.configure do |config|
  # Storage: :log (default), :active_record, or :custom
  config.storage_backend = :active_record

  # Default tags on every event
  config.default_tags = { app: "my_app", environment: Rails.env }

  # Monthly budget in USD
  config.monthly_budget = 500.00

  # Alert callback
  config.on_budget_exceeded = ->(data) {
    SlackNotifier.notify(
      "#alerts",
      "🚨 LLM budget exceeded! $#{data[:monthly_total].round(2)} / $#{data[:budget]}"
    )
  }

  # Override pricing for custom/fine-tuned models (per 1M tokens)
  config.pricing_overrides = {
    "ft:gpt-4o-mini:my-org" => { input: 0.30, cached_input: 0.15, output: 1.20 }
  }
end
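The on_budget_exceeded callback in the configuration above reads data[:monthly_total] and data[:budget]. As a rough illustration of the kind of threshold check that could drive such a callback, here is a minimal sketch. The check_budget helper is hypothetical and is not the gem's internal code; when exactly the gem fires the callback is not specified here.

```ruby
# Hypothetical sketch: not llm_cost_tracker's internal implementation.
# Calls the callback when the running monthly total is above the budget.
def check_budget(monthly_total:, budget:, on_exceeded:)
  # A nil budget is assumed to mean "no budget configured".
  return false if budget.nil? || monthly_total <= budget

  # Same keys the callback in the configuration example reads.
  on_exceeded.call({ monthly_total: monthly_total, budget: budget })
  true
end

alerts = []
notify = ->(data) { alerts << "LLM budget exceeded: $#{data[:monthly_total].round(2)} / $#{data[:budget]}" }

check_budget(monthly_total: 499.10, budget: 500.00, on_exceeded: notify)   # under budget, no alert
check_budget(monthly_total: 512.3456, budget: 500.00, on_exceeded: notify) # over budget, one alert
puts alerts
```

Keeping the callback a plain lambda makes it easy to exercise in a test without Slack or any other notifier.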
Pricing is best-effort and based on public provider pricing for standard token usage. Providers change pricing frequently, and some features have extra charges or tiered pricing. Use pricing_overrides for fine-tunes, gateway-specific model IDs, enterprise discounts, batch pricing, long-context premiums, and any model this gem does not know yet.
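To make the "per 1M tokens" unit concrete, the arithmetic can be sketched as follows. The estimate_cost helper is hypothetical and only illustrates the calculation; the gem's own calculator may differ, in particular in how it treats cached tokens. This sketch assumes cached tokens are passed separately from input_tokens rather than counted inside them.

```ruby
# Hypothetical helper for illustration; not part of llm_cost_tracker's API.
TOKENS_PER_PRICE_UNIT = 1_000_000

# prices: USD per 1M tokens, same shape as one pricing_overrides entry.
# Assumption: cached_input_tokens are NOT included in input_tokens.
def estimate_cost(prices, input_tokens:, output_tokens:, cached_input_tokens: 0)
  per_token = ->(rate) { rate.to_f / TOKENS_PER_PRICE_UNIT }

  input_cost  = input_tokens * per_token.call(prices.fetch(:input))
  # Fall back to the regular input rate when no cached rate is given.
  cached_cost = cached_input_tokens * per_token.call(prices.fetch(:cached_input, prices.fetch(:input)))
  output_cost = output_tokens * per_token.call(prices.fetch(:output))

  {
    input_cost: input_cost,
    cached_input_cost: cached_cost,
    output_cost: output_cost,
    total_cost: input_cost + cached_cost + output_cost,
    currency: "USD"
  }
end

fine_tune = { input: 0.30, cached_input: 0.15, output: 1.20 }
cost = estimate_cost(fine_tune, input_tokens: 300, output_tokens: 320, cached_input_tokens: 1200)
puts format("%.6f", cost[:total_cost]) # prints 0.000654
```

With rates of 2.50 (input) and 10.00 (output) per 1M tokens, 150 input tokens and 42 output tokens come to 0.000375 + 0.00042 = 0.000795 USD.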
Querying Costs (ActiveRecord)
# Today's total spend
LlmCostTracker::LlmApiCall.today.total_cost
# => 12.45
# Cost breakdown by model this month
LlmCostTracker::LlmApiCall.this_month.cost_by_model
# => { "gpt-4o" => 8.20, "claude-sonnet-4-6" => 4.25 }
# Cost by provider
LlmCostTracker::LlmApiCall.this_month.cost_by_provider
# => { "openai" => 8.20, "anthropic" => 4.25 }
# Daily cost trend
LlmCostTracker::LlmApiCall.daily_costs(days: 7)
# => { "2026-04-10" => 1.5, "2026-04-11" => 2.3, ... }
# Filter by feature
LlmCostTracker::LlmApiCall.by_tag("feature", "chat").this_month.total_cost
# Filter by user
LlmCostTracker::LlmApiCall.by_tag("user_id", "42").today.total_cost
# Custom date range
LlmCostTracker::LlmApiCall.between(1.week.ago, Time.current).cost_by_model
ActiveSupport::Notifications
Every tracked call emits an llm_request.llm_cost_tracker event:
ActiveSupport::Notifications.subscribe("llm_request.llm_cost_tracker") do |*, payload|
  # payload =>
  # {
  #   provider: "openai",
  #   model: "gpt-4o",
  #   input_tokens: 150,
  #   output_tokens: 42,
  #   total_tokens: 192,
  #   cost: {
  #     input_cost: 0.000375,
  #     cached_input_cost: 0.0,
  #     cache_read_input_cost: 0.0,
  #     cache_creation_input_cost: 0.0,
  #     output_cost: 0.00042,
  #     total_cost: 0.000795,
  #     currency: "USD"
  #   },
  #   tags: { feature: "chat", user_id: 42 },
  #   tracked_at: 2026-04-16 14:30:00 UTC
  # }

  StatsD.increment("llm.requests", tags: ["provider:#{payload[:provider]}"])
  StatsD.histogram("llm.cost", payload[:cost][:total_cost])
end
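A subscriber does not have to forward to StatsD. As a minimal sketch, the payload can also be folded into an in-process running total per tag. The SpendByTag class below is hypothetical; only the payload keys (:cost, :total_cost, :tags) are taken from the example above.

```ruby
# Hypothetical aggregator; not part of llm_cost_tracker.
class SpendByTag
  def initialize(tag_key)
    @tag_key = tag_key
    @totals = Hash.new(0.0)
    @lock = Mutex.new # subscribers may run on several threads
  end

  # Accepts one llm_request.llm_cost_tracker payload hash.
  def call(payload)
    tag = (payload[:tags] || {}).fetch(@tag_key, "untagged")
    cost = payload.dig(:cost, :total_cost).to_f
    @lock.synchronize { @totals[tag] += cost }
  end

  def totals
    @lock.synchronize { @totals.dup }
  end
end

spend = SpendByTag.new(:feature)
spend.call({ cost: { total_cost: 0.000795 }, tags: { feature: "chat", user_id: 42 } })
spend.call({ cost: { total_cost: 0.0012 }, tags: { feature: "chat" } })
spend.call({ cost: { total_cost: 0.0004 }, tags: {} })
p spend.totals.keys # prints ["chat", "untagged"]

# In a Rails app (assumes ActiveSupport is loaded), it could be wired up as:
# ActiveSupport::Notifications.subscribe("llm_request.llm_cost_tracker") do |*, payload|
#   spend.call(payload)
# end
```

In-memory totals reset on restart and are per-process, so this suits dashboards or tests rather than durable accounting.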
Custom Storage Backend
LlmCostTracker.configure do |config|
  config.storage_backend = :custom
  config.custom_storage = ->(event) {
    InfluxDB.write("llm_costs", {
      values: { cost: event[:cost][:total_cost], tokens: event[:total_tokens] },
      tags: { provider: event[:provider], model: event[:model] }
    })
  }
end
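Since the custom backend shown above is a lambda that receives the event hash, a sink with no external service can be sketched the same way. The jsonl_sink helper below is hypothetical and writes one JSON line per event to any IO object; the event keys it reads are the ones used in the examples above.

```ruby
require "json"
require "stringio"

# Hypothetical helper: returns a lambda that appends one JSON line per event.
def jsonl_sink(io)
  lock = Mutex.new # serialize writes from concurrent requests
  lambda do |event|
    line = JSON.generate({
      provider: event[:provider],
      model: event[:model],
      total_tokens: event[:total_tokens],
      total_cost: event.dig(:cost, :total_cost)
    })
    lock.synchronize { io.puts(line) }
  end
end

# Stand-alone demonstration with an in-memory IO.
buffer = StringIO.new
sink = jsonl_sink(buffer)
sink.call({ provider: "openai", model: "gpt-4o", total_tokens: 192, cost: { total_cost: 0.000795 } })
puts buffer.string

# In the initializer it might be used like this (the file path is only an example):
# LlmCostTracker.configure do |config|
#   config.storage_backend = :custom
#   config.custom_storage = jsonl_sink(File.open("log/llm_costs.jsonl", "a"))
# end
```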
Adding a Custom Provider Parser
class DeepSeekParser < LlmCostTracker::Parsers::Base
  def match?(url)
    url.to_s.include?("api.deepseek.com")
  end

  def parse(request_url, request_body, response_status, response_body)
    return nil unless response_status == 200

    response = safe_json_parse(response_body)
    usage = response["usage"]
    return nil unless usage

    {
      provider: "deepseek",
      model: response["model"],
      input_tokens: usage["prompt_tokens"] || 0,
      output_tokens: usage["completion_tokens"] || 0
    }
  end
end

# Register it
LlmCostTracker::Parsers::Registry.register(DeepSeekParser.new)
Supported Providers
| Provider | Auto-detected | Models with pricing |
|---|---|---|
| OpenAI | ✅ | GPT-5.2/5.1/5, GPT-5 mini/nano, GPT-4.1, GPT-4o, o1/o3/o4-mini |
| Anthropic | ✅ | Claude Opus 4.6/4.1/4, Sonnet 4.6/4.5/4, Haiku 4.5, Claude 3.x |
| Google Gemini | ✅ | Gemini 2.5 Pro/Flash/Flash-Lite, 2.0 Flash/Flash-Lite, 1.5 Pro/Flash |
| Any other | 🔧 | Via custom parser (see above) |
Supported endpoint families:
- OpenAI: Chat Completions, Responses, Completions, Embeddings
- Anthropic: Messages
- Google Gemini: generateContent responses with usageMetadata
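The endpoint families above report token usage under different field names, which is why each provider needs its own parsing. A minimal sketch of that normalization on already-parsed response bodies is shown below. The normalize_usage function is hypothetical and is not the gem's parser code; the field names are the ones the providers document for these endpoints.

```ruby
# Hypothetical normalizer; not llm_cost_tracker's parser implementation.
# Takes a parsed JSON response body (Hash with string keys).
def normalize_usage(body)
  if (meta = body["usageMetadata"])
    # Gemini generateContent
    {
      input_tokens: meta["promptTokenCount"].to_i,
      output_tokens: meta["candidatesTokenCount"].to_i
    }
  elsif (usage = body["usage"])
    # OpenAI Chat Completions, Completions, and Embeddings use
    # prompt_tokens / completion_tokens; OpenAI Responses and
    # Anthropic Messages use input_tokens / output_tokens.
    {
      input_tokens: (usage["prompt_tokens"] || usage["input_tokens"]).to_i,
      output_tokens: (usage["completion_tokens"] || usage["output_tokens"]).to_i
    }
  end
end

chat   = { "usage" => { "prompt_tokens" => 150, "completion_tokens" => 42 } }
claude = { "usage" => { "input_tokens" => 1500, "output_tokens" => 320 } }
gemini = { "usageMetadata" => { "promptTokenCount" => 20, "candidatesTokenCount" => 5 } }

p normalize_usage(chat)   # prints {input_tokens: 150, output_tokens: 42} (older Rubies print {:input_tokens=>150, ...})
p normalize_usage(claude)
p normalize_usage(gemini)
```

A body without any usage object returns nil, mirroring the "return nil unless usage" guard in the custom parser example.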
How It Works
Your App → Faraday → [LlmCostTracker Middleware] → LLM API
                              ↓
                     Parses response body
                     Extracts token usage
                     Calculates cost
                              ↓
                     ActiveSupport::Notifications
                     ActiveRecord / Log / Custom
The middleware intercepts the responses to outgoing HTTP requests (not incoming Rails requests), parses the provider usage object, looks up pricing, and records the event. It never modifies requests or responses.
For streaming APIs, tracking depends on the final response body including provider usage data. If the client consumes server-sent events without exposing the final usage payload to Faraday, use manual tracking.
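For the manual-tracking fallback with streams, one possible approach is to pull the usage object out of the server-sent events yourself. The sketch below assumes an OpenAI Chat Completions stream requested with stream_options: { include_usage: true }, where the last chunk before [DONE] carries the usage totals. The usage_from_sse helper is hypothetical and not part of the gem.

```ruby
require "json"

# Hypothetical helper: returns the last non-null "usage" object found in an
# OpenAI-style SSE transcript, or nil if the stream carried none.
def usage_from_sse(raw)
  usage = nil
  raw.each_line do |line|
    data = line[/\Adata:\s*(.*)/, 1]
    next if data.nil? || data.strip == "[DONE]"

    begin
      chunk = JSON.parse(data)
    rescue JSON::ParserError
      next # skip partial or non-JSON lines
    end
    usage = chunk["usage"] if chunk.is_a?(Hash) && chunk["usage"]
  end
  usage
end

stream = <<~SSE
  data: {"choices":[{"delta":{"content":"Hi"}}],"usage":null}

  data: {"choices":[],"usage":{"prompt_tokens":12,"completion_tokens":3,"total_tokens":15}}

  data: [DONE]
SSE

usage = usage_from_sse(stream)
puts usage["prompt_tokens"]     # prints 12
puts usage["completion_tokens"] # prints 3

# The counts could then be recorded with the manual API shown earlier:
# LlmCostTracker.track(
#   provider: :openai,
#   model: "gpt-4o",
#   input_tokens: usage["prompt_tokens"],
#   output_tokens: usage["completion_tokens"]
# )
```

Other providers frame their streaming events differently, so the line matching here would need adapting per provider.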
Development
git clone https://github.com/sergey-homenko/llm_cost_tracker.git
cd llm_cost_tracker
bundle install
bundle exec rspec
bundle exec rubocop
Contributing
Bug reports and pull requests are welcome on GitHub.
License
The gem is available as open source under the terms of the MIT License.