RailsHealthChecks
A Rails engine that adds production-grade health check endpoints to any Rails app. Goes well beyond the built-in /up endpoint with 11 built-in checks, parallel execution, structured JSON responses, Prometheus metrics, and a clean configuration DSL.
Built-in checks: database · cache · Redis · SMTP · Sidekiq · SolidQueue · GoodJob · Resque · disk · memory · HTTP
Key features:
- Two-tier endpoints: /live (liveness, process only) and /ready (readiness, all deps) prevent cascade failures in Kubernetes and behind load balancers
- Parallel check execution via Concurrent::Future: response time is bounded by the slowest check, not the sum
- Result caching (config.cache_duration) to absorb high-frequency probe traffic
- Prometheus text exposition at GET /health/metrics (always HTTP 200)
- Check groups (config.group) expose subsets at /health/:group
- Per-environment toggling, boot-time validation, and bearer token / IP / custom auth
- rails generate rails_health_checks:initializer scaffolds a fully-commented config file
- Drop-in replacement for OkComputer (see MIGRATING_FROM_OKCOMPUTER.md)
Table of Contents
- Upgrading
- Installation
- Rack Applications
- Endpoints
- Configuration
- Authentication
- Built-in Checks
- Notifications
- Prometheus Metrics
- Result Caching
- Per-Environment Toggling
- Check Groups
- Custom Checks
- Migrating from OkComputer
- Performance
- Contributing
- License
Upgrading
v1.1.x → v1.2.x — breaking change to /live
GET /health/live no longer runs dependency checks.
Prior to v1.2.0, /live ran all configured checks (database, Redis, etc.) and returned 503 if any failed. This was readiness behaviour under a liveness name and is the root cause of the cascade failure footgun described below.
What changed: /live now returns 200 OK whenever the Ruby process is alive, regardless of dependency state. Authentication is also skipped on this endpoint so Kubernetes and load balancer probes work without credentials.
What to do: If you were relying on /live to verify dependencies, switch to the new /health/ready endpoint. No configuration changes required.
# Before (was running dependency checks — now only liveness)
GET /health/live → 200 if process alive (deps ignored)
# New endpoint for dependency checks
GET /health/ready → 200 if all deps pass, 503 if any fail
Installation
Add to your Gemfile:
gem "rails_health_checks"
Then run:
bundle install
Mount the engine in config/routes.rb:
mount RailsHealthChecks::Engine => "/health"
Rack Applications
RailsHealthChecks::Rack::App is a mountable Rack app that exposes the same endpoints without requiring ActionDispatch or Rails routing. It is opt-in — the Rails engine is unaffected.
Setup
Add to your Gemfile (the gem already lists rails >= 8.0 as a dependency, so activesupport and concurrent-ruby are available):
gem "rails_health_checks"
Require and mount the Rack app alongside your existing app:
# config.ru
require "rails_health_checks"
require "rails_health_checks/rack/app"
RailsHealthChecks.configure do |config|
  config.checks = [:disk, :memory, :redis]
  config.redis_url = ENV["REDIS_URL"]
end

map "/health" do
  run RailsHealthChecks::Rack::App
end
run MyApp
Sinatra
require "rails_health_checks/rack/app"
class MyApp < Sinatra::Base
  use Rack::URLMap, "/health" => RailsHealthChecks::Rack::App
end
Roda
require "rails_health_checks/rack/app"
class MyApp < Roda
  plugin :multi_run
  run "/health", RailsHealthChecks::Rack::App
end
Available endpoints
The routes are identical to the Rails engine, relative to the mount point:
| Endpoint | Format | Use case |
|---|---|---|
| GET/HEAD / | JSON | Full dependency health (monitoring dashboards) |
| GET/HEAD /live | Plain text | Liveness probe: process only, no deps |
| GET/HEAD /ready | Plain text | Readiness probe: all configured dependency checks |
| GET /metrics | Prometheus text | Prometheus scraping |
| GET /:group | JSON | Scoped check group |
Framework-agnostic vs Rails-coupled checks
Checks that depend on Rails internals require those libraries to be present in the stack. Checks that use only stdlib or standalone gems work in any Rack context:
| Check | Works without Rails? |
|---|---|
| :disk | Yes |
| :memory | Yes |
| :http | Yes |
| :redis | Yes (requires redis gem) |
| :smtp | Yes (reads ActionMailer config if available, otherwise requires config.smtp_address) |
| :database | Requires ActiveRecord |
| :cache | Requires Rails.cache |
| :sidekiq | Requires Sidekiq |
| :solid_queue | Requires SolidQueue |
| :good_job | Requires GoodJob |
| :resque | Requires Resque |
Per-environment toggling in Rack
config.disable :check, in: :env compares against Rails.env in a Rails app. In a non-Rails Rack app it reads ENV["RACK_ENV"] instead (defaulting to "production" if unset):
config.disable :disk, in: :test # compares RACK_ENV when Rails is not defined
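The lookup described above can be sketched as follows. This is a minimal illustration of the documented behaviour, not the gem's actual code; current_environment is a hypothetical helper name.

```ruby
# Hypothetical helper: resolve the environment name that `disable ... in:`
# is compared against. Rails.env wins when Rails is loaded; otherwise
# RACK_ENV is used, falling back to "production" when it is unset.
def current_environment(env = ENV)
  if defined?(Rails) && Rails.respond_to?(:env)
    Rails.env.to_s
  else
    env.fetch("RACK_ENV", "production")
  end
end

# Plain hashes stand in for ENV so the example does not depend on the shell.
puts current_environment({})
puts current_environment({ "RACK_ENV" => "test" })
```

Outside a Rails process the first call falls back to the default and the second reads the supplied RACK_ENV value.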
Authentication in Rack
All three authentication strategies work identically. When using the custom block strategy, the argument is a Rack::Request instead of ActionDispatch::Request:
RailsHealthChecks.configure do |config|
  config.authenticate { |request| request.env["HTTP_X_INTERNAL"] == "true" }
end
Token and IP allowlist strategies are unchanged.
Endpoints
| Endpoint | Runs checks? | Format | Use case |
|---|---|---|---|
| GET /health/live | No (process only) | Plain text | Kubernetes livenessProbe, load balancer health check |
| GET /health/ready | Yes (all configured deps) | Plain text | Kubernetes readinessProbe, external uptime monitors |
| GET /health | Yes (all configured deps) | JSON | Monitoring dashboards, alerting pipelines |
| GET /health/metrics | Yes (all configured deps) | Prometheus text | Prometheus / OpenMetrics scraping |
| GET /health/:group | Yes (named subset) | JSON | Scoped group (e.g. /health/workers) |
/health/live, /health/ready, and /health also respond to HEAD requests.
HTTP status: 200 OK when all checks pass, 503 Service Unavailable when any check fails (except /metrics which always returns 200, and /live which always returns 200).
Liveness vs. Readiness — why two tiers?
Using a single health endpoint for both load balancer checks and dependency monitoring is a cascade failure footgun. Here is the exact failure chain:
- Your database has a 30-second blip
- All running pods probe /health/ready → all return 503
- The load balancer removes every pod from rotation simultaneously
- Traffic has nowhere to go: the app is fully down
- If the same endpoint drives livenessProbe, Kubernetes begins restarting every pod
- Restarting pods reconnect to the still-blipping database, fail again, restart again
- What was a 30-second DB hiccup is now a multi-minute outage driven by a thundering herd of pod restarts
The fix is to separate the two concerns:
| Endpoint | Question it answers | Correct probe |
|---|---|---|
| /health/live | Is the process running and responsive? | livenessProbe, LB health check |
| /health/ready | Are all dependencies reachable? | readinessProbe, uptime monitor |
Liveness (/health/live) — returns 200 OK as long as the Ruby process responds. No dependency checks run. Authentication is skipped so Kubernetes and load balancers work without credentials. When this fails, k8s restarts the pod because the process itself is stuck or crashed.
Readiness (/health/ready) — runs all configured dependency checks. Returns 503 if any check fails. When this fails, k8s stops routing traffic to the pod but leaves it running. The pod rejoins rotation automatically once dependencies recover — no restart, no thundering herd.
Deep JSON (/health) — same dependency checks as /ready, returned as structured JSON with per-check status and latency. Use for monitoring dashboards, alerting, or anywhere you need machine-readable detail. Do not use for liveness or readiness probes.
Kubernetes wiring
containers:
  - name: web
    ports:
      - containerPort: 3000
    livenessProbe:
      httpGet:
        path: /health/live    # process-only: a DB blip does NOT restart this pod
        port: 3000
      initialDelaySeconds: 10
      periodSeconds: 10
      failureThreshold: 3     # restarts only if the process stops responding entirely
    readinessProbe:
      httpGet:
        path: /health/ready   # dep checks: stops traffic but does NOT restart the pod
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 2     # removes from rotation after 2 consecutive dep failures
    startupProbe:             # optional: give the app time to boot before probing
      httpGet:
        path: /health/live
        port: 3000
      failureThreshold: 30
      periodSeconds: 5
Warning: Do not point livenessProbe at /health/ready. A single dependency failure will cause Kubernetes to restart every pod simultaneously, turning a recoverable dep outage into a full application restart loop.
Load balancer wiring
Always use the liveness endpoint for load balancer health checks. If you use the readiness endpoint and a dependency blips, the load balancer ejects all nodes at once and traffic has nowhere to go.
AWS ALB / NLB (target group health check)
Health check path: /health/live
Healthy threshold: 2
Unhealthy threshold: 3
Timeout: 5s
Interval: 10s
Nginx upstream
upstream rails_app {
  server app1:3000;
  server app2:3000;
}

server {
  location /health/live {
    proxy_pass http://rails_app;
  }
}
HAProxy
backend rails_app
  option httpchk GET /health/live
  server app1 app1:3000 check
  server app2 app2:3000 check
Note: Reserve /health/ready for Kubernetes readinessProbe and external uptime monitors (Pingdom, UptimeRobot, Better Uptime). These are the right tools to alert you when dependencies are down; the load balancer is not.
Configuring endpoint paths
The readiness path defaults to ready (i.e. /health/ready when the engine is mounted at /health). Override it in your initializer:
RailsHealthChecks.configure do |config|
  config.readiness_path = "readyz" # → /health/readyz
end
The engine mount point is configurable in config/routes.rb:
mount RailsHealthChecks::Engine => "/healthz"
# exposes: /healthz/live, /healthz/ready, /healthz, /healthz/metrics
JSON response shape
{
  "status": "ok",
  "timestamp": "2026-06-08T20:00:00Z",
  "checks": {
    "database": { "status": "ok", "latency_ms": 4 },
    "cache": { "status": "ok", "latency_ms": 1 }
  }
}
Status values: ok | degraded | critical. Overall status is critical if any check is critical, degraded if any is degraded, ok otherwise.
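The aggregation rule above amounts to taking the most severe per-check status. A minimal sketch, assuming results are held in a hash shaped like the "checks" object of the JSON response; SEVERITY and overall_status are hypothetical names, not part of the gem's API:

```ruby
# Severity order implied by the rule above: ok < degraded < critical.
SEVERITY = { "ok" => 0, "degraded" => 1, "critical" => 2 }.freeze

# Returns the most severe status found among the per-check results.
# An empty result set is treated as "ok" (an assumption made for this sketch).
def overall_status(checks)
  checks.values
        .map { |result| result.fetch("status") }
        .max_by { |status| SEVERITY.fetch(status) } || "ok"
end

checks = {
  "database" => { "status" => "ok", "latency_ms" => 4 },
  "cache"    => { "status" => "degraded", "latency_ms" => 120 }
}
puts overall_status(checks)
```

With one degraded check and no critical ones, the overall status is degraded.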
Configuration
Run the initializer generator to create config/initializers/rails_health_checks.rb with every option documented as a commented example:
rails generate rails_health_checks:initializer
The generated file (shown below with all options) uses the block-style configure API. Every setting has a sensible default — uncomment only what you need:
# frozen_string_literal: true
RailsHealthChecks.configure do |config|
  # Checks to run (default: [:database])
  # Available built-ins: :database, :cache, :redis, :smtp, :sidekiq, :solid_queue,
  # :good_job, :resque, :disk, :memory, :http
  config.checks = [:database]

  # Global timeout per check in seconds (default: 5)
  config.timeout = 5

  # Cache check results for N seconds to avoid re-running on every request (default: nil, disabled)
  # config.cache_duration = 10

  # ---------------------------------------------------------------------------
  # Authentication: all strategies are mutually exclusive; default is public
  # ---------------------------------------------------------------------------
  # Bearer token: requests must include Authorization: Bearer <token>
  # config.token = ENV["HEALTH_TOKEN"]
  # IP allowlist: exact IPs or CIDR ranges
  # config.allowed_ips = ["127.0.0.1", "10.0.0.0/8"]
  # Custom block: return truthy to allow the request
  # config.authenticate { |request| request.headers["X-Internal"] == "true" }

  # ---------------------------------------------------------------------------
  # Per-environment toggling
  # ---------------------------------------------------------------------------
  # config.disable :disk, in: :test
  # config.disable :memory, in: [:test, :development]

  # ---------------------------------------------------------------------------
  # Check groups: expose subsets at GET /health/:group
  # ---------------------------------------------------------------------------
  # config.group :system, [:disk, :memory]
  # config.group :workers, [:sidekiq, :good_job]

  # ---------------------------------------------------------------------------
  # Redis check (requires :redis in config.checks and the redis gem)
  # ---------------------------------------------------------------------------
  # config.redis_url = ENV["REDIS_URL"] # default: redis://localhost:6379/0

  # ---------------------------------------------------------------------------
  # SMTP check (requires :smtp in config.checks)
  # Reads ActionMailer::Base.smtp_settings automatically if not set here.
  # ---------------------------------------------------------------------------
  # config.smtp_address = "smtp.example.com" # default: ActionMailer config or localhost
  # config.smtp_port = 587 # default: ActionMailer config or 25

  # ---------------------------------------------------------------------------
  # Disk check (requires :disk in config.checks)
  # ---------------------------------------------------------------------------
  # config.disk_path = "/" # mount point (default: "/")
  # config.disk_warn_threshold = 2 * 1024**3 # bytes free → degraded
  # config.disk_critical_threshold = 512 * 1024**2 # bytes free → critical

  # ---------------------------------------------------------------------------
  # Memory check (requires :memory in config.checks)
  # ---------------------------------------------------------------------------
  # config.memory_threshold = 512 * 1024**2 # RSS bytes → degraded

  # ---------------------------------------------------------------------------
  # HTTP check (requires :http in config.checks)
  # ---------------------------------------------------------------------------
  # config.http_url = "https://api.example.com/status"
  # config.http_expected_status = 200 # expected response code (default: 200)
  # config.http_headers = { "Authorization" => "Bearer #{ENV['API_TOKEN']}" }

  # ---------------------------------------------------------------------------
  # Sidekiq check (requires :sidekiq in config.checks)
  # ---------------------------------------------------------------------------
  # config.sidekiq_queue_size = 1000 # total depth → degraded

  # ---------------------------------------------------------------------------
  # Solid Queue check (requires :solid_queue in config.checks)
  # ---------------------------------------------------------------------------
  # config.solid_queue_job_count = 500 # pending jobs → degraded

  # ---------------------------------------------------------------------------
  # GoodJob check (requires :good_job in config.checks)
  # ---------------------------------------------------------------------------
  # config.good_job_latency = 300 # seconds oldest job waiting → degraded

  # ---------------------------------------------------------------------------
  # Resque check (requires :resque in config.checks)
  # ---------------------------------------------------------------------------
  # config.resque_queue_size = 1000 # total depth → degraded

  # ---------------------------------------------------------------------------
  # Custom checks
  # ---------------------------------------------------------------------------
  # class MyApiCheck < RailsHealthChecks::Check
  #   def call
  #     res = Net::HTTP.get_response(URI("https://api.example.com/status"))
  #     res.code == "200" ? pass : fail_with("API returned #{res.code}")
  #   end
  # end
  #
  # config.register :my_api, MyApiCheck.new
  # config.register :slow_api, MyApiCheck.new, timeout: 10 # per-check timeout override
end
Configuration is validated at boot time. An unknown check name, a missing http_url for the :http check, or a group referencing an undefined check raises RailsHealthChecks::ConfigurationError on startup rather than silently failing on the first request.
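To make the three failure cases concrete, here is a rough sketch of what such a boot-time validation could look like. It is not the gem's implementation: ConfigurationError below is a local stand-in for RailsHealthChecks::ConfigurationError, validate! and its keyword arguments are hypothetical, and "undefined check" is interpreted as a name that is neither built-in nor registered.

```ruby
# Local stand-in for RailsHealthChecks::ConfigurationError.
class ConfigurationError < StandardError; end

BUILT_IN_CHECKS = %i[database cache redis smtp sidekiq solid_queue
                     good_job resque disk memory http].freeze

# Hypothetical validator covering the three cases named above.
# `custom` lists the names of checks added via config.register.
def validate!(checks:, groups: {}, custom: [], http_url: nil)
  known = BUILT_IN_CHECKS + custom

  unknown = checks - known
  raise ConfigurationError, "unknown checks: #{unknown.inspect}" unless unknown.empty?

  if checks.include?(:http) && http_url.to_s.empty?
    raise ConfigurationError, "http_url is required for the :http check"
  end

  groups.each do |name, members|
    missing = members - known
    next if missing.empty?

    raise ConfigurationError, "group #{name.inspect} references undefined checks: #{missing.inspect}"
  end

  true
end

begin
  validate!(checks: [:database, :http])
rescue ConfigurationError => e
  puts "boot aborted: #{e.message}"
end
```

Failing at boot means a typo such as :databse surfaces in the deploy log instead of in the first probe response.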
Configuration Reference
| Option | Type | Default | Description |
|---|---|---|---|
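| checks | Array | [:database] | Built-in or custom check names to run |
| timeout | Integer | 5 | Global per-check timeout in seconds |
| cache_duration | Integer or nil | nil | Seconds to cache check results; nil disables caching |
| readiness_path | String | "ready" | Path of the readiness endpoint within the engine (e.g. "ready" → /health/ready) |
| token | String or nil | nil | Bearer token that requests must send in the Authorization header |
| allowed_ips | Array or nil | nil | Exact IPs or CIDR ranges allowed to reach the endpoints |
| redis_url | String or nil | nil | Redis URL for :redis check; falls back to the REDIS_URL env var |
| smtp_address | String or nil | nil | SMTP host for :smtp check; falls back to ActionMailer config |
| smtp_port | Integer or nil | nil | SMTP port for :smtp check; falls back to ActionMailer config |
| sidekiq_queue_size | Integer or nil | nil | Total queue depth at which :sidekiq reports degraded |
| solid_queue_job_count | Integer or nil | nil | Pending job count at which :solid_queue reports degraded |
| good_job_latency | Integer or nil | nil | Seconds the oldest job may wait before :good_job reports degraded |
| resque_queue_size | Integer or nil | nil | Total queue depth at which :resque reports degraded |
| disk_path | String | "/" | Mount point for :disk check |
| disk_warn_threshold | Integer or nil | nil | Free bytes at which :disk reports degraded |
| disk_critical_threshold | Integer or nil | nil | Free bytes at which :disk reports critical |
| memory_threshold | Integer or nil | nil | Process RSS in bytes at which :memory reports degraded |
| http_url | String or nil | nil | URL requested by :http check; required when :http is enabled |
| http_expected_status | Integer | 200 | Expected HTTP response code for :http check |
| http_headers | Hash | {} | Request headers sent by :http check |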
Authentication
By default health endpoints are public. Use one of the following strategies to restrict access. Unauthenticated requests receive 401 Unauthorized.
Note: GET /health/live always bypasses authentication regardless of the configured strategy. Liveness probes are called by Kubernetes and load balancers, which cannot pass credentials, so enforcing auth on this endpoint would break infrastructure probing.
Bearer token
RailsHealthChecks.configure do |config|
  config.token = ENV["HEALTH_TOKEN"]
end
Requests must include Authorization: Bearer <token>.
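As an illustration of what a bearer token check involves, the sketch below parses the header and compares tokens in constant time. How the gem itself compares tokens is not documented here, so treat bearer_token_valid? as a hypothetical helper; OpenSSL.secure_compare is a real method of Ruby's openssl library.

```ruby
require "openssl"

# Hypothetical helper: true only when the header has the form
# "Bearer <token>" and the token matches the expected one.
# OpenSSL.secure_compare avoids leaking match position through timing.
def bearer_token_valid?(authorization_header, expected_token)
  return false if expected_token.nil? || expected_token.empty?

  scheme, token = authorization_header.to_s.split(" ", 2)
  return false unless scheme == "Bearer" && token

  OpenSSL.secure_compare(token, expected_token)
end

puts bearer_token_valid?("Bearer s3cret", "s3cret")
puts bearer_token_valid?("Bearer wrong", "s3cret")
```

A client would send the header with, for example, curl -H "Authorization: Bearer $HEALTH_TOKEN" against /health.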
IP allowlist
RailsHealthChecks.configure do |config|
  config.allowed_ips = ["127.0.0.1", "10.0.0.0/8"] # exact IPs or CIDR ranges
end
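Matching a client address against a mix of exact IPs and CIDR ranges can be done with Ruby's standard IPAddr class. The following is a sketch of that matching logic under the assumption that the allowlist works this way; ip_allowed? is a hypothetical helper, not the gem's API.

```ruby
require "ipaddr"

# Hypothetical helper: true when remote_ip equals an exact entry or falls
# inside a CIDR range. Unparseable client addresses are rejected.
def ip_allowed?(remote_ip, allowed_ips)
  client = IPAddr.new(remote_ip)
  allowed_ips.any? { |entry| IPAddr.new(entry).include?(client) }
rescue IPAddr::InvalidAddressError
  false
end

allowed = ["127.0.0.1", "10.0.0.0/8"]
puts ip_allowed?("10.42.0.7", allowed)
puts ip_allowed?("192.168.1.5", allowed)
```

Behind a proxy or load balancer, the address the app sees may be the proxy's, so the allowlist should be chosen with the deployment's forwarding setup in mind.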
Custom block
RailsHealthChecks.configure do |config|
  config.authenticate { |request| request.headers["X-Internal"] == "true" }
end
The block receives the request object and must return a truthy value to allow access. In a Rails app this is ActionDispatch::Request; in the Rack app it is Rack::Request.
Built-in Checks
| Check | Requires | Description |
|---|---|---|
| :database | - | ActiveRecord SELECT 1 against the primary connection |
| :cache | - | Rails.cache read/write probe; works with any cache store |
| :redis | redis gem | Direct Redis PING; config.redis_url or REDIS_URL env var |
| :smtp | - | SMTP connectivity via Net::SMTP; reads ActionMailer config automatically |
| :sidekiq | sidekiq gem | Sidekiq Redis connectivity; optional config.sidekiq_queue_size depth threshold |
| :solid_queue | solid_queue gem | SolidQueue DB connectivity; optional config.solid_queue_job_count threshold |
| :good_job | good_job gem | GoodJob queue latency; optional config.good_job_latency (seconds) threshold |
| :resque | resque gem | Resque Redis connectivity; optional config.resque_queue_size depth threshold |
| :disk | - | Free disk space via df; config.disk_warn_threshold / config.disk_critical_threshold (bytes) |
| :memory | - | Process RSS via ps; optional config.memory_threshold (bytes) reports degraded when exceeded |
| :http | - | HTTP GET to config.http_url; config.http_expected_status and config.http_headers |
All checks run in parallel. Each check times out independently using config.timeout (default: 5s) or a per-check override set via config.register.
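The effect of parallel execution with independent timeouts can be illustrated with plain threads. This sketch deliberately uses stdlib Thread in place of Concurrent::Future so it runs without extra gems; run_in_parallel and the lambda-based checks are hypothetical and only model the behaviour described above.

```ruby
# Starts every check at once, then waits for each one until a shared deadline.
# A check that has not finished by then is reported as timed out.
def run_in_parallel(checks, timeout: 5)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  threads = checks.transform_values do |check|
    Thread.new do
      check.call
    rescue StandardError => e
      "critical: #{e.message}"
    end
  end

  threads.to_h do |name, thread|
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    remaining = [timeout - elapsed, 0].max
    if thread.join(remaining)
      [name, thread.value]
    else
      thread.kill
      [name, "critical: timed out after #{timeout}s"]
    end
  end
end

checks = {
  fast:  -> { sleep 0.05; "ok" },
  slow:  -> { sleep 0.2; "ok" },
  stuck: -> { sleep 3; "ok" }
}
results = run_in_parallel(checks, timeout: 0.5)
results.each { |name, outcome| puts "#{name}: #{outcome}" }
```

Because all three checks start together, the whole run takes roughly the timeout (0.5s) rather than the 3.25s a sequential run would need.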
Notifications
Every health check run publishes an ActiveSupport::Notifications event:
ActiveSupport::Notifications.subscribe("health_check.rails_health_checks") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  Rails.logger.info "Health check: #{event.payload[:status]} (#{event.duration.round}ms)"
  # event.payload[:checks] => { database: { status: "ok", latency_ms: 3 }, ... }
end
The payload includes:
| Key | Value |
|---|---|
| status | Overall status: "ok", "degraded", or "critical" |
| checks | Hash of { check_name => { status:, latency_ms:, message: } } |
duration on the event covers the entire parallel check run, not individual checks.
Prometheus Metrics
GET /health/metrics returns Prometheus text exposition format (text/plain; version=0.0.4). This endpoint always returns HTTP 200 per Prometheus scraping convention — check state is encoded in metric values.
# HELP rails_health_check_status Health check status (0=ok, 1=degraded, 2=critical)
# TYPE rails_health_check_status gauge
rails_health_check_status{check="database"} 0
rails_health_check_status{check="cache"} 0
# HELP rails_health_check_latency_ms Health check latency in milliseconds
# TYPE rails_health_check_latency_ms gauge
rails_health_check_latency_ms{check="database"} 4
rails_health_check_latency_ms{check="cache"} 2
Latency lines are omitted for checks that do not call measure { }.
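For readers wiring their own exporters, the exposition format shown above can be produced from a results hash along these lines. This is a sketch that mirrors the sample output, not the gem's renderer; render_metrics, STATUS_VALUES, and the shape of the results hash are assumptions.

```ruby
# Numeric encoding taken from the HELP line above.
STATUS_VALUES = { "ok" => 0, "degraded" => 1, "critical" => 2 }.freeze

# Renders results as Prometheus text. Checks without a latency_ms value
# get a status line only, matching the omission rule above.
def render_metrics(checks)
  lines = [
    "# HELP rails_health_check_status Health check status (0=ok, 1=degraded, 2=critical)",
    "# TYPE rails_health_check_status gauge"
  ]
  checks.each do |name, result|
    lines << %(rails_health_check_status{check="#{name}"} #{STATUS_VALUES.fetch(result[:status])})
  end

  timed = checks.select { |_name, result| result[:latency_ms] }
  unless timed.empty?
    lines << "# HELP rails_health_check_latency_ms Health check latency in milliseconds"
    lines << "# TYPE rails_health_check_latency_ms gauge"
    timed.each do |name, result|
      lines << %(rails_health_check_latency_ms{check="#{name}"} #{result[:latency_ms]})
    end
  end

  lines.join("\n") + "\n"
end

puts render_metrics(
  database: { status: "ok", latency_ms: 4 },
  disk:     { status: "degraded" }
)
```

Since the endpoint always answers 200, alerting rules should key on the gauge value (for example, status greater than 0) rather than on scrape failures.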
Result Caching
By default every request re-runs all checks. Set cache_duration to serve cached results for N seconds, reducing load on the database, Redis, and other dependencies:
RailsHealthChecks.configure do |config|
  config.cache_duration = 10 # seconds
end
The cache is keyed per check set — GET /health and GET /health/workers cache independently. The cache is in-process (not shared across dynos/containers), so each instance maintains its own result window.
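The per-check-set, in-process behaviour can be pictured with a small TTL cache. The class below is a sketch under those stated properties, not the gem's cache; ResultCache and its injectable clock are hypothetical and exist only to make the example deterministic.

```ruby
# In-process TTL cache keyed by the set of check names.
class ResultCache
  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @entries = {}
    @mutex = Mutex.new
  end

  # Returns the cached value for this check set if it is younger than the TTL;
  # otherwise runs the block, stores its result, and returns it.
  def fetch(check_names)
    key = check_names.sort
    @mutex.synchronize do
      now = @clock.call
      entry = @entries[key]
      return entry[:value] if entry && now - entry[:stored_at] < @ttl

      value = yield
      @entries[key] = { value: value, stored_at: now }
      value
    end
  end
end

now = 0.0
cache = ResultCache.new(ttl: 10, clock: -> { now })
runs = 0

cache.fetch([:database, :cache]) { runs += 1 }  # runs the checks
cache.fetch([:database, :cache]) { runs += 1 }  # served from cache
cache.fetch([:sidekiq]) { runs += 1 }           # different check set, runs
now = 11.0
cache.fetch([:database, :cache]) { runs += 1 }  # expired, runs again
puts runs
```

Holding the mutex while the block runs keeps the sketch simple and also prevents concurrent probes from triggering duplicate check runs, at the cost of serializing callers.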
Per-Environment Toggling
Disable specific checks in specific environments:
RailsHealthChecks.configure do |config|
  config.checks = [:database, :cache, :disk, :memory]
  config.disable :disk, in: :test
  config.disable :memory, in: [:test, :development]
end
The check is removed from the active list only when Rails.env matches. The in: option accepts a single symbol or an array.
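The filtering rule can be modelled in a few lines. This sketch assumes the disable calls are collected into a hash of check name to environment(s); active_checks is a hypothetical helper, not the gem's API.

```ruby
# `disabled` maps a check name to a single environment or a list of them.
# A check is dropped only when the current environment is in that list.
def active_checks(checks, disabled, environment)
  checks.reject do |name|
    Array(disabled[name]).map(&:to_s).include?(environment.to_s)
  end
end

checks   = [:database, :cache, :disk, :memory]
disabled = { disk: :test, memory: [:test, :development] }

p active_checks(checks, disabled, "development")
p active_checks(checks, disabled, "production")
```

In development only :memory is dropped, and in production the full list stays active.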
Check Groups
Group related checks and expose them at a dedicated endpoint:
RailsHealthChecks.configure do |config|
  config.group :system, [:disk, :memory]
  config.group :workers, [:sidekiq, :good_job]
end
| Endpoint | Runs |
|---|---|
| GET /health/system | :disk, :memory |
| GET /health/workers | :sidekiq, :good_job |
The response shape is identical to GET /health. Unknown group names return 404 Not Found.
Custom Checks
Authoring
Define a class inheriting from RailsHealthChecks::Check, implement call, and register it:
require "net/http"

class PaymentGatewayCheck < RailsHealthChecks::Check
  def call
    measure do
      response = Net::HTTP.get_response(URI("https://api.stripe.com/v1/charges"))
      case response.code.to_i
      when 200, 401 # 401 = auth error, but gateway is reachable
        pass
      when 429
        warn_with("rate limited (429)")
      else
        fail_with("unexpected status #{response.code}")
      end
    end
  rescue StandardError => e
    # Network errors (DNS failure, refused connection, timeout) count as critical
    fail_with(e.message)
  end
end

RailsHealthChecks.configure do |config|
  config.register :payment_gateway, PaymentGatewayCheck.new
  config.register :slow_gateway, PaymentGatewayCheck.new, timeout: 15
end
config.register appends the check to the active list automatically.
Check API
| Method | Status set | Use when |
|---|---|---|
| pass(message = nil) | ok | Check passed; optional message |
| warn_with(message) | degraded | Check is functional but degraded |
| fail_with(message) | critical | Check failed; service is impaired |
| measure { } | - | Wraps a block and records latency_ms |
State contract: call exactly one of pass, warn_with, or fail_with per call invocation. The check instance is dup'd before each run, so instance variables set during one request do not bleed into the next.
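To show why dup-before-run keeps requests isolated, here is a toy model of the Check API. SketchCheck is a hypothetical stand-in written for this example; it only imitates the methods in the table above and is not RailsHealthChecks::Check.

```ruby
# Toy base class imitating pass / warn_with / fail_with / measure.
class SketchCheck
  attr_reader :status, :message, :latency_ms

  def pass(message = nil)
    record("ok", message)
  end

  def warn_with(message)
    record("degraded", message)
  end

  def fail_with(message)
    record("critical", message)
  end

  # Runs the block and stores how long it took in milliseconds.
  def measure
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @latency_ms = (elapsed * 1000).round
  end

  private

  def record(status, message)
    @status = status
    @message = message
    self
  end
end

# A check that would misbehave if its instance state survived between runs.
class CounterCheck < SketchCheck
  def call
    @calls = (@calls || 0) + 1
    measure { @calls > 1 ? warn_with("state leaked from a previous run") : pass }
  end
end

registered = CounterCheck.new
first  = registered.dup.tap(&:call)
second = registered.dup.tap(&:call)
puts first.status
puts second.status
```

Both runs report ok because each one operates on a fresh copy of the registered instance; calling the registered object twice directly would report degraded on the second call.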
Testing Custom Checks
Call the check directly in a unit test — no request stack needed:
# stub_request comes from WebMock; this assumes webmock/rspec is loaded in spec_helper
RSpec.describe PaymentGatewayCheck do
  subject(:check) { described_class.new }

  context "when the gateway is reachable" do
    before do
      stub_request(:get, "https://api.stripe.com/v1/charges")
        .to_return(status: 200)
    end

    it "passes" do
      check.call
      expect(check.status).to eq("ok")
    end
  end

  context "when the gateway is rate-limited" do
    before do
      stub_request(:get, "https://api.stripe.com/v1/charges")
        .to_return(status: 429)
    end

    it "warns" do
      check.call
      expect(check.status).to eq("degraded")
      expect(check.message).to include("rate limited")
    end
  end
end
Migrating from OkComputer
See MIGRATING_FROM_OKCOMPUTER.md for a full mapping of check names, configuration keys, and endpoint differences.
Quick reference:
| OkComputer | rails_health_checks |
|---|---|
| OkComputer::ActiveRecordCheck | :database |
| OkComputer::CacheCheck | :cache |
| OkComputer::RedisCheck | :redis |
| OkComputer::SidekiqLatencyCheck | :sidekiq + config.sidekiq_queue_size |
| OkComputer::HttpCheck | :http + config.http_url |
| OkComputer::CustomCheck subclass | Subclass RailsHealthChecks::Check |
| GET /okcomputer | GET /health |
| GET /okcomputer/all | GET /health |
Performance
See BENCHMARKS.md for throughput numbers, parallel execution speedup, and cache effectiveness measurements. To run the suite locally:
bundle exec rake benchmark
Contributing
- Fork the repository
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
License
The gem is available as open source under the terms of the MIT License.