RailsHealthChecks

A Rails engine that adds production-grade health check endpoints to any Rails app. Goes well beyond the built-in /up endpoint with 11 built-in checks, parallel execution, structured JSON responses, Prometheus metrics, and a clean configuration DSL.

Built-in checks: database · cache · Redis · SMTP · Sidekiq · SolidQueue · GoodJob · Resque · disk · memory · HTTP

Key features:

  • Two-tier endpoints: /live (liveness — process only) and /ready (readiness — all deps) prevent cascade failures in Kubernetes and behind load balancers
  • Parallel check execution via Concurrent::Future — response time bounded by the slowest check, not the sum
  • Result caching (config.cache_duration) to absorb high-frequency probe traffic
  • Prometheus text exposition at GET /health/metrics (always HTTP 200)
  • Check groups (config.group) expose subsets at /health/:group
  • Per-environment toggling, boot-time validation, and bearer token / IP / custom auth
  • rails generate rails_health_checks:initializer scaffolds a fully-commented config file
  • Drop-in replacement for OkComputer — see MIGRATING_FROM_OKCOMPUTER.md


Upgrading

v1.1.x → v1.2.x — breaking change to /live

GET /health/live no longer runs dependency checks.

Prior to v1.2.0, /live ran all configured checks (database, Redis, etc.) and returned 503 if any failed. This was readiness behaviour under a liveness name and is the root cause of the cascade failure footgun described below.

What changed: /live now returns 200 OK whenever the Ruby process is alive, regardless of dependency state. Authentication is also skipped on this endpoint so Kubernetes and load balancer probes work without credentials.

What to do: If you were relying on /live to verify dependencies, switch to the new /health/ready endpoint. No configuration changes required.

# /live: previously ran dependency checks; now checks the process only
GET /health/live   →  200 if process alive (deps ignored)

# New endpoint for dependency checks
GET /health/ready  →  200 if all deps pass, 503 if any fail



Installation

Add to your Gemfile:

gem "rails_health_checks"

Then run:

bundle install

Mount the engine in config/routes.rb:

mount RailsHealthChecks::Engine => "/health"



Rack Applications

RailsHealthChecks::Rack::App is a mountable Rack app that exposes the same endpoints without requiring ActionDispatch or Rails routing. It is opt-in — the Rails engine is unaffected.

Setup

Add to your Gemfile (the gem already lists rails >= 8.0 as a dependency, so activesupport and concurrent-ruby are available):

gem "rails_health_checks"

Require and mount the Rack app alongside your existing app:

# config.ru
require "rails_health_checks"
require "rails_health_checks/rack/app"

RailsHealthChecks.configure do |config|
  config.checks = [:disk, :memory, :redis]
  config.redis_url = ENV["REDIS_URL"]
end

map "/health" do
  run RailsHealthChecks::Rack::App
end

run MyApp

Sinatra

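# config.ru
require "rails_health_checks/rack/app"

class MyApp < Sinatra::Base
  # ...your routes
end

# Rack::URLMap is a Rack app, not middleware, so build it with .new and run it
# instead of passing it to `use` (which would raise an ArgumentError).
run Rack::URLMap.new("/health" => RailsHealthChecks::Rack::App, "/" => MyApp)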

Roda

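require "rails_health_checks/rack/app"

class MyApp < Roda
  plugin :multi_run
  run "health", RailsHealthChecks::Rack::App # prefix is given without the leading slash

  route do |r|
    r.multi_run # dispatches /health and /health/* to the health app
    # ...your other routes
  end
end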

Available endpoints

The routes are identical to the Rails engine, relative to the mount point:

Endpoint         Format           Use case
GET/HEAD /       JSON             Full dependency health (monitoring dashboards)
GET/HEAD /live   Plain text       Liveness probe: process only, no deps
GET/HEAD /ready  Plain text       Readiness probe: all configured dependency checks
GET /metrics     Prometheus text  Prometheus scraping
GET /:group      JSON             Scoped check group
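The routing rules above can be pictured as a plain Rack callable that dispatches on PATH_INFO, which map "/health" (or Rack::URLMap) makes relative to the mount point. This is a hypothetical sketch, not the gem's code: HealthRouterSketch and the run_checks lambda are made-up names, /metrics is left out, and it assumes only a critical result counts as failing (this README does not say how degraded maps to an HTTP status).

```ruby
require "json"

# Hypothetical sketch of the routing rules above; not the gem's implementation.
# A Rack app is any object whose #call(env) returns [status, headers, body].
class HealthRouterSketch
  def initialize(groups:, run_checks:)
    @groups = groups         # e.g. { "system" => [:disk, :memory] }
    @run_checks = run_checks # callable: names (nil = all) => { name => status }
  end

  def call(env)
    # Under `map "/health"`, a request for /health/ready arrives with PATH_INFO "/ready".
    path = env["PATH_INFO"].to_s.chomp("/")
    case path
    when "/live"
      text(200, "OK") # liveness never runs dependency checks
    when "/ready"
      healthy?(@run_checks.call(nil)) ? text(200, "OK") : text(503, "FAIL")
    when ""
      json(@run_checks.call(nil))
    else
      names = @groups[path.delete_prefix("/")]
      names ? json(@run_checks.call(names)) : text(404, "Not Found")
    end
  end

  private

  # Assumption: only "critical" counts as failing.
  def healthy?(results)
    results.values.none?("critical")
  end

  def text(status, body)
    [status, { "content-type" => "text/plain" }, [body]]
  end

  def json(results)
    [healthy?(results) ? 200 : 503, { "content-type" => "application/json" },
     [JSON.generate({ checks: results })]]
  end
end

app = HealthRouterSketch.new(
  groups: { "system" => [:disk, :memory] },
  run_checks: ->(_names) { { database: "ok" } } # stand-in for real checks
)
app.call("PATH_INFO" => "/live").first   # => 200
app.call("PATH_INFO" => "/system").first # => 200
app.call("PATH_INFO" => "/nope").first   # => 404
```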

Framework-agnostic vs Rails-coupled checks

Checks that depend on Rails internals require those libraries to be present in the stack. Checks that use only stdlib or standalone gems work in any Rack context:

Check         Works without Rails?
:disk         Yes
:memory       Yes
:http         Yes
:redis        Yes (requires redis gem)
:smtp         Yes (reads ActionMailer config if available, otherwise requires config.smtp_address)
:database     Requires ActiveRecord
:cache        Requires Rails.cache
:sidekiq      Requires Sidekiq
:solid_queue  Requires SolidQueue
:good_job     Requires GoodJob
:resque       Requires Resque

Per-environment toggling in Rack

config.disable :check, in: :env compares against Rails.env in a Rails app. In a non-Rails Rack app it reads ENV["RACK_ENV"] instead (defaulting to "production" if unset):

config.disable :disk, in: :test   # compares RACK_ENV when Rails is not defined

Authentication in Rack

All three authentication strategies work identically. When using the custom block strategy, the argument is a Rack::Request instead of ActionDispatch::Request:

RailsHealthChecks.configure do |config|
  config.authenticate { |request| request.env["HTTP_X_INTERNAL"] == "true" }
end

Token and IP allowlist strategies are unchanged.



Endpoints

Endpoint             Runs checks?               Format           Use case
GET /health/live     No (process only)          Plain text       Kubernetes livenessProbe, load balancer health check
GET /health/ready    Yes (all configured deps)  Plain text       Kubernetes readinessProbe, external uptime monitors
GET /health          Yes (all configured deps)  JSON             Monitoring dashboards, alerting pipelines
GET /health/metrics  Yes (all configured deps)  Prometheus text  Prometheus / OpenMetrics scraping
GET /health/:group   Yes (named subset)         JSON             Scoped group (e.g. /health/workers)

/health/live, /health/ready, and /health also respond to HEAD requests.

HTTP status: /health/ready, /health, and /health/:group return 200 OK when all checks pass and 503 Service Unavailable when any check fails. /health/live and /health/metrics always return 200.


Liveness vs. Readiness — why two tiers?

Using a single health endpoint for both load balancer checks and dependency monitoring is a cascade failure footgun. Here is the exact failure chain:

  1. Your database has a 30-second blip
  2. Every pod answers its health probe at /health/ready with 503
  3. The load balancer removes every pod from rotation simultaneously
  4. Traffic has nowhere to go — the app is fully down
  5. If the same endpoint drives livenessProbe, Kubernetes begins restarting every pod
  6. Restarting pods reconnect to the still-blipping database, fail again, restart again
  7. What was a 30-second DB hiccup is now a multi-minute outage driven by a thundering herd of pod restarts

The fix is to separate the two concerns:

Endpoint       Question it answers                     Correct probe
/health/live   Is the process running and responsive?  livenessProbe, LB health check
/health/ready  Are all dependencies reachable?         readinessProbe, uptime monitor

Liveness (/health/live) — returns 200 OK as long as the Ruby process responds. No dependency checks run. Authentication is skipped so Kubernetes and load balancers work without credentials. When this fails, k8s restarts the pod because the process itself is stuck or crashed.

Readiness (/health/ready) — runs all configured dependency checks. Returns 503 if any check fails. When this fails, k8s stops routing traffic to the pod but leaves it running. The pod rejoins rotation automatically once dependencies recover — no restart, no thundering herd.

Deep JSON (/health) — same dependency checks as /ready, returned as structured JSON with per-check status and latency. Use for monitoring dashboards, alerting, or anywhere you need machine-readable detail. Do not use for liveness or readiness probes.


Kubernetes wiring

containers:
  - name: web
    ports:
      - containerPort: 3000
    livenessProbe:
      httpGet:
        path: /health/live   # process-only — DB blip does NOT restart this pod
        port: 3000
      initialDelaySeconds: 10
      periodSeconds: 10
      failureThreshold: 3    # restarts only if the process stops responding entirely
    readinessProbe:
      httpGet:
        path: /health/ready  # dep checks — stops traffic but does NOT restart the pod
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 2    # removes from rotation after 2 consecutive dep failures
    startupProbe:            # optional: give the app time to boot before probing
      httpGet:
        path: /health/live
        port: 3000
      failureThreshold: 30
      periodSeconds: 5

Warning: Do not point livenessProbe at /health/ready. A single dependency failure will cause Kubernetes to restart every pod simultaneously, turning a recoverable dep outage into a full application restart loop.


Load balancer wiring

Always use the liveness endpoint for load balancer health checks. If you use the readiness endpoint and a dependency blips, the load balancer ejects all nodes at once and traffic has nowhere to go.

AWS ALB / NLB (target group health check)

Health check path:    /health/live
Healthy threshold:    2
Unhealthy threshold:  3
Timeout:              5s
Interval:             10s

Nginx upstream

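upstream rails_app {
  # Open-source nginx only checks passively: a server is taken out of rotation
  # after max_fails failed proxied requests within fail_timeout.
  server app1:3000 max_fails=3 fail_timeout=10s;
  server app2:3000 max_fails=3 fail_timeout=10s;
  # NGINX Plus active checks also need a shared-memory zone:
  # zone rails_app 64k;
}

server {
  location / {
    proxy_pass http://rails_app;
    # NGINX Plus only: actively probe the liveness endpoint
    # health_check uri=/health/live interval=10 fails=3 passes=2;
  }
}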

HAProxy

backend rails_app
  option httpchk GET /health/live
  server app1 app1:3000 check
  server app2 app2:3000 check

Note: Reserve /health/ready for Kubernetes readinessProbe and external uptime monitors (Pingdom, UptimeRobot, Better Uptime). These are the right tools to alert you when dependencies are down — the load balancer is not.


Configuring endpoint paths

The readiness path defaults to ready (i.e. /health/ready when the engine is mounted at /health). Override it in your initializer:

RailsHealthChecks.configure do |config|
  config.readiness_path = "readyz"  # → /health/readyz
end

The engine mount point is configurable in config/routes.rb:

mount RailsHealthChecks::Engine => "/healthz"
# exposes: /healthz/live, /healthz/ready, /healthz, /healthz/metrics

JSON response shape

{
  "status": "ok",
  "timestamp": "2026-06-08T20:00:00Z",
  "checks": {
    "database": { "status": "ok", "latency_ms": 4 },
    "cache":    { "status": "ok", "latency_ms": 1 }
  }
}

Status values: ok | degraded | critical. Overall status is critical if any check is critical, degraded if any is degraded, ok otherwise.
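A minimal sketch of that roll-up rule, together with the response shape above, in plain Ruby. The method names overall_status and health_payload are hypothetical; this is not the gem's implementation, and it assumes an empty check set reports ok.

```ruby
require "json"
require "time"

# Hypothetical sketch of the roll-up rule; not the gem's source.
SEVERITY = { "ok" => 0, "degraded" => 1, "critical" => 2 }.freeze

# The overall status is the most severe individual status ("ok" when there are no checks).
def overall_status(checks)
  checks.values.map { |c| c[:status] }.max_by { |s| SEVERITY.fetch(s) } || "ok"
end

def health_payload(checks, now: Time.now)
  {
    status: overall_status(checks),
    timestamp: now.getutc.iso8601, # e.g. "2026-06-08T20:00:00Z"
    checks: checks
  }
end

checks = {
  database: { status: "ok", latency_ms: 4 },
  cache:    { status: "degraded", latency_ms: 120 }
}
health_payload(checks)[:status] # => "degraded"
puts JSON.pretty_generate(health_payload(checks))
```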



Configuration

Run the initializer generator to create config/initializers/rails_health_checks.rb with every option documented as a commented example:

rails generate rails_health_checks:initializer

The generated file (shown below with all options) uses the block-style configure API. Every setting has a sensible default — uncomment only what you need:

# frozen_string_literal: true

RailsHealthChecks.configure do |config|
  # Checks to run (default: [:database])
  # Available built-ins: :database, :cache, :redis, :smtp, :sidekiq, :solid_queue,
  #                      :good_job, :resque, :disk, :memory, :http
  config.checks = [:database]

  # Global timeout per check in seconds (default: 5)
  config.timeout = 5

  # Cache check results for N seconds to avoid re-running on every request (default: nil, disabled)
  # config.cache_duration = 10

  # ---------------------------------------------------------------------------
  # Authentication — all strategies are mutually exclusive; default is public
  # ---------------------------------------------------------------------------

  # Bearer token: requests must include Authorization: Bearer <token>
  # config.token = ENV["HEALTH_TOKEN"]

  # IP allowlist: exact IPs or CIDR ranges
  # config.allowed_ips = ["127.0.0.1", "10.0.0.0/8"]

  # Custom block: return truthy to allow the request
  # config.authenticate { |request| request.headers["X-Internal"] == "true" }

  # ---------------------------------------------------------------------------
  # Per-environment toggling
  # ---------------------------------------------------------------------------
  # config.disable :disk,   in: :test
  # config.disable :memory, in: [:test, :development]

  # ---------------------------------------------------------------------------
  # Check groups — expose subsets at GET /health/:group
  # ---------------------------------------------------------------------------
  # config.group :system,  [:disk, :memory]
  # config.group :workers, [:sidekiq, :good_job]

  # ---------------------------------------------------------------------------
  # Redis check (requires :redis in config.checks and the redis gem)
  # ---------------------------------------------------------------------------
  # config.redis_url = ENV["REDIS_URL"]         # default: redis://localhost:6379/0

  # ---------------------------------------------------------------------------
  # SMTP check (requires :smtp in config.checks)
  # Reads ActionMailer::Base.smtp_settings automatically if not set here.
  # ---------------------------------------------------------------------------
  # config.smtp_address = "smtp.example.com"  # default: ActionMailer config or localhost
  # config.smtp_port    = 587                 # default: ActionMailer config or 25

  # ---------------------------------------------------------------------------
  # Disk check (requires :disk in config.checks)
  # ---------------------------------------------------------------------------
  # config.disk_path               = "/"             # mount point (default: "/")
  # config.disk_warn_threshold     = 2 * 1024**3     # bytes free → degraded
  # config.disk_critical_threshold = 512 * 1024**2   # bytes free → critical

  # ---------------------------------------------------------------------------
  # Memory check (requires :memory in config.checks)
  # ---------------------------------------------------------------------------
  # config.memory_threshold = 512 * 1024**2          # RSS bytes → degraded

  # ---------------------------------------------------------------------------
  # HTTP check (requires :http in config.checks)
  # ---------------------------------------------------------------------------
  # config.http_url             = "https://api.example.com/status"
  # config.http_expected_status = 200                # expected response code (default: 200)
  # config.http_headers         = { "Authorization" => "Bearer #{ENV['API_TOKEN']}" }

  # ---------------------------------------------------------------------------
  # Sidekiq check (requires :sidekiq in config.checks)
  # ---------------------------------------------------------------------------
  # config.sidekiq_queue_size = 1000                 # total depth → degraded

  # ---------------------------------------------------------------------------
  # Solid Queue check (requires :solid_queue in config.checks)
  # ---------------------------------------------------------------------------
  # config.solid_queue_job_count = 500               # pending jobs → degraded

  # ---------------------------------------------------------------------------
  # GoodJob check (requires :good_job in config.checks)
  # ---------------------------------------------------------------------------
  # config.good_job_latency = 300                    # seconds oldest job waiting → degraded

  # ---------------------------------------------------------------------------
  # Resque check (requires :resque in config.checks)
  # ---------------------------------------------------------------------------
  # config.resque_queue_size = 1000                  # total depth → degraded

  # ---------------------------------------------------------------------------
  # Custom checks
  # ---------------------------------------------------------------------------
  # class MyApiCheck < RailsHealthChecks::Check
  #   def call
  #     res = Net::HTTP.get_response(URI("https://api.example.com/status"))
  #     res.code == "200" ? pass : fail_with("API returned #{res.code}")
  #   end
  # end
  #
  # config.register :my_api, MyApiCheck.new
  # config.register :slow_api, MyApiCheck.new, timeout: 10  # per-check timeout override
end

Configuration is validated at boot time. An unknown check name, a missing http_url for the :http check, or a group referencing an undefined check raises RailsHealthChecks::ConfigurationError on startup rather than silently failing on the first request.
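The three boot-time rules listed above could be enforced roughly as below. This is a hypothetical sketch: validate_config!, the local ConfigurationError class, and the keyword arguments are illustrative stand-ins for RailsHealthChecks::ConfigurationError and the real configuration object. It also assumes a group may reference any known check (built-in or registered), not only the ones listed in config.checks.

```ruby
# Hypothetical sketch of the boot-time rules; not the gem's source.
class ConfigurationError < StandardError; end # stand-in for RailsHealthChecks::ConfigurationError

BUILT_IN_CHECKS = %i[database cache redis smtp sidekiq solid_queue
                     good_job resque disk memory http].freeze

# checks:   names from config.checks
# custom:   names added with config.register
# groups:   { group_name => [check names] } from config.group
# http_url: value of config.http_url
def validate_config!(checks:, custom: [], groups: {}, http_url: nil)
  known = BUILT_IN_CHECKS + custom

  unknown = checks - known
  raise ConfigurationError, "unknown check(s): #{unknown.join(', ')}" if unknown.any?

  if checks.include?(:http) && http_url.to_s.empty?
    raise ConfigurationError, ":http check requires config.http_url"
  end

  groups.each do |name, members|
    missing = members - known
    raise ConfigurationError, "group #{name} references undefined check(s): #{missing.join(', ')}" if missing.any?
  end

  true
end

validate_config!(checks: [:database, :disk]) # => true

begin
  validate_config!(checks: [:database], groups: { workers: [:sidekiq, :typo] })
rescue ConfigurationError => e
  puts e.message # prints "group workers references undefined check(s): typo"
end
```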

Configuration Reference

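Option                   Type            Default      Description
checks                   Array           [:database]  Built-in or custom check names to run
timeout                  Integer         5            Global per-check timeout in seconds
cache_duration           Integer or nil  nil          Seconds to cache check results; nil disables caching
readiness_path           String          "ready"      Path of the readiness endpoint within the engine (e.g. "ready" → /health/ready)
token                    String or nil   nil          Bearer token required in the Authorization header
allowed_ips              Array or nil    nil          Exact IPs or CIDR ranges allowed to call the endpoints
redis_url                String or nil   nil          Redis URL for :redis; when nil, uses REDIS_URL or redis://localhost:6379/0
smtp_address             String or nil   nil          SMTP host for :smtp; when nil, uses ActionMailer config or localhost
smtp_port                Integer or nil  nil          SMTP port for :smtp; when nil, uses ActionMailer config or 25
sidekiq_queue_size       Integer or nil  nil          Total queue depth that makes :sidekiq report degraded
solid_queue_job_count    Integer or nil  nil          Pending job count that makes :solid_queue report degraded
good_job_latency         Integer or nil  nil          Seconds the oldest job may wait before :good_job reports degraded
resque_queue_size        Integer or nil  nil          Total queue depth that makes :resque report degraded
disk_path                String          "/"          Mount point for :disk check
disk_warn_threshold      Integer or nil  nil          Free bytes below which :disk reports degraded
disk_critical_threshold  Integer or nil  nil          Free bytes below which :disk reports critical
memory_threshold         Integer or nil  nil          RSS bytes above which :memory reports degraded
http_url                 String or nil   nil          URL requested by :http (required when :http is enabled)
http_expected_status     Integer         200          Expected HTTP response code for :http check
http_headers             Hash            {}           Request headers sent by :http check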
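The threshold options above turn a measurement into a status. The sketch below shows that mapping for the disk and memory options; disk_status and memory_status are hypothetical helpers, and the strict comparisons (< for free space, > for RSS) are assumptions, since the README does not say whether landing exactly on a threshold counts.

```ruby
# Hypothetical helpers mapping measurements to statuses; not the gem's source.
# A nil threshold means "not configured", so it never triggers.
def disk_status(free_bytes, warn: nil, critical: nil)
  return "critical" if critical && free_bytes < critical
  return "degraded" if warn && free_bytes < warn
  "ok"
end

def memory_status(rss_bytes, threshold: nil)
  threshold && rss_bytes > threshold ? "degraded" : "ok"
end

GIB = 1024**3
MIB = 1024**2

disk_status(1 * GIB, warn: 2 * GIB, critical: 512 * MIB)   # => "degraded"
disk_status(100 * MIB, warn: 2 * GIB, critical: 512 * MIB) # => "critical"
memory_status(600 * MIB, threshold: 512 * MIB)             # => "degraded"
```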



Authentication

By default health endpoints are public. Use one of the following strategies to restrict access. Unauthenticated requests receive 401 Unauthorized.

Note: GET /health/live always bypasses authentication regardless of the configured strategy. Liveness probes come from Kubernetes and load balancers, which usually send no credentials, so enforcing auth on this endpoint would break infrastructure probing.

Bearer token

RailsHealthChecks.configure do |config|
  config.token = ENV["HEALTH_TOKEN"]
end

Requests must include Authorization: Bearer <token>.
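A minimal sketch of validating that header, assuming a constant-time comparison (the README does not say how the gem compares tokens). In a Rack env the header arrives as env["HTTP_AUTHORIZATION"]; bearer_token_valid? is a hypothetical helper name.

```ruby
require "openssl"

# Hypothetical helper; the gem's own comparison may differ.
# OpenSSL.secure_compare (openssl >= 2.2, bundled with Ruby 3.0+) hashes both
# strings with SHA-256 before a constant-time compare, so lengths don't leak.
def bearer_token_valid?(authorization_header, expected_token)
  return false if expected_token.nil? || expected_token.empty? # misconfigured: deny

  scheme, token = authorization_header.to_s.split(" ", 2)
  return false unless scheme&.casecmp?("Bearer") && token

  OpenSSL.secure_compare(token.strip, expected_token)
end

bearer_token_valid?("Bearer s3cret", "s3cret") # => true
bearer_token_valid?("Bearer wrong", "s3cret")  # => false
bearer_token_valid?(nil, "s3cret")             # => false
```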

IP allowlist

RailsHealthChecks.configure do |config|
  config.allowed_ips = ["127.0.0.1", "10.0.0.0/8"]  # exact IPs or CIDR ranges
end
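Matching exact IPs and CIDR ranges needs nothing beyond Ruby's stdlib IPAddr. This is a sketch of the idea, not the gem's code; ip_allowed? is a hypothetical name, and it assumes unparseable client addresses are denied.

```ruby
require "ipaddr"

# Hypothetical helper using stdlib IPAddr; not the gem's implementation.
# "127.0.0.1" parses as a one-address range, so exact IPs and CIDR
# ranges are handled by the same include? test.
def ip_allowed?(remote_ip, allowed_ips)
  client = IPAddr.new(remote_ip.to_s)
  allowed_ips.any? { |entry| IPAddr.new(entry).include?(client) }
rescue IPAddr::Error
  false # unparseable address: deny
end

allowed = ["127.0.0.1", "10.0.0.0/8"]
ip_allowed?("10.42.0.7", allowed)   # => true
ip_allowed?("127.0.0.1", allowed)   # => true
ip_allowed?("192.168.1.5", allowed) # => false
```

In a real app the allowlist would be parsed once at boot rather than on every request, and behind a proxy the client address depends on how the proxy headers are trusted.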

Custom block

RailsHealthChecks.configure do |config|
  config.authenticate { |request| request.headers["X-Internal"] == "true" }
end

The block receives the request object and must return a truthy value to allow access. In a Rails app this is ActionDispatch::Request; in the Rack app it is Rack::Request.



Built-in Checks

Check         Requires         Description
:database     ActiveRecord     SELECT 1 against the primary connection
:cache        Rails.cache      Read/write probe; works with any cache store
:redis        redis gem        Direct Redis PING; config.redis_url or REDIS_URL env var
:smtp         (none)           SMTP connectivity via Net::SMTP; reads ActionMailer config automatically
:sidekiq      sidekiq gem      Sidekiq Redis connectivity; optional config.sidekiq_queue_size depth threshold
:solid_queue  solid_queue gem  SolidQueue DB connectivity; optional config.solid_queue_job_count threshold
:good_job     good_job gem     GoodJob queue latency; optional config.good_job_latency (seconds) threshold
:resque       resque gem       Resque Redis connectivity; optional config.resque_queue_size depth threshold
:disk         (none)           Free disk space via df; config.disk_warn_threshold / config.disk_critical_threshold (bytes)
:memory       (none)           Process RSS via ps; optional config.memory_threshold (bytes) reports degraded when exceeded
:http         (none)           HTTP GET to config.http_url; config.http_expected_status and config.http_headers

All checks run in parallel. Each check times out independently using config.timeout (default: 5s) or a per-check override set via config.register.
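To see why response time tracks the slowest check rather than the sum, here is a sketch of parallel execution with independent per-check deadlines. It uses plain Ruby threads instead of the Concurrent::Future the gem uses; run_checks_in_parallel, the timeouts: hash (standing in for the config.register timeout override), and the "critical: ..." result strings are all hypothetical.

```ruby
# Hypothetical sketch using stdlib threads; the gem itself uses Concurrent::Future.
# checks:   { name => callable returning a status string }
# timeouts: optional per-check overrides, standing in for register(..., timeout:)
def run_checks_in_parallel(checks, default_timeout: 5, timeouts: {})
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  # Start every check at once.
  threads = checks.to_h do |name, callable|
    thread = Thread.new do
      callable.call
    rescue StandardError => e
      "critical: #{e.message}" # an exception counts as a failed check
    end
    [name, thread]
  end

  # Collect results. Deadlines share one start time, so total wall time is
  # bounded by the longest timeout, not the sum of the check durations.
  threads.to_h do |name, thread|
    limit = timeouts.fetch(name, default_timeout)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    if thread.join([limit - elapsed, 0].max)
      [name, thread.value]
    else
      thread.kill # give up on the stuck check
      [name, "critical: timed out after #{limit}s"]
    end
  end
end

checks = {
  database: -> { sleep 0.1; "ok" },
  redis:    -> { sleep 0.2; "ok" },
  stuck:    -> { sleep 10; "ok" }
}
run_checks_in_parallel(checks, default_timeout: 1, timeouts: { redis: 0.5 })
# => { database: "ok", redis: "ok", stuck: "critical: timed out after 1s" } after about 1s
```

Thread#kill is a blunt tool, so giving each check's own network calls a timeout (for example Net::HTTP's open_timeout and read_timeout) is still worthwhile.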



Notifications

Every health check run publishes an ActiveSupport::Notifications event:

ActiveSupport::Notifications.subscribe("health_check.rails_health_checks") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  Rails.logger.info "Health check: #{event.payload[:status]} (#{event.duration.round}ms)"
  # event.payload[:checks] => { database: { status: "ok", latency_ms: 3 }, ... }
end

The payload includes:

Key     Value
status  Overall status: "ok", "degraded", or "critical"
checks  Hash of { check_name => { status:, latency_ms:, message: } }

duration on the event covers the entire parallel check run, not individual checks.



Prometheus Metrics

GET /health/metrics returns Prometheus text exposition format (text/plain; version=0.0.4). This endpoint always returns HTTP 200 per Prometheus scraping convention — check state is encoded in metric values.

# HELP rails_health_check_status Health check status (0=ok, 1=degraded, 2=critical)
# TYPE rails_health_check_status gauge
rails_health_check_status{check="database"} 0
rails_health_check_status{check="cache"} 0

# HELP rails_health_check_latency_ms Health check latency in milliseconds
# TYPE rails_health_check_latency_ms gauge
rails_health_check_latency_ms{check="database"} 4
rails_health_check_latency_ms{check="cache"} 2

Latency lines are omitted for checks that do not call measure { }.
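The format above is simple enough to generate by hand. This sketch renders a results hash into those two gauge families and leaves out latency lines for checks with no latency_ms. prometheus_text is a hypothetical helper, and it does not escape label values, so it assumes check names contain no quotes or backslashes.

```ruby
# Hypothetical renderer for the exposition format above; not the gem's source.
# results: { check_name => { status: "ok" | "degraded" | "critical", latency_ms: Integer or nil } }
STATUS_VALUE = { "ok" => 0, "degraded" => 1, "critical" => 2 }.freeze

def prometheus_text(results)
  lines = [
    "# HELP rails_health_check_status Health check status (0=ok, 1=degraded, 2=critical)",
    "# TYPE rails_health_check_status gauge"
  ]
  results.each do |name, r|
    lines << %(rails_health_check_status{check="#{name}"} #{STATUS_VALUE.fetch(r[:status])})
  end

  # Checks that never called measure { } have no latency, so they get no line.
  timed = results.reject { |_name, r| r[:latency_ms].nil? }
  unless timed.empty?
    lines << ""
    lines << "# HELP rails_health_check_latency_ms Health check latency in milliseconds"
    lines << "# TYPE rails_health_check_latency_ms gauge"
    timed.each do |name, r|
      lines << %(rails_health_check_latency_ms{check="#{name}"} #{r[:latency_ms]})
    end
  end

  lines.join("\n") + "\n" # the last line must end with a line feed
end

puts prometheus_text({
  database: { status: "ok", latency_ms: 4 },
  custom:   { status: "degraded", latency_ms: nil }
})
```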



Result Caching

By default every request re-runs all checks. Set cache_duration to serve cached results for N seconds, reducing load on the database, Redis, and other dependencies:

RailsHealthChecks.configure do |config|
  config.cache_duration = 10  # seconds
end

The cache is keyed per check set — GET /health and GET /health/workers cache independently. The cache is in-process (not shared across dynos/containers), so each instance maintains its own result window.
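A per-process TTL cache like the one described can be sketched with a Hash, a Mutex, and a monotonic clock. ResultCacheSketch is a hypothetical class, and keying on the sorted check names is an assumption about what "per check set" means.

```ruby
# Hypothetical in-process result cache; not the gem's implementation.
# Each Ruby process (dyno, container, or forked worker) has its own copy.
class ResultCacheSketch
  def initialize(ttl_seconds)
    @ttl = ttl_seconds # nil disables caching, like cache_duration = nil
    @entries = {}      # key => [expires_at, value]
    @mutex = Mutex.new # threaded servers may call this concurrently
  end

  def fetch(check_names)
    return yield if @ttl.nil?

    key = check_names.map(&:to_s).sort.join(",") # same set, any order => same key
    @mutex.synchronize do
      now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      expires_at, value = @entries[key]
      return value if expires_at && now < expires_at

      value = yield
      @entries[key] = [now + @ttl, value]
      value
    end
  end
end

cache = ResultCacheSketch.new(10)
runs = 0
2.times { cache.fetch([:database, :redis]) { runs += 1; { database: "ok", redis: "ok" } } }
runs # => 1 (the second call was served from the cache)
```

Holding the mutex while checks run means concurrent requests for an expired entry wait for one refresh instead of all re-running the checks; the trade-off is that they also wait on a slow check.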



Per-Environment Toggling

Disable specific checks in specific environments:

RailsHealthChecks.configure do |config|
  config.checks = [:database, :cache, :disk, :memory]
  config.disable :disk,   in: :test
  config.disable :memory, in: [:test, :development]
end

The check is removed from the active list only when the current environment matches: Rails.env in a Rails app, or RACK_ENV in a plain Rack app. The in: option accepts a single symbol or an array.
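The filtering rule can be sketched as follows, including the RACK_ENV fallback described under Rack Applications. current_env and active_checks are hypothetical helpers, and the disabled hash stands in for whatever the gem records when config.disable is called.

```ruby
# Hypothetical sketch of per-environment filtering; not the gem's source.
def current_env
  if defined?(Rails) && Rails.respond_to?(:env)
    Rails.env.to_s
  else
    ENV.fetch("RACK_ENV", "production") # non-Rails fallback
  end
end

# disabled: { check_name => symbol or array of symbols }, as passed to in:
def active_checks(checks, disabled, env: current_env)
  checks.reject do |name|
    envs = disabled[name]
    envs && Array(envs).map(&:to_s).include?(env.to_s)
  end
end

checks   = [:database, :cache, :disk, :memory]
disabled = { disk: :test, memory: [:test, :development] }
active_checks(checks, disabled, env: "development") # => [:database, :cache, :disk]
active_checks(checks, disabled, env: "test")        # => [:database, :cache]
```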



Check Groups

Group related checks and expose them at a dedicated endpoint:

RailsHealthChecks.configure do |config|
  config.group :system,  [:disk, :memory]
  config.group :workers, [:sidekiq, :good_job]
end

Endpoint             Runs
GET /health/system   :disk, :memory
GET /health/workers  :sidekiq, :good_job

The response shape is identical to GET /health. Unknown group names return 404 Not Found.



Custom Checks

Authoring

Define a class inheriting from RailsHealthChecks::Check, implement call, and register it:

require "net/http"

class PaymentGatewayCheck < RailsHealthChecks::Check
  def call
    measure do
      response = Net::HTTP.get_response(URI("https://api.stripe.com/v1/charges"))
      case response.code.to_i
      when 200, 401  # 401 = auth error, but gateway is reachable
        pass
      when 429
        warn_with("rate limited (429)")
      else
        fail_with("unexpected status #{response.code}")
      end
    end
  rescue StandardError => e
    fail_with(e.message)
  end
end

RailsHealthChecks.configure do |config|
  config.register :payment_gateway, PaymentGatewayCheck.new
  config.register :slow_gateway,    PaymentGatewayCheck.new, timeout: 15
end

config.register appends the check to the active list automatically.

Check API

Method               Status set  Use when
pass(message = nil)  ok          Check passed; optional message
warn_with(message)   degraded    Check is functional but degraded
fail_with(message)   critical    Check failed; service is impaired
measure { }          (none)      Wraps a block and records latency_ms

State contract: call exactly one of pass, warn_with, or fail_with per call invocation. The check instance is dup'd before each run, so instance variables set during one request do not bleed into the next.
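To make the contract concrete, here is a hypothetical base class that mirrors the Check API table: three status setters, a measure helper that records latency_ms, and a run method that calls dup on the instance first. SketchCheck is illustrative only and is not RailsHealthChecks::Check; the real class may track state differently.

```ruby
# Hypothetical base class mirroring the documented Check API; not the gem's source.
class SketchCheck
  attr_reader :status, :message, :latency_ms

  def pass(message = nil)
    record("ok", message)
  end

  def warn_with(message)
    record("degraded", message)
  end

  def fail_with(message)
    record("critical", message)
  end

  # Runs the block and stores its wall time in whole milliseconds,
  # even if the block raises.
  def measure
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    @latency_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round
  end

  # Calls #call on a copy, so state set during one run never reaches the next.
  def run
    copy = dup
    copy.call
    copy
  end

  private

  def record(status, message)
    @status = status
    @message = message
  end
end

# Example subclass: always degraded after a short timed step.
class SlowStepCheck < SketchCheck
  def call
    measure { sleep 0.05 }
    warn_with("slow step")
  end
end

template = SlowStepCheck.new
result = template.run
result.status   # => "degraded"
template.status # => nil (the template instance is never mutated)
```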

Testing Custom Checks

Call the check directly in a unit test; no request stack is needed. The spec below uses WebMock's stub_request to fake the gateway:

RSpec.describe PaymentGatewayCheck do
  subject(:check) { described_class.new }

  context "when the gateway is reachable" do
    before do
      stub_request(:get, "https://api.stripe.com/v1/charges")
        .to_return(status: 200)
    end

    it "passes" do
      check.call
      expect(check.status).to eq("ok")
    end
  end

  context "when the gateway is rate-limited" do
    before do
      stub_request(:get, "https://api.stripe.com/v1/charges")
        .to_return(status: 429)
    end

    it "warns" do
      check.call
      expect(check.status).to eq("degraded")
      expect(check.message).to include("rate limited")
    end
  end
end



Migrating from OkComputer

See MIGRATING_FROM_OKCOMPUTER.md for a full mapping of check names, configuration keys, and endpoint differences.

Quick reference:

OkComputer                        rails_health_checks
OkComputer::ActiveRecordCheck     :database
OkComputer::CacheCheck            :cache
OkComputer::RedisCheck            :redis
OkComputer::SidekiqLatencyCheck   :sidekiq + config.sidekiq_queue_size
OkComputer::HttpCheck             :http + config.http_url
OkComputer::CustomCheck subclass  Subclass RailsHealthChecks::Check
GET /okcomputer                   GET /health
GET /okcomputer/all               GET /health



Performance

See BENCHMARKS.md for throughput numbers, parallel execution speedup, and cache effectiveness measurements. To run the suite locally:

bundle exec rake benchmark



Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request



License

The gem is available as open source under the terms of the MIT License.