RailsHealthChecks

A Rails engine that adds production-grade health check endpoints to any Rails app. It goes well beyond the built-in /up endpoint, providing 11 built-in checks, parallel execution, structured JSON responses, Prometheus metrics, and a clean configuration DSL.

Built-in checks: database · cache · Redis · SMTP · Sidekiq · SolidQueue · GoodJob · Resque · disk · memory · HTTP

Key features:

- Two-tier endpoints: /live (liveness: process only) and /ready (readiness: all deps) prevent cascade failures in Kubernetes and behind load balancers
- Parallel check execution via Concurrent::Future: response time is bounded by the slowest check, not the sum
- Result caching (config.cache_duration) to absorb high-frequency probe traffic
- Prometheus text exposition at GET /health/metrics (always HTTP 200)
- Check groups (config.group) expose subsets at /health/:group
- Per-environment toggling, boot-time validation, and bearer token / IP / custom auth
- `rails generate rails_health_checks:initializer` scaffolds a fully commented config file
- Drop-in replacement for OkComputer (see MIGRATING_FROM_OKCOMPUTER.md)
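
The parallel-execution point above can be illustrated with a small sketch. The gem is described as using Concurrent::Future; this sketch swaps in plain Ruby threads from the standard library so that it runs without extra gems. The check names, the `run_parallel` helper, and the sleep durations are all made up for illustration.

```ruby
# Hypothetical checks: each lambda simulates a dependency probe with a fixed delay.
checks = {
  database: -> { sleep 0.05; :ok },
  redis:    -> { sleep 0.10; :ok },
  smtp:     -> { sleep 0.02; :ok }
}

# Run every check in its own thread and collect the results by name.
# (Plain threads stand in for Concurrent::Future in this sketch.)
def run_parallel(checks)
  threads = checks.map { |name, probe| Thread.new { [name, probe.call] } }
  threads.map(&:value).to_h # Thread#value waits for the thread to finish
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = run_parallel(checks)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts results.inspect
puts format("elapsed: %.2fs", elapsed)
```

Run sequentially, these three probes would take about 0.17 seconds; run in parallel, the elapsed time should stay close to the slowest probe (0.10 seconds).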

Upgrading
Installation
Rack Applications
Endpoints
Configuration
- Configuration Reference
Authentication
Built-in Checks
Notifications
Prometheus Metrics
Result Caching
Per-Environment Toggling
Check Groups
Custom Checks
- Check API
- Testing Custom Checks
Migrating from OkComputer
Performance
Contributing
License

Upgrading

v1.1.x → v1.2.x — breaking change to `/live`

GET /health/live no longer runs dependency checks.

Prior to v1.2.0, /live ran all configured checks (database, Redis, etc.) and returned 503 if any failed. This was readiness behaviour under a liveness name and is the root cause of the cascade failure footgun described below.

What changed: /live now returns 200 OK whenever the Ruby process is alive, regardless of dependency state. Authentication is also skipped on this endpoint so Kubernetes and load balancer probes work without credentials.

What to do: If you were relying on /live to verify dependencies, switch to the new /health/ready endpoint. No configuration changes required.

# /live is now liveness only (before v1.2.0 it ran all dependency checks)
GET /health/live   →  200 if process alive (deps ignored)

# New endpoint for dependency checks
GET /health/ready  →  200 if all deps pass, 503 if any fail
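
The liveness/readiness split described above can be sketched as a Rack-style callable. This is only an illustration of the semantics, not the engine's implementation: the probe lambdas, the simulated Redis outage, and the response bodies are assumptions.

```ruby
# Hypothetical dependency probes; each returns true when the dependency is reachable.
probes = {
  database: -> { true },
  redis:    -> { false } # simulate a Redis outage
}

# A Rack-style app: a callable taking an env hash and returning [status, headers, body].
app = lambda do |env|
  case env["PATH_INFO"]
  when "/health/live"
    # Liveness: answering at all proves the process is up; no probes are run.
    [200, { "content-type" => "text/plain" }, ["OK"]]
  when "/health/ready"
    # Readiness: 200 only when every probe passes, otherwise 503.
    failed = probes.reject { |_name, probe| probe.call }.keys
    status = failed.empty? ? 200 : 503
    body = failed.empty? ? "OK" : "FAILED: #{failed.join(', ')}"
    [status, { "content-type" => "text/plain" }, [body]]
  else
    [404, { "content-type" => "text/plain" }, ["Not Found"]]
  end
end

live_status, = app.call("PATH_INFO" => "/health/live")
ready_status, _headers, ready_body = app.call("PATH_INFO" => "/health/ready")

puts live_status      # 200
puts ready_status     # 503
puts ready_body.first # FAILED: redis
```

With Redis down, the liveness probe still succeeds, so an orchestrator would stop routing traffic to the instance (readiness fails) without restarting it.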

| Endpoint | Format | Use case |
| --- | --- | --- |
| `GET/HEAD /` | JSON | Full dependency health (monitoring dashboards) |
| `GET/HEAD /live` | Plain text | Liveness probe: process only, no deps |
| `GET/HEAD /ready` | Plain text | Readiness probe: all configured dependency checks |
| `GET /metrics` | Prometheus text | Prometheus scraping |
| `GET /:group` | JSON | Scoped check group |

| Check | Works without Rails? |
| --- | --- |
| `:disk` | Yes |
| `:memory` | Yes |
| `:http` | Yes |
| `:redis` | Yes (requires `redis` gem) |
| `:smtp` | Yes (reads `ActionMailer` config if available, otherwise requires `config.smtp_address`) |
| `:database` | Requires ActiveRecord |
| `:cache` | Requires `Rails.cache` |
| `:sidekiq` | Requires Sidekiq |
| `:solid_queue` | Requires SolidQueue |
| `:good_job` | Requires GoodJob |
| `:resque` | Requires Resque |

| Endpoint | Runs checks? | Format | Use case |
| --- | --- | --- | --- |
| `GET /health/live` | No (process only) | Plain text | Kubernetes `livenessProbe`, load balancer health check |
| `GET /health/ready` | Yes (all configured deps) | Plain text | Kubernetes `readinessProbe`, external uptime monitors |
| `GET /health` | Yes (all configured deps) | JSON | Monitoring dashboards, alerting pipelines |
| `GET /health/metrics` | Yes (all configured deps) | Prometheus text | Prometheus / OpenMetrics scraping |
| `GET /health/:group` | Yes (named subset) | JSON | Scoped group (e.g. `/health/workers`) |
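
To make the Prometheus text format served at `/health/metrics` more concrete, here is a minimal sketch that renders check results as gauge metrics. The metric names `health_check_status` and `health_check_latency_ms`, the `to_prometheus` helper, and the sample results are hypothetical; the gem's actual metric names are not stated here. Because the endpoint always answers with HTTP 200, failures would be conveyed through metric values rather than the status code.

```ruby
# Hypothetical check results, shaped like { name => { status:, latency_ms: } }.
results = {
  database: { status: "ok",       latency_ms: 3.2 },
  redis:    { status: "critical", latency_ms: 5000.0 }
}

# Render results in the Prometheus text exposition format.
# Metric names are illustrative, not the gem's actual names.
def to_prometheus(results)
  lines = []
  lines << "# HELP health_check_status 1 if the check passed, 0 otherwise"
  lines << "# TYPE health_check_status gauge"
  results.each do |name, r|
    value = r[:status] == "ok" ? 1 : 0
    lines << %(health_check_status{check="#{name}"} #{value})
  end
  lines << "# HELP health_check_latency_ms Check latency in milliseconds"
  lines << "# TYPE health_check_latency_ms gauge"
  results.each do |name, r|
    lines << %(health_check_latency_ms{check="#{name}"} #{r[:latency_ms]})
  end
  lines.join("\n") + "\n" # the format expects a trailing newline
end

output = to_prometheus(results)
puts output
```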

| Endpoint | Question it answers | Correct probe |
| --- | --- | --- |
| `/health/live` | Is the process running and responsive? | `livenessProbe`, LB health check |
| `/health/ready` | Are all dependencies reachable? | `readinessProbe`, uptime monitor |

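| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `checks` | `Array` | `[:database]` | Built-in or custom check names to run |
| `timeout` | `Integer` | `5` | Global per-check timeout in seconds |
| `cache_duration` | `Integer` or `nil` | `nil` | How long check results are cached (result caching) |
| `readiness_path` | `String` | `"ready"` | Path of the readiness endpoint within the engine (e.g. `"ready"` → `/health/ready`) |
| `token` | `String` or `nil` | `nil` | Bearer token for authentication |
| `allowed_ips` | `Array` or `nil` | `nil` | IP allowlist for authentication |
| `redis_url` | `String` or `nil` | `nil` | Redis URL for `:redis` check (alternatively the `REDIS_URL` env var) |
| `smtp_address` | `String` or `nil` | `nil` | SMTP address for `:smtp` check when `ActionMailer` config is not available |
| `smtp_port` | `Integer` or `nil` | `nil` | SMTP port for `:smtp` check |
| `sidekiq_queue_size` | `Integer` or `nil` | `nil` | Optional queue depth threshold for `:sidekiq` check |
| `solid_queue_job_count` | `Integer` or `nil` | `nil` | Optional job count threshold for `:solid_queue` check |
| `good_job_latency` | `Integer` or `nil` | `nil` | Optional queue latency threshold (seconds) for `:good_job` check |
| `resque_queue_size` | `Integer` or `nil` | `nil` | Optional queue depth threshold for `:resque` check |
| `disk_path` | `String` | `"/"` | Mount point for `:disk` check |
| `disk_warn_threshold` | `Integer` or `nil` | `nil` | Warning threshold (bytes) for `:disk` check |
| `disk_critical_threshold` | `Integer` or `nil` | `nil` | Critical threshold (bytes) for `:disk` check |
| `memory_threshold` | `Integer` or `nil` | `nil` | Optional RSS threshold (bytes) for `:memory` check; reports `degraded` when exceeded |
| `http_url` | `String` or `nil` | `nil` | URL requested by `:http` check |
| `http_expected_status` | `Integer` | `200` | Expected HTTP response code for `:http` check |
| `http_headers` | `Hash` | `{}` | Request headers sent by `:http` check |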

| Check | Requires | Description |
| --- | --- | --- |
| `:database` | - | ActiveRecord `SELECT 1` against the primary connection |
| `:cache` | - | `Rails.cache` read/write probe; works with any cache store |
| `:redis` | `redis` gem | Direct Redis `PING`; `config.redis_url` or `REDIS_URL` env var |
| `:smtp` | - | SMTP connectivity via `Net::SMTP`; reads `ActionMailer` config automatically |
| `:sidekiq` | `sidekiq` gem | Sidekiq Redis connectivity; optional `config.sidekiq_queue_size` depth threshold |
| `:solid_queue` | `solid_queue` gem | SolidQueue DB connectivity; optional `config.solid_queue_job_count` threshold |
| `:good_job` | `good_job` gem | GoodJob queue latency; optional `config.good_job_latency` (seconds) threshold |
| `:resque` | `resque` gem | Resque Redis connectivity; optional `config.resque_queue_size` depth threshold |
| `:disk` | - | Free disk space via `df`; `config.disk_warn_threshold` / `config.disk_critical_threshold` (bytes) |
| `:memory` | - | Process RSS via `ps`; optional `config.memory_threshold` (bytes) reports `degraded` when exceeded |
| `:http` | - | HTTP GET to `config.http_url`; `config.http_expected_status` and `config.http_headers` |
| Key | Value |
| --- | --- |
| `status` | Overall status: `"ok"`, `"degraded"`, or `"critical"` |
| `checks` | Hash of `{ check_name => { status:, latency_ms:, message: } }` |
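
As a rough illustration of how a payload with these two keys might be assembled, the sketch below derives the overall status from the individual check results. The "most severe status wins" rule, the `build_payload` helper, and the sample data are assumptions made for this example; the source does not spell out the aggregation rule.

```ruby
# Combine per-check results into a { status:, checks: } payload.
def build_payload(checks)
  severity = %w[ok degraded critical] # least to most severe
  # Assumption: the overall status is the most severe individual status,
  # and an empty set of checks counts as "ok".
  overall = checks.values.map { |c| c[:status] }.max_by { |s| severity.index(s) } || "ok"
  { status: overall, checks: checks }
end

# Hypothetical results for two checks.
checks = {
  database: { status: "ok",       latency_ms: 2.1, message: nil },
  disk:     { status: "degraded", latency_ms: 0.8, message: "free space below warn threshold" }
}

payload = build_payload(checks)
puts payload[:status] # degraded
```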

| Endpoint | Runs |
| --- | --- |
| `GET /health/system` | `:disk`, `:memory` |
| `GET /health/workers` | `:sidekiq`, `:good_job` |

| Method | Status set | Use when |
| --- | --- | --- |
| `pass(message = nil)` | `ok` | Check passed; optional message |
| `warn_with(message)` | `degraded` | Check is functional but degraded |
| `fail_with(message)` | `critical` | Check failed; service is impaired |
| `measure { }` | - | Wraps a block and records `latency_ms` |
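
The four helpers in the table above can be seen working together in the following sketch. So that it runs on its own, it defines a stand-in base class (`SketchCheck`) that mimics the helpers; the real `RailsHealthChecks::Check` may differ in details. The `QueueDepthCheck` class, its thresholds, and the `call` method name are hypothetical.

```ruby
# Stand-in for the gem's base class, written only so this sketch is runnable.
# It mimics the four documented helpers; it is not the gem's implementation.
class SketchCheck
  attr_reader :status, :message, :latency_ms

  def pass(message = nil)
    @status = "ok"
    @message = message
  end

  def warn_with(message)
    @status = "degraded"
    @message = message
  end

  def fail_with(message)
    @status = "critical"
    @message = message
  end

  # Runs the block, records how long it took in milliseconds, returns its value.
  def measure
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    @latency_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round(2)
    result
  end
end

# Hypothetical custom check: reports on the depth of an in-memory queue.
class QueueDepthCheck < SketchCheck
  def initialize(queue, warn_at:, fail_at:)
    @queue = queue
    @warn_at = warn_at
    @fail_at = fail_at
  end

  # The entry-point name `call` is an assumption for this sketch.
  def call
    depth = measure { @queue.size }
    if depth >= @fail_at
      fail_with("queue depth #{depth} >= #{@fail_at}")
    elsif depth >= @warn_at
      warn_with("queue depth #{depth} >= #{@warn_at}")
    else
      pass("queue depth #{depth}")
    end
    self
  end
end

check = QueueDepthCheck.new(Array.new(12), warn_at: 10, fail_at: 50).call
puts check.status  # degraded
puts check.message # queue depth 12 >= 10
```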

| OkComputer | rails_health_checks |
| --- | --- |
| `OkComputer::ActiveRecordCheck` | `:database` |
| `OkComputer::CacheCheck` | `:cache` |
| `OkComputer::RedisCheck` | `:redis` |
| `OkComputer::SidekiqLatencyCheck` | `:sidekiq` + `config.sidekiq_queue_size` |
| `OkComputer::HttpCheck` | `:http` + `config.http_url` |
| `OkComputer::CustomCheck` subclass | Subclass `RailsHealthChecks::Check` |
| `GET /okcomputer` | `GET /health` |
| `GET /okcomputer/all` | `GET /health` |
