Hyperion

High-performance Ruby HTTP server. Rack 3 + HTTP/2 + WebSockets + gRPC on a single binary.


Hyperion serves a hello-world Rack response at 134,084 r/s with a 1.14 ms p99 on a single worker (Linux 6.x, io_uring accept loop, Server.handle_static), versus Agoo's 19,024 r/s on the same hardware. Beyond the C-side fast path it's a complete Rack 3 server: HTTP/1.1 + HTTP/2 with ALPN, WebSockets (RFC 6455), gRPC unary + streaming on the Rack 3 trailers contract, native fiber concurrency for PG-bound apps, and pre-fork cluster mode with SO_REUSEPORT-balanced workers.

gem install hyperion-rb
bundle exec hyperion config.ru                 # http://127.0.0.1:9292

Headline benchmarks

Linux 6.8 / 16-vCPU Ubuntu 24.04 / Ruby 3.3.3, single worker, wrk -t4 -c100 -d20s unless noted. Reproduction commands and the full 6-row 4-way matrix (Hyperion / Puma / Falcon / Agoo) live in docs/BENCH_HYPERION_2_11.md.

| Workload | Hyperion r/s | Hyperion p99 | Reference |
|---|---|---|---|
| Static hello, handle_static + io_uring (2.12-D) | 134,084 | 1.14 ms | Agoo 2.15.14: 19,024 |
| Static hello, handle_static + accept4 fallback | 15,685 | 107 µs | Agoo 2.15.14: 19,024 |
| Dynamic block, `Server.handle { \|env\| ... }` (2.14-A) | 9,422 | | |
| CPU JSON via block (bench/work.ru, 2.14-A) | 5,897 | 256 µs | Falcon: 4,226 |
| Generic Rack hello (no Server.handle) | 4,752 | 2.02 ms | Agoo 2.15.14: 19,024 |
| gRPC unary, h2/TLS, ghz -c50 (2.14-D) | 1,618 | 33.3 ms | Falcon async-grpc: 1,512 (+7%) |

The 134,084 r/s row is sustained over a 4-hour soak at 120,684 r/s with RSS variance 2.71% and wrk-truth p99 1.14 ms (2.14-C). The io_uring loop is opt-in via HYPERION_IO_URING_ACCEPT=1 until 2.15; the accept4 row is the default on Linux.

Quick start

bundle exec hyperion config.ru                          # single process
bundle exec hyperion -w 4 -t 10 config.ru               # 4 workers × 10 threads
bundle exec hyperion -w 0 config.ru                     # one worker per CPU
bundle exec hyperion --tls-cert cert.pem --tls-key key.pem -p 9443 config.ru

bundle exec rake spec (and the default task) auto-invoke compile, so a fresh checkout just needs bundle install && bundle exec rake for a green run.

Migrating from Puma? hyperion -t N -w M matching your current Puma -t N:N -w M is the recommended drop-in. See docs/MIGRATING_FROM_PUMA.md.

Features

HTTP/1.1 + HTTP/2 + TLS

ALPN auto-negotiates h2 or http/1.1 per connection. HTTP/2 multiplexes streams onto fibers within a single connection — slow handlers don't head-of-line-block other streams. Cluster-mode TLS works (-w N + --tls-cert / --tls-key).

Smuggling defenses for HTTP/1.1: Content-Length + Transfer-Encoding together → 400; non-chunked Transfer-Encoding → 501; CRLF in response header values → ArgumentError (response-splitting guard).
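
A throwaway console probe for the first rule (the script, port, and exact status text are illustrative, not part of the gem): a request carrying both Content-Length and Transfer-Encoding should come back with a 400 status line.

require 'socket'

socket = TCPSocket.new('127.0.0.1', 9292)
socket.write(
  "POST / HTTP/1.1\r\n"            \
  "Host: localhost\r\n"            \
  "Content-Length: 5\r\n"          \
  "Transfer-Encoding: chunked\r\n" \
  "\r\n"                           \
  "0\r\n\r\n"
)
puts socket.readline   # expect an "HTTP/1.1 400 ..." status line
socket.close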

WebSockets (2.1.0+)

RFC 6455 over Rack 3 full hijack, native frame codec, per-connection wrapper with auto-pong, close handshake, UTF-8 validation, and per-message size cap. ActionCable + faye-websocket on a single binary — one hyperion -w 4 -t 10 config.ru serves HTTP, HTTP/2, TLS, and /cable from the same listener. Conformance: 463/463 autobahn-testsuite cases pass. See docs/WEBSOCKETS.md.
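
As one concrete shape for the faye-websocket case, a minimal echo config.ru might look like the sketch below. The app itself is illustrative and uses only the standard faye-websocket Rack API; nothing in it is Hyperion-specific.

require 'faye/websocket'

App = lambda do |env|
  if Faye::WebSocket.websocket?(env)
    ws = Faye::WebSocket.new(env)

    ws.on :message do |event|
      ws.send(event.data)          # echo each frame back
    end

    ws.on :close do |_event|
      ws = nil
    end

    ws.rack_response               # connection is hijacked from here on
  else
    [200, { 'content-type' => 'text/plain' }, ['WebSocket clients only']]
  end
end

run App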

gRPC (2.12-F+)

Hyperion's HTTP/2 path supports gRPC unary, server-streaming, client-streaming, and bidirectional RPCs via the Rack 3 trailers contract: any response body that defines #trailers gets a final HEADERS frame (with END_STREAM=1) carrying the trailer map after the DATA frames. Plain HTTP/2 traffic without the gRPC content-type keeps the unary buffered semantics — no behaviour change for non-gRPC clients.

A minimal unary handler:

class GrpcBody
  def initialize(reply); @reply = reply; end
  def each; yield @reply; end
  def trailers; { 'grpc-status' => '0', 'grpc-message' => 'OK' }; end
  def close; end
end

run ->(env) {
  request = env['rack.input'].read
  reply   = handle(request)
  [200, { 'content-type' => 'application/grpc' }, GrpcBody.new(reply)]
}

Server-streaming yields one DATA frame per each; client-streaming reads incoming frames off env['rack.input'] (a streaming IO that blocks until the next DATA frame lands); bidirectional interleaves both. Reproducible bench at bench/grpc_stream.{proto,ru} + bench/grpc_stream_bench.sh (ghz). Numbers in docs/BENCH_HYPERION_2_11.md.
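
Building on GrpcBody above, a server-streaming body only has to yield more than once from #each; a minimal sketch, where encode_chunk and the message list are illustrative:

class GrpcStreamBody
  def initialize(messages); @messages = messages; end

  def each
    @messages.each { |msg| yield encode_chunk(msg) }  # one DATA frame per message
  end

  def trailers
    { 'grpc-status' => '0', 'grpc-message' => 'OK' }   # final HEADERS frame
  end

  def close; end

  private

  def encode_chunk(msg)
    # gRPC wire format: 1-byte compressed flag + 4-byte big-endian length + payload
    [0, msg.bytesize].pack('CN') + msg
  end
end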

Server.handle direct routes

Bypass the Rack adapter for hot paths:

Hyperion::Server.handle_static '/health', body: 'ok'
Hyperion::Server.handle(:GET, '/v1/ping') { |env| [200, {}, ['pong']] }

handle_static bakes the response at boot and serves from the C accept loop (134k r/s with io_uring, 16k r/s on accept4). The dynamic block form (2.14-A) runs app.call(env) on the C accept loop too — accept + recv + parse + write release the GVL while the block holds it, so multi-threaded workers actually parallelise.
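
In a config.ru the direct routes can sit next to a normal Rack app; a minimal sketch, where MyApp and the fall-through layout are assumptions and only the handle calls are the documented API:

# config.ru
Hyperion::Server.handle_static '/health', body: 'ok'
Hyperion::Server.handle(:GET, '/v1/ping') { |env| [200, {}, ['pong']] }

run MyApp   # requests without a direct route go through the normal Rack adapter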

Pre-fork cluster

Per-OS worker model: SO_REUSEPORT on Linux (kernel-balanced accept, 1.004–1.011 max/min ratio across workers under steady load — 2.12-E audit), master-bind + worker-fd-share on macOS/BSD where Darwin's SO_REUSEPORT doesn't load-balance. Lifecycle hooks (before_fork, on_worker_boot, on_worker_shutdown) for AR / Redis / pool init.

Async I/O (PG-bound apps)

--async-io runs plain HTTP/1.1 connections under Async::Scheduler, turning one OS thread into thousands of in-flight handler invocations. Paired with hyperion-async-pg on a pg_sleep(50ms) workload, single-worker pool=200 hits 2,381 r/s vs Puma -t 5 at 56 r/s (architectural ceiling: pool size, not thread count). Three things must all be true: --async-io, hyperion-async-pg loaded, and a fiber-aware pool (Hyperion::AsyncPg::FiberPool, async-pool, or Async::Semaphore; not the connection_pool gem, whose Mutex blocks the OS thread). Skip any one and you get parity with Puma.
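
A minimal sketch of why the pool type matters, using Async::Semaphore (one of the options named above): the semaphore suspends only the calling fiber at the limit, so one OS thread keeps making progress, whereas ConnectionPool's internal Mutex would park the whole thread. The 50 ms sleeps stand in for pg_sleep-style queries; none of this is hyperion-async-pg's actual API.

require 'async'
require 'async/semaphore'

Async do |task|
  semaphore = Async::Semaphore.new(2)     # pretend pool of 2 connections

  4.times do |i|
    task.async do
      semaphore.acquire do
        sleep 0.05                        # non-blocking under the fiber scheduler
        puts "request #{i} done"
      end
    end
  end
end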

Observability

/-/metrics Prometheus endpoint (admin-token guarded), per-route latency histograms, per-conn fairness rejections, WebSocket permessage-deflate ratio, kTLS active connections, ThreadPool queue depth, dispatch-mode counters (Rack / handle_static / dynamic block / h2 / async-io). Pre-built Grafana dashboard at docs/grafana/hyperion-2.4-dashboard.json. Full reference: docs/OBSERVABILITY.md.

Default-ON structured access logs (one JSON or text line per request) with hot-path optimisations: per-thread cached iso8601 timestamp, hand-rolled line builder, lock-free per-thread 4 KiB write buffer. 12-factor logger split: info/debug → stdout, warn/error/fatal → stderr.

Optional io_uring accept loop

Linux 5.x+, opt-in via HYPERION_IO_URING_ACCEPT=1. Multishot accept plus a per-conn RECV/WRITE/CLOSE state machine on top of liburing: one io_uring_enter per N requests instead of N×3 syscalls. Compiles out cleanly without liburing — the accept4 path stays the fallback, and macOS keeps using accept4. The default flip moves to 2.15 with a fresh 24h soak.

Configuration

Configuration is resolved in precedence order: explicit CLI flag > environment variable > config/hyperion.rb > built-in default.

Most-used CLI flags

| Flag | Default | Notes |
|---|---|---|
| -b, --bind HOST | 127.0.0.1 | |
| -p, --port PORT | 9292 | |
| -w, --workers N | 1 | 0 → Etc.nprocessors (one worker per CPU). |
| -t, --threads N | 5 | OS-thread Rack handler pool per worker. 0 → run inline (debugging). |
| -C, --config PATH | config/hyperion.rb if present | Ruby DSL file. |
| --tls-cert PATH / --tls-key PATH | nil | PEM cert + key for HTTPS. |
| --[no-]async-io | off | Run plain HTTP/1.1 under Async::Scheduler. Required for hyperion-async-pg on plain HTTP. |
| --preload-static DIR | nil | Preload static assets from DIR at boot (repeatable, immutable). Rails apps auto-detect from Rails.configuration.assets.paths. |
| --admin-token-file PATH | unset | Auth file for /-/quit and /-/metrics. Refuses world-readable files. |
| --worker-max-rss-mb MB | unset | Master gracefully recycles a worker exceeding MB RSS. |
| --max-pending COUNT | unbounded | Per-worker accept-queue cap before HTTP 503 + Retry-After: 1. |
| --idle-keepalive SECONDS | 5 | Keep-alive idle timeout. |
| --graceful-timeout SECONDS | 30 | Shutdown deadline before SIGKILL. |

bin/hyperion --help prints the full set, including --max-body-bytes, --max-header-bytes, --max-request-read-seconds (slowloris defence), --h2-max-total-streams, --max-in-flight-per-conn, --tls-handshake-rate-limit, and the --[no-]yjit / --[no-]log-requests toggles.

Environment variables

HYPERION_LOG_LEVEL, HYPERION_LOG_FORMAT, HYPERION_LOG_REQUESTS (0|1|true|false|yes|no|on|off), HYPERION_ENV, HYPERION_WORKER_MODEL (share|reuseport), HYPERION_IO_URING_ACCEPT (0|1), HYPERION_H2_DISPATCH_POOL, HYPERION_H2_NATIVE_HPACK (v2|ruby|off), HYPERION_H2_TIMING.

Config file

config/hyperion.rb — same shape as Puma's puma.rb. Auto-loaded if present. Strict DSL: unknown methods raise NoMethodError at boot.

# config/hyperion.rb
bind '0.0.0.0'
port 9292

workers      4
thread_count 10

# tls_cert_path 'config/cert.pem'
# tls_key_path  'config/key.pem'

read_timeout      30
idle_keepalive     5
graceful_timeout  30

log_level    :info
log_format   :auto
log_requests true

async_io nil    # nil = auto (1.4.0+), true = inline-on-fiber everywhere, false = pool everywhere

before_fork do
  ActiveRecord::Base.connection_handler.clear_all_connections! if defined?(ActiveRecord)
end

on_worker_boot do |worker_index|
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end

A documented sample lives at config/hyperion.example.rb.

Operator guidance

Distilled from docs/BENCH_2026_04_27.md (Rails 8.1 real-app sweep). Headline finding: the simplest drop-in is the right answer.

Migrating from Puma

hyperion -t N -w M matching your current Puma -t N:N -w M. No other flags. Versus Puma at the same -t/-w shape on real Rails endpoints: +9% rps on lightweight endpoints, 28× lower p99 on health-style endpoints, 3.8× lower p99 on PG-touching endpoints. Same RSS, same operator surface — keep all your existing config, monitoring, deploy scripts.

Knobs that help on synthetic benches but not on real Rails

| Knob | Synthetic | Real Rails | Recommendation |
|---|---|---|---|
| -t 30 | +5–10% on hello-world | Hurts p99 vs -t 10 (3.51 s vs 148 ms on /up) — GVL + middleware Mutex contention | Stay at -t 10. |
| --yjit | +5–10% on CPU-bound | Wash on dev-mode Rails | Skip until you bench production-mode. |
| RAILS_POOL > 25 | n/a | No improvement at 50 or 100 | Keep your existing AR pool. |
| --async-io | 33–42× rps on PG-bound | Worse than drop-in (4.14 s p99 on /up) until your full I/O stack is fiber-cooperative | Don't enable until redis-rb is replaced by async-redis. |

When -w N helps

| Workload | Recommended | Why |
|---|---|---|
| Pure I/O-bound (PG / Redis / external HTTP) | -w 1 + larger pool | -w 1 pool=200 = 87 MB / 2,180 r/s vs -w 4 pool=64 = 224 MB / 1,680 r/s: 2.6× memory, 0.77× rps if you pick multi-worker on wait-bound. |
| Pure CPU-bound | -w N matching CPU count | Bench: -w 16 -t 5 hits 98,818 r/s on a 16-vCPU box. |
| Mixed (Rails-shaped, ~5 ms CPU + 50 ms wait) | -w N/2 (half cores) + medium pool | -w 4 -t 5 pool=128 = 1,740 r/s on pg_mixed.ru, no cold-start spike. |

Read p99, not mean

| Workload | Hyperion rps / p99 | Closest competitor | rps ratio | p99 ratio |
|---|---|---|---|---|
| Hello -w 4 | 21,215 / 1.87 ms | Falcon 24,061 / 9.78 ms | 0.88× | 5.2× lower |
| CPU JSON -w 4 | 15,582 / 2.47 ms | Falcon 18,643 / 13.51 ms | 0.84× | 5.5× lower |
| Static 1 MiB | 1,919 / 4.22 ms | Puma 2,074 / 55 ms | 0.93× | 13× lower |
| PG-wait -w 1 pool=200 | 2,180 / 668 ms | Puma 530 + 200 timeouts | 4.1× | qualitative crush |

Throughput peaks are easy to fake under controlled conditions; tail latency reflects what your slowest user actually experiences when the load balancer fans them onto a busy worker.

Logging

Default behaviour:

  • info/debug → stdout, warn/error/fatal → stderr (12-factor).
  • One structured access-log line per response, info level. Disable with --no-log-requests or HYPERION_LOG_REQUESTS=0.
  • Format auto-selects: RAILS_ENV=production/staging → JSON; TTY → coloured text; piped output without env hint → JSON.

Sample text (TTY default):

2026-04-26T18:40:04.112Z INFO  [hyperion] message=request method=GET path=/api/v1/health status=200 duration_ms=46.63 remote_addr=127.0.0.1 http_version=HTTP/1.1

Sample JSON (production / piped):

{"ts":"2026-04-26T18:38:49.405Z","level":"info","source":"hyperion","message":"request","method":"GET","path":"/api/v1/health","status":200,"duration_ms":46.63,"remote_addr":"127.0.0.1","http_version":"HTTP/1.1"}

Metrics

Hyperion.stats returns a snapshot Hash with lock-free per-thread counters (connections_accepted, connections_active, requests_total, requests_in_flight, responses_<code>, parse_errors, app_errors, read_timeouts, requests_threadpool_dispatched, requests_async_dispatched, c_loop_requests_total).
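
As an illustration, the snapshot can back a lightweight readiness route; the /ready path and the assumption of symbol keys are mine, while the counter names come from the list above.

Hyperion::Server.handle(:GET, '/ready') do |env|
  stats = Hyperion.stats   # snapshot Hash; symbol keys assumed
  body  = "in_flight=#{stats[:requests_in_flight]} total=#{stats[:requests_total]}\n"
  [200, { 'content-type' => 'text/plain' }, [body]]
end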

When admin_token is set, /-/metrics emits Prometheus text-format v0.0.4. Auth is via the X-Hyperion-Admin-Token header (same token guards POST /-/quit):

$ curl -s -H 'X-Hyperion-Admin-Token: secret' http://127.0.0.1:9292/-/metrics
# HELP hyperion_requests_total Total HTTP requests handled
# TYPE hyperion_requests_total counter
hyperion_requests_total 8910
hyperion_responses_status_total{status="200"} 8521
hyperion_responses_status_total{status="404"} 12

Any counter not in the known set (added via Hyperion.metrics.increment(:custom_thing)) is auto-exported as hyperion_custom_thing with a generic HELP line. Network-isolate the admin endpoints if the listener is internet-facing — see docs/REVERSE_PROXY.md for the nginx location /-/ { return 404; } recipe.
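
As a sketch of that custom-counter path, a Rack middleware could bump a counter for slow responses; the middleware and its 500 ms threshold are assumptions, and only Hyperion.metrics.increment plus the auto-export rule come from the docs (the counter would appear as hyperion_slow_requests).

# config.ru
class SlowRequestCounter
  def initialize(app)
    @app = app
  end

  def call(env)
    started  = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    response = @app.call(env)
    elapsed  = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    Hyperion.metrics.increment(:slow_requests) if elapsed > 0.5
    response
  end
end

use SlowRequestCounter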

Compatibility

| Component | Version |
|---|---|
| Ruby | 3.3+ (transitive protocol-http2 ~> 0.26 floor) |
| Rack | 3.x |
| Rails | verified up to 8.1 |
| Linux | kernel 5.x+ for io_uring opt-in; 4.x+ otherwise |
| macOS | works (TLS, h2, WebSockets, accept4 fallback path) |

Per the Rack 3 spec, Hyperion auto-sets SERVER_SOFTWARE, rack.version, and REMOTE_ADDR, with IPv6-safe Host parsing and the CRLF guard. The Hyperion::FiberLocal.install! opt-in shim handles the residual Thread.current.thread_variable_* footgun in older Rails idioms; modern Rails 7.1+ already uses Fiber storage natively.
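
The opt-in is a one-liner in boot code; the initializer path and the defined? guard below are illustrative.

# config/initializers/hyperion.rb
Hyperion::FiberLocal.install! if defined?(Hyperion::FiberLocal)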

Reproducing benchmarks

Every number in this README is reproducible. Per-row commands:

# Setup (once)
bundle install
bundle exec rake compile

# Hello via Server.handle_static + io_uring (134k r/s row)
HYPERION_IO_URING_ACCEPT=1 bundle exec bin/hyperion -w 1 -t 5 -p 9292 bench/hello_static.ru &
wrk -t4 -c100 -d20s --latency http://127.0.0.1:9292/

# Dynamic block via Server.handle (9.4k r/s row)
bundle exec bin/hyperion -w 1 -t 5 -p 9292 bench/hello_handle_block.ru &
wrk -t4 -c100 -d20s --latency http://127.0.0.1:9292/

# Generic Rack hello (4.7k r/s row)
bundle exec bin/hyperion -w 1 -t 5 -p 9292 bench/hello.ru &
wrk -t4 -c100 -d20s --latency http://127.0.0.1:9292/

# CPU JSON via block form (5.9k r/s row)
bundle exec bin/hyperion -w 1 -t 5 -p 9292 bench/work.ru &
wrk -t4 -c200 -d15s --latency http://127.0.0.1:9292/

# 4-way comparator (Hyperion vs Puma vs Falcon vs Agoo)
bash bench/4way_compare.sh

# gRPC unary + streaming (Hyperion side)
GHZ=/tmp/ghz TRIALS=3 DURATION=15s WARMUP_DURATION=3s bash bench/grpc_stream_bench.sh

# Idle keep-alive RSS sweep (10k conns × 30s hold)
bash bench/keepalive_memory.sh

PG benches (pg_concurrent.ru, pg_mixed.ru) live in the hyperion-async-pg companion repo — they require a running Postgres and the companion gem.

When numbers from your host don't match the published numbers, the most likely explanations (in order): (1) bench-host noise — single-VM benches drift 10–30% over days; (2) Puma version mismatch (sweep used Puma 8.0.1; the in-repo Gemfile pins ~> 6.4); (3) different kernel or Ruby; (4) different -t / -c (apples-to-apples requires identical worker count, thread count, wrk concurrency, payload, and TLS cipher).

Release history

See CHANGELOG.md. Recent: 2.14.0 (gRPC streaming ghz numbers; dynamic-block C dispatch — Server.handle { |env| ... } lifts hello to 9,422 r/s and CPU JSON to 5,897 r/s; Server#stop accept-wake on Linux; io_uring 4h soak), 2.13.0 (response head builder C-rewrite; gRPC streaming RPCs; soak harness), 2.12.0 (C connection lifecycle; io_uring loop hits 134k r/s; gRPC unary trailers; SO_REUSEPORT audit), 2.11.0 (HPACK CGlue default; h2 dispatch-pool warmup), 2.10.x (PageCache, Server.handle direct routes, TCP_NODELAY at accept).

Credits

  • Vendored llhttp (Node.js's HTTP parser, MIT) under ext/hyperion_http/llhttp/.
  • HTTP/2 framing and HPACK via protocol-http2.
  • Fiber scheduler via async.

License

MIT.