hyperion-async-pg
Async-aware shim for the pg gem. Patches PG::Connection so exec, exec_params, exec_prepared and friends cooperate with the Async fiber scheduler — while one fiber is parked on a Postgres socket waiting for query results, other fibers in the same OS thread serve other requests. Companion to the Hyperion HTTP server. Pure Ruby, drop-in, no behavior change outside an Async scheduler.
Install
```ruby
# Gemfile
gem 'hyperion-async-pg'
```

```ruby
# config/initializers/async_pg.rb (Rails) or wherever your app boots
require 'hyperion/async_pg'
Hyperion::AsyncPg.install!
```
install! is idempotent and thread-safe. Call once at boot, before any DB connections are opened. It returns true on the first call, false thereafter.
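A minimal sketch of how an idempotent, thread-safe `install!` can be structured — a guard flag under a lock plus `Module#prepend` so the patch wins method lookup. This is an assumed shape, not the gem's actual source; `AsyncPgSketch` and `FakeConnection` are illustrative names standing in for the real module and `PG::Connection`.

```ruby
require 'monitor'

# Stand-in for PG::Connection so the sketch runs without libpq.
class FakeConnection
  def exec_params(_sql, _params)
    :result
  end
end

# Hypothetical stand-in for Hyperion::AsyncPg — illustrates the install!
# contract only: true on the first call, false after, prepend exactly once.
module AsyncPgSketch
  LOCK = Monitor.new
  @installed = false

  module Patch
    def exec_params(*args)
      # the real shim's send_query_params + fiber-aware wait would go here
      super
    end
  end

  def self.install!(target = FakeConnection)
    LOCK.synchronize do
      return false if @installed
      target.prepend(Patch)
      @installed = true
    end
  end
end

AsyncPgSketch.install!  # => true  (patch prepended)
AsyncPgSketch.install!  # => false (already installed; no-op)
```

Because the patch is prepended rather than monkey-patched over, the original method stays reachable via `super`, which is how the real shim can fall back to stock behaviour outside a scheduler.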
Compatibility
The pg-level patch applies transparently to anything that calls into PG::Connection:
- Raw pg (verified): your own `PG::Connection.new(...).exec_params(...)` calls. This is the path the verified bench numbers below cover.
- Sequel (`postgres` adapter): the adapter calls `exec_params` / `exec_prepared` directly — the patch reaches it. Pair with a fiber-aware pool, e.g. `Sequel.connect(..., max_connections: N)` driven from a fiber-safe pool wrapper.
- ROM-sql + rom-pg: same as Sequel underneath.
- ActiveRecord (NOT fiber-aware in 7.2 / 8.1 — see warning below).
No driver-side opt-in required for the exec_* patch. Patches are prepended onto PG::Connection, so every caller in the process picks them up. The catch is the connection pool the driver uses around those calls, which is the actual concurrency knob — see "Connection pool" below.
ActiveRecord: the pool is the bottleneck
Verified Apr 2026 against AR 7.2.3.1 and AR 8.1.3, Ruby 3.3.3 + Hyperion `--async-io`: ActiveRecord's built-in connection pool (`ActiveRecord::Base.connection_pool.with_connection`) is not fiber-aware. Two fibers in the same OS thread end up handed the same `PG::Connection`, the second `send_query_*` fires onto a busy connection, and you get `PG::UnableToSend: PQsendQuery another command is already in progress`. The bench at `bench/active_record.ru` reproduced this at 100% 5xx on every request under 200 wrk connections — only 8.6 r/s of completed responses, all `ActiveRecord::StatementInvalid`. Neither `connection_handler.isolated_connection_pool` nor a magic per-fiber checkout is wired up by default in 7.2 or 8.1; the `Fiber[:active_record_connection_pool]` indirection that earlier release notes hinted at does not arrive in those releases.

Until upstream AR ships a fiber-safe pool (or you patch it yourself), do not rely on hyperion-async-pg + ActiveRecord under fiber concurrency. Options:

- Use raw pg through `Hyperion::AsyncPg::FiberPool` for the hot path that needs the fiber win; keep AR on a separate per-fiber connection or a Sidekiq/rake-style code path with no fiber scheduler.
- Wrap AR with async-pool yourself — see the example below — and call `connection.raw_connection.exec_params(...)` (the pg patch works on the wrapped connection). This skips AR's query interface; you lose AR's type casting and prepared-statement cache.
- Stay on Puma until AR's pool catches up — the patch stays silent and harmless under non-async servers.
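The shared-connection failure is easy to model in miniature. The stub below mimics libpq's one-command-in-flight rule — it is illustrative Ruby, not AR or pg code, but it raises the same message `PQsendQuery` produces when a second fiber fires onto a busy connection.

```ruby
# Toy connection that, like libpq, allows only one in-flight command.
class OneQueryConn
  def initialize
    @in_flight = false
  end

  def send_query(_sql)
    # libpq's PQsendQuery fails the same way on a busy connection
    raise 'another command is already in progress' if @in_flight
    @in_flight = true
  end

  def drain_results
    @in_flight = false   # results consumed; connection usable again
  end
end

conn = OneQueryConn.new
conn.send_query('SELECT pg_sleep(0.05)')  # fiber A parks on the socket

begin
  conn.send_query('SELECT 1')             # fiber B handed the SAME connection
rescue RuntimeError => e
  puts e.message   # prints "another command is already in progress"
end
```

A fiber-aware pool fixes this by construction: each fiber checks out a distinct connection, so no two fibers can race on the same socket.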
Server support matrix
This shim only delivers fiber concurrency when the HTTP server runs each request inside an Async::Scheduler. Without a scheduler, IO#wait_readable blocks the OS thread normally — the patch is silent and harmless, but produces no concurrency win.
| Server | Path | Concurrency win? | Notes |
|---|---|---|---|
| Falcon | any | ✅ yes | Native fiber scheduler per request. Drop-in. |
| `Hyperion >= 1.3.0 --async-io` | plain HTTP/1.1 | ✅ yes | Opt-in flag re-enables the Async accept loop on plain HTTP/1.1 and bypasses the thread pool so handlers run inline on the accept-loop fiber. Recommended. |
| `Hyperion --tls-cert ...` (HTTPS h1) | TLS / h1 | ✅ yes | TLS path always runs `start_async_loop`; every dispatch is a fiber. Works on 1.0.0+ — no `--async-io` flag needed. Verified 2026-04-27: `-t 64 --tls-cert ...` pool=64 = 742.7 r/s, p99 489 ms on the 50 ms `pg_sleep` workload (`wrk -c 100 -d 20s`, all 14 884 reqs 2xx). Throughput is bounded by `-t N` because each h1 dispatch hops through the worker pool — raise `-t` to match peak fiber concurrency on TLS. |
| `Hyperion --tls-cert ...` (HTTPS h2) | h2 streams | ✅ yes | Each h2 stream is a fiber by design. Verified 2026-04-27: same `-t 64 --tls-cert ...` pool=64 = 706.4 r/s, 5000/5000 succeeded, 0 errors under `h2load -c 50 -m 20 -n 5000`. Same `-t N` bound as h1+TLS. Configs with `-t` < 5 flood protocol-http2 flow control at >25× concurrency under stress (stream-dispatch errors); set `-t` ≥ peak h2 concurrency. |
| `Hyperion < 1.3.0` plain HTTP/1.1 | thread pool | ❌ no | 1.2.0's perf bypass (`start_raw_loop`) hands the whole socket to a worker thread with no scheduler — patch is silent. Upgrade to 1.3.0 + `--async-io`. |
| Puma | any | ❌ no | No fiber scheduler. Patch is silent, behaviour identical to plain pg. |
| Sidekiq / scripts / rake | any | ❌ no (and that's fine) | No scheduler → no patch effect. Drop-in safe. |
If you're on Hyperion 1.3.0+, set async_io: true in your config (or --async-io on the CLI) and pair with a fiber-aware connection pool — see next section.
Connection pool — use a fiber-aware one
The popular connection_pool gem (used by ActiveRecord, Sidekiq, etc.) is not fiber-aware: its internal Mutex + ConditionVariable don't yield to the Async scheduler. A fiber waiting for a connection blocks the entire OS thread, defeating this shim's purpose. Symptoms: throughput same as plain pg even though wait_readable is firing; under heavy load Falcon may report "Closing scheduler with blocked operations!".
This gem ships Hyperion::AsyncPg::FiberPool so you don't have to roll your own:
```ruby
require 'hyperion/async_pg'
Hyperion::AsyncPg.install!

# Per-worker — call from on_worker_boot in multi-worker setups
$pg_pool = Hyperion::AsyncPg::FiberPool.new(size: 64) do
  PG.connect(ENV['DATABASE_URL'])
end.fill

# In your handler
$pg_pool.with do |conn|
  conn.exec_params('SELECT ...', [...])
end
```
Internals: `FiberPool` wraps `Async::Semaphore` (fiber-aware) around a plain Array. `acquire` waits for a slot via the semaphore (cooperating fibers, the OS thread keeps serving others); pop/push around the Array is atomic per-thread under the GVL.
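To make the checkout contract concrete, here is a thread-based toy with the same `with` semantics. It uses `SizedQueue` instead of `Async::Semaphore`, so it is NOT fiber-aware — a waiting thread blocks rather than yields — and is shown only to illustrate the checkout/check-in shape; `ToyPool` is not part of this gem.

```ruby
# Thread-based model of FiberPool's with/acquire contract. The real pool
# swaps SizedQueue's blocking wait for Async::Semaphore so waiting fibers
# yield to the scheduler instead of parking the OS thread.
class ToyPool
  def initialize(size, &factory)
    @q = SizedQueue.new(size)
    size.times { @q.push(factory.call) }   # eager fill, like FiberPool#fill
  end

  def with
    conn = @q.pop            # blocks when every connection is checked out
    yield conn
  ensure
    @q.push(conn) if conn    # always check the connection back in
  end
end

pool = ToyPool.new(2) { { queries: 0 } }
pool.with do |conn|
  conn[:queries] += 1        # stand-in for conn.exec_params(...)
end
```

The `ensure` clause is the load-bearing part: the connection goes back into the queue even when the block raises, so one failed request cannot leak a slot.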
Connections fill in parallel (8 threads by default), so a 64-connection pool over a 100 ms-RTT WAN takes ~1 s at boot instead of ~6.4 s. Override with `parallel_fill_threads: N` (set to 1 for a fully serial fill).
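The parallel fill can be sketched with plain threads. This is a hypothetical helper (`parallel_fill` is an illustrative name, not the gem's API), assuming connections land in a plain Array: workers claim a slot under the lock, then run the slow connect outside it, so 64 connects overlap roughly 8-wide.

```ruby
# Spawns up to `threads` workers; each claims a slot under the mutex, then
# runs the slow connect outside the lock so connects genuinely overlap.
def parallel_fill(size, threads: 8, &connect)
  conns = []
  mutex = Mutex.new
  remaining = size

  Array.new([threads, size].min) do
    Thread.new do
      loop do
        claimed = mutex.synchronize { remaining > 0 && (remaining -= 1; true) }
        break unless claimed
        conn = connect.call                 # e.g. PG.connect(...) — slow part
        mutex.synchronize { conns << conn }
      end
    end
  end.each(&:join)

  conns
end

pool = parallel_fill(64, threads: 8) { Object.new }
pool.size  # => 64
```

With per-connect latency `t` and `w` workers, fill time is roughly `ceil(size / w) * t` — the ~1 s vs ~6.4 s figure above for 64 connections at 100 ms with 8 threads.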
Alternative: async-pool
If you need lazy-on-demand connection creation, idle eviction (max-age), or graceful close on shutdown — features FiberPool doesn't ship — use async-pool instead. It's the canonical pool implementation in the Async ecosystem; same fiber-cooperation properties, broader feature set.
```ruby
# Gemfile
gem 'async-pool'
```

```ruby
require 'async'
require 'async/pool/controller'
require 'async/pool/resource'
require 'pg'
require 'hyperion/async_pg'

Hyperion::AsyncPg.install!

# Wraps one PG connection as an async-pool resource.
class PgConnectionResource < Async::Pool::Resource
  def self.call
    new(PG.connect(ENV['DATABASE_URL']))
  end

  attr_reader :conn

  def initialize(conn)
    super()
    @conn = conn
  end

  def viable?   = !@conn.finished?
  def reusable? = viable?

  def close
    @conn.close unless @conn.finished?
  end
end

$pg_pool = Async::Pool::Controller.new(PgConnectionResource, limit: 64)

# In your handler:
$pg_pool.acquire do |resource|
  resource.conn.exec_params('SELECT ...', [...])
end
```
Working rackup at `bench/async_pool_example.ru`. Note that async-pool creates connections lazily — the first burst of N requests pays the cumulative `PG.connect` cost. For warm-pool semantics like FiberPool's, pre-create resources in `on_worker_boot` (call `pool.acquire { ... }` `limit` times before serving traffic).
Other options
- A per-fiber connection (no pool) — works, but holds a connection for the fiber's lifetime; size your Postgres `max_connections` accordingly.
- ActiveRecord — see the warning above; AR's pool is not currently fiber-safe in 7.2/8.1.
Caveats
- Only yields under a fiber scheduler. Outside `Async { ... }` (Sidekiq workers, plain scripts, rake tasks, Rails console) the patched methods behave identically to plain pg — `IO#wait_readable` falls back to its blocking implementation when `Fiber.scheduler` is `nil`. There is no perf regression in non-async contexts.
- Long-running statements still block the calling fiber. The shim parks a fiber on the socket; it does not preempt the running query. A 10 s `SELECT` still ties up that fiber for 10 s. Cap runaway queries with Postgres `statement_timeout` (or session-level `SET statement_timeout`), not at the Ruby layer.
- Connection pool sizing. Under Hyperion + this shim, fibers vastly outnumber threads — each fiber can hold a checked-out DB connection while it waits on Postgres. A worker with 10 OS threads and 200 concurrent fibers can hold 200 in-flight connections. Size your `pool:` (ActiveRecord) or `:max_connections` (Sequel) and your Postgres `max_connections` accordingly. Rule of thumb: pool >= peak concurrent fibers per worker.
- Single-statement only. The shim drains all results and returns the last one, matching pg's default `exec_params` semantics. Multi-statement strings sent through `exec` produce the last result, as before.
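A worked instance of the sizing rule of thumb, with illustrative numbers (the headroom constant is an assumption, not a gem default):

```ruby
workers         = 4      # -w 4
peak_fibers     = 200    # concurrent in-flight requests per worker
pool_per_worker = peak_fibers               # pool >= peak concurrent fibers
headroom        = 10                        # consoles, migrations, cron jobs
pg_max_conns    = workers * pool_per_worker + headroom
pg_max_conns  # => 810
```

So a 4-worker deployment sized for 200 concurrent fibers per worker needs a Postgres `max_connections` well above the stock default of 100.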
Tuning
| Env var | Default | Meaning |
|---|---|---|
| `HYPERION_ASYNC_PG_READ_TIMEOUT` | unset (block forever) | Seconds passed to `IO#wait_readable` per poll. Unset matches pg's default — rely on Postgres `statement_timeout` for the upper bound. Set it when you want a hard ceiling on a single socket wait independent of server-side timeouts; on timeout the shim raises `PG::ConnectionBad`. |
Read at every dispatch; no restart required.
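A sketch of why no restart is needed — the value is read from `ENV` on every call rather than cached at boot. `read_timeout` is an illustrative name, assumed shape only:

```ruby
# Returns nil (block forever) when unset, a Float of seconds otherwise.
# Re-reading ENV on every dispatch is what makes the knob hot-reloadable.
def read_timeout
  v = ENV['HYPERION_ASYNC_PG_READ_TIMEOUT']
  v && Float(v)
end

ENV.delete('HYPERION_ASYNC_PG_READ_TIMEOUT')
read_timeout  # => nil

ENV['HYPERION_ASYNC_PG_READ_TIMEOUT'] = '2.5'
read_timeout  # => 2.5
```

`Float(v)` (rather than `v.to_f`) raises on garbage like `"abc"` instead of silently producing `0.0`, which would otherwise turn a typo into an instant-timeout bug.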
Troubleshooting
macOS: `objc[NNN]: +[NSCharacterSet initialize] may have been in progress in another thread when fork() was called`

Falcon (and any pre-fork server) on macOS crashes if you `PG.connect` before the fork — pg's libpq pulls in Foundation/objc, which is not fork-safe under recent macOS. Symptom: child workers SIGABRT immediately on boot, often with the message above.
Workaround:
```sh
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
falcon serve --bind http://127.0.0.1:9292 --count 4 -c config.ru
```
Linux is unaffected. The cleaner fix is to defer pool creation to on_worker_boot (post-fork) — see the multi-worker section below — so no PG sockets exist in the parent at fork time.
Verified bench
Ubuntu 24.04 / 16 vCPU / Ruby 3.3.3, Postgres 17 over a WAN link, wrk -t4 -c200 -d20s. All configs single-worker (-w 1) unless noted; all returned 0 non-2xx and 0 wrk timeouts. RSS sampled mid-run via ps -o rss.
Two workloads:
- wait (`bench/pg_concurrent.ru`): `SELECT pg_sleep(0.05)` + tiny JSON. Pure wait-bound — the "best case" for this gem.
- mix (`bench/pg_mixed.ru`): same query + a 50-key JSON serialization (~5 ms CPU). The honest case — what real Rails apps look like.
Wait-bound workload
| Setup | r/s | p99 | RSS | vs Puma -t 5 |
|---|---|---|---|---|
| `Puma 8.0 -t 5` pool=5 | 56.5 | 3.88 s | 87 MB | 1.0× |
| `Puma 8.0 -t 30` pool=30 | 402.1 | 880 ms | 99 MB | 7.1× |
| `Puma 8.0 -t 100` pool=100 | 1067.4 | 557 ms | 121 MB | 18.9× |
| `Hyperion --async-io -t 5` pool=5 | 56.7 | 3.82 s | 108 MB | 1.0× |
| `Hyperion --async-io -t 5` pool=32 | 400.4 | 878 ms | 123 MB | 7.1× |
| `Hyperion --async-io -t 5` pool=64 | 778.9 | 638 ms | 133 MB | 13.8× |
| `Hyperion --async-io -t 5` pool=128 | 1344.2 | 536 ms | 148 MB | 23.8× |
| `Hyperion --async-io -t 5` pool=200 | 2381.4 | 471 ms | 164 MB | 42.2× |
| `Hyperion --async-io -w 4 -t 5` pool=32 | 1443.0 | 2.69 s | 503 MB | 25.5× (cold-start p99 — see note) |
| `Hyperion --async-io -w 4 -t 5` pool=64 | 1937.5 | 4.84 s | 416 MB | 34.3× (cold-start p99 — see note) |
| `Falcon 0.55.3 --count 1` pool=128 | 1665.7 | 516 ms | 141 MB | 29.5× |
Mixed CPU+wait workload (50 ms PG + 50-key JSON serialization)
| Setup | r/s | p99 | RSS | vs Puma -t 30 |
|---|---|---|---|---|
| `Puma 8.0 -t 30` pool=30 | 351.7 | 963 ms | 127 MB | 1.0× |
| `Hyperion --async-io -t 5` pool=32 | 371.2 | 919 ms | 151 MB | 1.05× |
| `Hyperion --async-io -t 5` pool=64 | 741.5 | 681 ms | 161 MB | 2.1× |
| `Hyperion --async-io -t 5` pool=128 | 1739.9 | 512 ms | 201 MB | 4.9× |
| `Hyperion --async-io -w 4 -t 5` pool=32 | 1303.0 | 2.99 s | 675 MB | 3.7× (cold-start p99) |
| `Falcon 0.55.3 --count 1` pool=128 | 1642.1 | 531 ms | 213 MB | 4.7× |
Mixed throughput slightly EXCEEDS pure-wait at high pools — pool=128 mixed (1740 r/s) beats wait-only (1344 r/s). The JSON CPU work overlaps the PG-wait windows of other fibers; a longer per-request lifetime lets the scheduler pack more in-flight requests. Counter-intuitive, real, reproducible.
What the RSS column actually tells us
Linux thread stacks are demand-paged, so Puma's worst-case "100 threads × 8 MB virtual" doesn't surface as 800 MB RSS — only ~30-40 MB of actual paged memory. At single-worker pool sizes ≤ 200, PG connection buffers (~600 KB per conn) dominate RSS, not thread stacks. So the headline difference between Hyperion and Puma is throughput + tail latency, not RSS, at this pool scale.
Where the architectural memory story DOES land:
- Connection count: Puma is capped at `max_threads` concurrent in-flight queries. Hyperion `--async-io` is capped at `pool_size`. To match Hyperion pool=200, Puma needs `-t 200` — 200 OS threads pin the address space and increase context-switch overhead, even if RSS doesn't blow up.
- Idle keep-alive connections: each Puma idle keep-alive holds an OS thread. Hyperion holds a ~1 KB fiber. At 10k idle clients, this is the dramatic difference — but that's a different bench (see Hyperion's 10k-connection bench), not this one.
-w 4 cold-start caveat
Multi-worker configs show inflated p99 (2.69-4.84 s) because bench/pg_concurrent.ru uses lazy per-process pool init: each child worker pays the full pool-fill cost on its first request after fork. With pool=64 × 4 workers × ~100 ms PG.connect over WAN = ~25 s of cold-start work spread across the first few requests on each worker. r/s is fine over the 20 s window, p99 absorbs the spike. Production apps should pre-fill via Hyperion's on_worker_boot lifecycle hook (sketch in bench/pg_concurrent.ru's footer comment); that eliminates the cold-start p99 entirely.
Reproduce
```sh
gem install hyperion-rb hyperion-async-pg pg
git clone https://github.com/andrew-woblavobla/hyperion-async-pg && cd hyperion-async-pg
bundle install

# Wait-bound (pure pg_sleep)
DATABASE_URL='postgres://user:pass@host/db' MODE=async PG_POOL_SIZE=200 \
  bundle exec hyperion --async-io -t 5 -w 1 -p 9292 bench/pg_concurrent.ru &
wrk -t4 -c200 -d20s --latency http://127.0.0.1:9292/

# Mixed (pg_sleep + 50-key JSON serialization)
DATABASE_URL='...' MODE=async PG_POOL_SIZE=128 \
  bundle exec hyperion --async-io -t 5 -w 1 -p 9293 bench/pg_mixed.ru &
wrk -t4 -c200 -d20s --latency http://127.0.0.1:9293/
```
Postgres `max_connections`: pool=200 needs at least 200 free connections in your PG instance. The default is 100 — bump it with `ALTER SYSTEM SET max_connections = 500;` + a Postgres restart, or scale the pool to fit.

Multi-worker (`-w N`): PG sockets opened in the master are shared across forked workers; multiple worker pids reading the same kernel socket buffer interleave bytes and corrupt the wire protocol (this is why a top-level-init bench returns 99.99% 500s under `-w 4`). Initialize the pool inside each child — either via lazy first-request init (what `bench/pg_concurrent.ru` does) or via Hyperion's `on_worker_boot` hook in a config file:

```ruby
# hyperion.rb — passed via `bundle exec hyperion -C hyperion.rb ...`
on_worker_boot do
  require 'hyperion/async_pg'
  Hyperion::AsyncPg.install!

  $pg_pool = Hyperion::AsyncPg::FiberPool.new(size: 64) do
    PG.connect(ENV['DATABASE_URL'])
  end.fill
end
```

`on_worker_boot` pre-warms the pool before the worker accepts its first request (no first-request latency spike). Lazy init is cheaper to set up and acceptable for benchmarks where the first-request cost is amortized over a 20 s+ run.
Caveats
The win evaporates if any of these is wrong:
- Server doesn't run requests under `Async::Scheduler` (use Hyperion `--async-io`, Falcon, or Hyperion over TLS).
- Connection pool isn't fiber-aware (the `connection_pool` gem blocks the OS thread).
- Workload isn't actually wait-bound (CPU-heavy handlers don't benefit; the gain is exactly the PG round-trip you can stack).
How it works
`PG::Connection#exec_params(...)` (and the other patched methods) becomes:

- Call the non-blocking `send_query_params(...)` C function — fires the query off, returns immediately.
- Loop: `consume_input` → check `is_busy` → if busy, `socket_io.wait_readable`. Under `Async::Scheduler`, `wait_readable` yields the fiber. Without one, it blocks the OS thread.
- Drain results with `get_result`, return the final one (after `result.check` to surface errors).

No threads, no extra IO objects, no copy of the result through Ruby. The C extension does all the work; we only swap the wait primitive.
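The three steps run fine against a stub, which makes the control flow easy to see. Method names mirror pg's real API (`send_query_params`, `consume_input`, `is_busy`, `get_result`, `socket_io`), but the classes below are fakes for illustration — a model of the loop, not the gem's source:

```ruby
# Fake result/connection pair standing in for PG::Result / PG::Connection.
class FakeResult
  attr_reader :value
  def initialize(value)
    @value = value
  end
  def check; end                       # real PG::Result#check raises on error
end

class FakeConn
  attr_reader :socket_io
  def initialize(results)
    @results = results
    @busy_polls = 2                    # pretend the server is busy twice
    @socket_io, @w = IO.pipe
    @w.write('x')                      # data pending => wait_readable returns
  end
  def send_query_params(_sql, _params); end
  def consume_input; end
  def is_busy = (@busy_polls -= 1) >= 0
  def get_result = @results.shift      # nil once drained, like libpq
end

def patched_exec_params(conn, sql, params)
  conn.send_query_params(sql, params)  # 1. fire, return immediately
  loop do                              # 2. cooperate while the server works
    conn.consume_input
    break unless conn.is_busy
    conn.socket_io.wait_readable       #    yields the fiber under a scheduler
  end
  last = nil
  while (r = conn.get_result)          # 3. drain, keep the last result
    r.check
    last = r
  end
  last
end

conn = FakeConn.new([FakeResult.new(:first), FakeResult.new(:last)])
patched_exec_params(conn, 'SELECT 1; SELECT 2', []).value  # => :last
```

Note the only blocking primitive in the whole loop is `wait_readable` — which is exactly the call `Fiber.scheduler` hooks, so swapping schedulers swaps the concurrency model with no other change.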
License
MIT.