hyperion-async-pg

Async-aware shim for the pg gem. Patches PG::Connection so exec, exec_params, exec_prepared and friends cooperate with the Async fiber scheduler — while one fiber is parked on a Postgres socket waiting for query results, other fibers in the same OS thread serve other requests. Companion to the Hyperion HTTP server. Pure Ruby, drop-in, no behavior change outside an Async scheduler.

Install

# Gemfile
gem 'hyperion-async-pg'
# config/initializers/async_pg.rb (Rails) or wherever your app boots
require 'hyperion/async_pg'
Hyperion::AsyncPg.install!

install! is idempotent and thread-safe. Call once at boot, before any DB connections are opened. It returns true on the first call, false thereafter.
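
For a quick sanity check outside any server, here is a minimal sketch (assuming DATABASE_URL points at a reachable Postgres) that shows two queries overlapping once the patch is installed:

require 'async'
require 'pg'
require 'hyperion/async_pg'

Hyperion::AsyncPg.install!        # patch before opening any connections

Async do |task|
  queries = 2.times.map do
    task.async do
      conn = PG.connect(ENV['DATABASE_URL'])           # one connection per fiber
      conn.exec_params('SELECT pg_sleep($1)', [0.5])   # parks this fiber, not the OS thread
    ensure
      conn&.close
    end
  end
  queries.each(&:wait)
end
# With the patch active both sleeps overlap: ~0.5 s wall time instead of ~1.0 s.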

Compatibility

The pg-level patch applies transparently to anything that calls into PG::Connection:

  • Raw pg (verified): your own PG::Connection.new(...).exec_params(...) calls. This is the path the verified bench numbers below cover.
  • Sequel (postgres adapter): the adapter calls exec_params / exec_prepared directly, so the patch reaches it. Pair with a fiber-aware pool, e.g. Sequel.connect(..., max_connections: N) behind a fiber-safe pool wrapper (see the Sequel sketch below).
  • ROM-sql + rom-pg: same as Sequel underneath.
  • ActiveRecord (NOT fiber-aware in 7.2 / 8.1 — see warning below).

No driver-side opt-in required for the exec_* patch. Patches are prepended onto PG::Connection, so every caller in the process picks them up. The catch is the connection pool the driver uses around those calls, which is the actual concurrency knob — see "Connection pool" below.
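
As a sketch of the Sequel pairing (assuming a Sequel version that ships the fiber_concurrency extension; verify the name against your version's docs), pool ownership moves to fibers and the patched exec_params does the waiting:

require 'sequel'
require 'hyperion/async_pg'

Hyperion::AsyncPg.install!

# fiber_concurrency makes Sequel's pool track checkouts per fiber instead of
# per thread (ships with recent Sequel releases; confirm against yours).
Sequel.extension :fiber_concurrency

DB = Sequel.connect(ENV['DATABASE_URL'], max_connections: 64)

# Inside a request fiber: the pg-level patch parks only this fiber while the
# 50 ms sleep runs server-side.
DB["SELECT pg_sleep(?)", 0.05].all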

ActiveRecord: the pool is the bottleneck

Verified Apr 2026 against AR 7.2.3.1 and AR 8.1.3, Ruby 3.3.3 + Hyperion --async-io: ActiveRecord's built-in connection pool (ActiveRecord::Base.connection_pool.with_connection) is not fiber-aware. Two fibers in the same OS thread are handed the same PG::Connection, the second send_query_* fires onto a busy connection, and you get PG::UnableToSend: PQsendQuery another command is already in progress. The bench at bench/active_record.ru reproduced this as 100 % 5xx under 200 wrk connections: only 8.6 r/s of completed responses, all ActiveRecord::StatementInvalid. Neither connection_handler.isolated_connection_pool nor a magic per-fiber checkout is wired up by default in 7.2 or 8.1; the Fiber[:active_record_connection_pool] indirection that earlier release notes hinted at does not arrive in those releases.

Until upstream AR ships a fiber-safe pool (or you patch it yourself), do not rely on hyperion-async-pg + ActiveRecord under fiber concurrency. Options:

  1. Use raw pg through Hyperion::AsyncPg::FiberPool for the hot path that needs the fiber win, keep AR on a separate per-fiber connection or a Sidekiq/rake-style code path with no fiber scheduler.
  2. Wrap AR with async-pool yourself — see the example below — and call connection.raw_connection.exec_params(...) (the pg patch works on the wrapped connection). This skips AR's query interface; you lose AR's type casting and prepared-statement cache.
  3. Stay on Puma until AR's pool catches up — the patch stays silent and harmless under non-async servers.

Server support matrix

This shim only delivers fiber concurrency when the HTTP server runs each request inside an Async::Scheduler. Without a scheduler, IO#wait_readable blocks the OS thread normally — the patch is silent and harmless, but produces no concurrency win.

Server Path Concurrency win? Notes
Falcon any ✅ yes Native fiber scheduler per request. Drop-in.
Hyperion >= 1.3.0 --async-io plain HTTP/1.1 ✅ yes Opt-in flag re-enables the Async accept loop on plain HTTP/1.1 + bypasses the thread pool so handlers run inline on the accept-loop fiber. Recommended.
Hyperion --tls-cert ... (HTTPS h1) TLS / h1 ✅ yes TLS path always runs start_async_loop; every dispatch is a fiber. Works on 1.0.0+ — no --async-io flag needed. Verified 2026-04-27: -t 64 --tls-cert ... pool=64 = 742.7 r/s, p99 489 ms on the 50 ms pg_sleep workload (wrk -c 100 -d 20s, all 14 884 reqs 2xx). Throughput is bounded by -t N because each h1 dispatch hops through the worker pool — raise -t to match peak fiber concurrency on TLS.
Hyperion --tls-cert ... (HTTPS h2) h2 streams ✅ yes Each h2 stream is a fiber by design. Verified 2026-04-27: same -t 64 --tls-cert ... pool=64 = 706.4 r/s, 5000/5000 succeeded, 0 errors under h2load -c 50 -m 20 -n 5000. Same -t N bound as h1+TLS. Low -t configs (e.g. -t 5) flood protocol-http2 flow-control at >25× concurrency under stress (stream-dispatch errors); set -t ≥ peak h2 concurrency.
Hyperion < 1.3.0 plain HTTP/1.1 thread pool ❌ no 1.2.0's perf-bypass (start_raw_loop) hands the whole socket to a worker thread with no scheduler — patch is silent. Upgrade to 1.3.0 + --async-io.
Puma any ❌ no No fiber scheduler. Patch is silent, behaviour identical to plain pg.
Sidekiq / scripts / rake any ❌ no (and that's fine) No scheduler → no patch effect. Drop-in safe.

If you're on Hyperion 1.3.0+, set async_io: true in your config (or --async-io on the CLI) and pair with a fiber-aware connection pool — see next section.

Connection pool — use a fiber-aware one

The popular connection_pool gem (used by ActiveRecord, Sidekiq, etc.) is not fiber-aware: its internal Mutex + ConditionVariable don't yield to the Async scheduler. A fiber waiting for a connection blocks the entire OS thread, defeating this shim's purpose. Symptoms: throughput same as plain pg even though wait_readable is firing; under heavy load Falcon may report "Closing scheduler with blocked operations!".

This gem ships Hyperion::AsyncPg::FiberPool so you don't have to roll your own:

require 'hyperion/async_pg'
Hyperion::AsyncPg.install!

# Per-worker — call from on_worker_boot in multi-worker setups
$pg_pool = Hyperion::AsyncPg::FiberPool.new(size: 64) do
  PG.connect(ENV['DATABASE_URL'])
end.fill

# In your handler
$pg_pool.with do |conn|
  conn.exec_params('SELECT ...', [...])
end

Internals: FiberPool wraps Async::Semaphore (fiber-aware) around a plain Array. acquire waits for a slot via the semaphore (cooperating fibers, so the OS thread keeps serving others); Array pop/push are effectively atomic under the GVL.

fill opens connections in parallel (8 threads by default), so a 64-connection pool over a 100 ms-RTT WAN warms in ~1 s instead of ~6.4 s at boot. Override with parallel_fill_threads: N (set to 1 for a fully serial fill).
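
A simplified sketch of that shape (illustrative only, not the gem's source; the real fill parallelizes PG.connect as described above, this one fills serially):

require 'async/semaphore'

class SketchFiberPool
  def initialize(size:, &factory)
    @factory   = factory
    @semaphore = Async::Semaphore.new(size)   # fiber-aware counting semaphore
    @idle      = []                           # plain Array; push/pop are safe under the GVL
    @size      = size
  end

  def fill
    @size.times { @idle << @factory.call }    # the real gem does this across threads
    self
  end

  def with
    # When the pool is exhausted, acquire parks this fiber; the OS thread keeps
    # running other fibers until a slot frees up.
    @semaphore.acquire do
      conn = @idle.pop
      begin
        yield conn
      ensure
        @idle.push(conn)
      end
    end
  end
end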

Alternative: async-pool

If you need lazy-on-demand connection creation, idle eviction (max-age), or graceful close on shutdown — features FiberPool doesn't ship — use async-pool instead. It's the canonical pool implementation in the Async ecosystem; same fiber-cooperation properties, broader feature set.

# Gemfile
gem 'async-pool'
require 'async'
require 'async/pool/controller'
require 'async/pool/resource'
require 'pg'
require 'hyperion/async_pg'
Hyperion::AsyncPg.install!

class PgConnectionResource < Async::Pool::Resource
  def self.call
    new(PG.connect(ENV['DATABASE_URL']))
  end

  def initialize(conn)
    super()
    @conn = conn
  end

  attr_reader :conn

  def viable?  = !@conn.finished?
  def reusable? = viable?
  def close    = (@conn.close unless @conn.finished?)
end

$pg_pool = Async::Pool::Controller.new(PgConnectionResource, limit: 64)

# In your handler:
$pg_pool.acquire do |resource|
  resource.conn.exec_params('SELECT ...', [...])
end

Working rackup at bench/async_pool_example.ru. Note that async-pool creates connections lazily; the first burst of N requests pays the cumulative PG.connect cost. For warm-pool semantics like FiberPool, pre-create resources in on_worker_boot by checking out the full limit at once before serving traffic (sketch below).
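
A hedged warm-up sketch (assumes Async::Pool::Controller#acquire and #release in their block-less forms; confirm against your async-pool version): check out the full limit at once so the controller actually builds every connection, then hand them all back.

# e.g. from on_worker_boot, before the worker accepts traffic
Async do
  warm = Array.new(64) { $pg_pool.acquire }           # forces creation of all 64 resources
  warm.each { |resource| $pg_pool.release(resource) }
end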

Other options

  • A per-fiber connection (no pool) — works, but holds a connection for the fiber's lifetime; size your Postgres max_connections accordingly.
  • ActiveRecord — see the warning above; AR's pool is not currently fiber-safe in 7.2/8.1.

Caveats

  • Only yields under a fiber scheduler. Outside Async { ... } (Sidekiq workers, plain scripts, rake tasks, Rails console) the patched methods behave identically to plain pg: IO#wait_readable falls back to its blocking implementation when Fiber.scheduler is nil. There is no perf regression in non-async contexts.
  • Long-running statements still block the calling fiber. The shim parks a fiber on the socket; it does not preempt the running query. A 10 s SELECT still ties up that fiber for 10 s. Cap runaway queries with Postgres statement_timeout (or a session-level SET statement_timeout; sketch after this list), not at the Ruby layer.
  • Connection pool sizing. Under Hyperion + this shim, fibers vastly outnumber threads; each fiber can hold a checked-out DB connection while it waits on Postgres. A worker with 10 OS threads and 200 concurrent fibers can hold 200 in-flight connections. Size your pool: setting (ActiveRecord) or :max_connections (Sequel) and your Postgres max_connections accordingly. Rule of thumb: pool >= peak concurrent fibers per worker.
  • Single-statement only. The shim drains all results and returns the last one, matching pg's default exec_params semantics. Multi-statement strings sent through exec produce the last result, as before.
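
For the long-running-statement caveat, a small sketch of the session-level cap (the 5 s value and the query are illustrative):

$pg_pool.with do |conn|
  conn.exec("SET statement_timeout = '5s'")        # server-side ceiling for this session
  conn.exec_params('SELECT pg_sleep($1)', [10])    # raises PG::QueryCanceled after ~5 s
end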

Tuning

Env var Default Meaning
HYPERION_ASYNC_PG_READ_TIMEOUT unset (block forever) Seconds passed to IO#wait_readable per poll. Unset matches pg's default — rely on Postgres statement_timeout for the upper bound. Set when you want a hard ceiling on a single socket-wait independent of server-side timeouts; on timeout the shim raises PG::ConnectionBad.

Read at every dispatch; no restart required.

Troubleshooting

macOS: objc[NNN]: +[NSCharacterSet initialize] may have been in progress in another thread when fork() was called

Falcon (and any pre-fork server) on macOS crashes if you PG.connect BEFORE the fork — pg's libpq pulls in Foundation/objc, which is not fork-safe under recent macOS. Symptom: child workers SIGABRT immediately on boot, often with the message above.

Workaround:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
falcon serve --bind http://127.0.0.1:9292 --count 4 -c config.ru

Linux is unaffected. The cleaner fix is to defer pool creation to on_worker_boot (post-fork) — see the multi-worker section below — so no PG sockets exist in the parent at fork time.

Verified bench

Ubuntu 24.04 / 16 vCPU / Ruby 3.3.3, Postgres 17 over a WAN link, wrk -t4 -c200 -d20s. All configs single-worker (-w 1) unless noted; all returned 0 non-2xx and 0 wrk timeouts. RSS sampled mid-run via ps -o rss.

Two workloads:

  • wait (bench/pg_concurrent.ru): SELECT pg_sleep(0.05) + tiny JSON. Pure wait-bound — the "best case" for this gem.
  • mix (bench/pg_mixed.ru): same query + a 50-key JSON serialization (~5 ms CPU). The honest case — what real Rails apps look like.

Wait-bound workload

Setup r/s p99 RSS vs Puma -t 5
Puma 8.0 -t 5 pool=5 56.5 3.88 s 87 MB 1.0×
Puma 8.0 -t 30 pool=30 402.1 880 ms 99 MB 7.1×
Puma 8.0 -t 100 pool=100 1067.4 557 ms 121 MB 18.9×
Hyperion --async-io -t 5 pool=5 56.7 3.82 s 108 MB 1.0×
Hyperion --async-io -t 5 pool=32 400.4 878 ms 123 MB 7.1×
Hyperion --async-io -t 5 pool=64 778.9 638 ms 133 MB 13.8×
Hyperion --async-io -t 5 pool=128 1344.2 536 ms 148 MB 23.8×
Hyperion --async-io -t 5 pool=200 2381.4 471 ms 164 MB 42.2×
Hyperion --async-io -w 4 -t 5 pool=32 1443.0 2.69 s 503 MB 25.5× (cold-start p99 — see note)
Hyperion --async-io -w 4 -t 5 pool=64 1937.5 4.84 s 416 MB 34.3× (cold-start p99 — see note)
Falcon 0.55.3 --count 1 pool=128 1665.7 516 ms 141 MB 29.5×

Mixed CPU+wait workload (50 ms PG + 50-key JSON serialization)

Setup r/s p99 RSS vs Puma -t 30
Puma 8.0 -t 30 pool=30 351.7 963 ms 127 MB 1.0×
Hyperion --async-io -t 5 pool=32 371.2 919 ms 151 MB 1.05×
Hyperion --async-io -t 5 pool=64 741.5 681 ms 161 MB 2.1×
Hyperion --async-io -t 5 pool=128 1739.9 512 ms 201 MB 4.9×
Hyperion --async-io -w 4 -t 5 pool=32 1303.0 2.99 s 675 MB 3.7× (cold-start p99)
Falcon 0.55.3 --count 1 pool=128 1642.1 531 ms 213 MB 4.7×

Mixed throughput slightly EXCEEDS pure-wait at high pool sizes: pool=128 mixed (1740 r/s) beats wait-only (1344 r/s). The JSON CPU work overlaps the PG-wait windows of other fibers; a longer per-request lifetime lets the scheduler pack more in-flight requests. Counter-intuitive, real, reproducible.

What the RSS column actually tells us

Linux thread stacks are demand-paged, so Puma's worst-case "100 threads × 8 MB virtual" doesn't surface as 800 MB RSS — only ~30-40 MB of actual paged memory. At single-worker pool sizes ≤ 200, PG connection buffers (~600 KB per conn) dominate RSS, not thread stacks. So the headline difference between Hyperion and Puma is throughput + tail latency, not RSS, at this pool scale.

Where the architectural memory story DOES land:

  • Connection count: Puma is capped at max_threads concurrent in-flight queries. Hyperion --async-io is capped at pool_size. To match Hyperion-pool-200, Puma needs -t 200 — 200 OS threads pin the address space and increase context-switch overhead, even if RSS doesn't blow up.
  • Idle keep-alive connections: each Puma idle keep-alive holds an OS thread. Hyperion holds a ~1 KB fiber. At 10k idle clients, this is the dramatic difference — but that's a different bench (see Hyperion's 10k-connection bench), not this one.

-w 4 cold-start caveat

Multi-worker configs show inflated p99 (2.69-4.84 s) because bench/pg_concurrent.ru uses lazy per-process pool init: each child worker pays the full pool-fill cost on its first request after fork. With pool=64 × 4 workers × ~100 ms PG.connect over WAN = ~25 s of cold-start work spread across the first few requests on each worker. r/s is fine over the 20 s window, p99 absorbs the spike. Production apps should pre-fill via Hyperion's on_worker_boot lifecycle hook (sketch in bench/pg_concurrent.ru's footer comment); that eliminates the cold-start p99 entirely.

Reproduce

gem install hyperion-rb hyperion-async-pg pg
git clone https://github.com/andrew-woblavobla/hyperion-async-pg && cd hyperion-async-pg
bundle install

# Wait-bound (pure pg_sleep)
DATABASE_URL='postgres://user:pass@host/db' MODE=async PG_POOL_SIZE=200 \
  bundle exec hyperion --async-io -t 5 -w 1 -p 9292 bench/pg_concurrent.ru &
wrk -t4 -c200 -d20s --latency http://127.0.0.1:9292/

# Mixed (pg_sleep + 50-key JSON serialization)
DATABASE_URL='...' MODE=async PG_POOL_SIZE=128 \
  bundle exec hyperion --async-io -t 5 -w 1 -p 9293 bench/pg_mixed.ru &
wrk -t4 -c200 -d20s --latency http://127.0.0.1:9293/

Postgres max_connections: pool=200 needs at least 200 free connections in your PG instance. Default is 100 — bump with ALTER SYSTEM SET max_connections = 500; + Postgres restart, or scale the pool to fit.

Multi-worker (-w N): PG sockets opened in the master are shared across forked workers; multiple worker pids reading the same kernel socket buffer interleave bytes and corrupt the wire protocol (this is why a top-level-init bench returns 99.99% 500s under -w 4). Initialize the pool inside each child — either via lazy first-request init (what bench/pg_concurrent.ru does) or via Hyperion's on_worker_boot hook in a config file:

# hyperion.rb — passed via `bundle exec hyperion -C hyperion.rb ...`
on_worker_boot do
  require 'hyperion/async_pg'
  Hyperion::AsyncPg.install!
  $pg_pool = Hyperion::AsyncPg::FiberPool.new(size: 64) do
    PG.connect(ENV['DATABASE_URL'])
  end.fill
end

on_worker_boot pre-warms the pool before the worker accepts its first request (no first-request latency spike). Lazy init is cheaper to set up and acceptable for benchmarks where the first-request cost is amortized over a 20s+ run.

Caveats

The win evaporates if any of these is wrong:

  • Server doesn't run requests under Async::Scheduler (use Hyperion --async-io, Falcon, or Hyperion-over-TLS).
  • Connection pool isn't fiber-aware (connection_pool gem blocks the OS thread).
  • Workload isn't actually wait-bound (CPU-heavy handlers don't benefit; the gain is exactly the PG round-trip you can stack).

How it works

PG::Connection#exec_params(...) (and the other patched methods) becomes:

  1. Call the non-blocking send_query_params(...) C function — fires the query off, returns immediately.
  2. Loop: consume_input → check is_busy → if busy, socket_io.wait_readable. Under Async::Scheduler, wait_readable yields the fiber. Without one, it blocks the OS thread.
  3. Drain results with get_result, return the final one (after result.check to surface errors).

No threads, no extra IO objects, no copy of the result through Ruby. The C extension does all the work; we only swap the wait primitive.
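
In code, the patched loop looks roughly like this (an illustrative sketch of the pattern, not the gem's exact source):

module AsyncExecParamsSketch
  def exec_params(sql, params = [], *rest)
    send_query_params(sql, params, *rest)    # non-blocking send; returns immediately
    last = nil
    loop do
      consume_input                          # pull whatever libpq has buffered
      while is_busy                          # result not complete yet?
        socket_io.wait_readable              # yields the fiber under Async::Scheduler
        consume_input
      end
      result = get_result
      break if result.nil?                   # nil marks the end of the result stream
      result.check                           # raise PG::Error on server-side failure
      last = result
    end
    last
  end
end

# Roughly what install! does for each patched method:
# PG::Connection.prepend(AsyncExecParamsSketch)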

License

MIT.