pipeloader
Transparent libpq pipelining for graphql-ruby on ActiveRecord. During GraphQL
response building, every ActiveRecord SELECT is routed through a libpq pipeline,
so a nested query resolves in roughly one round trip per tree level — with
plain resolvers and plain models. No Futures, no dataloader.load, no field
changes.
Adopting it
One line:
class AppSchema < GraphQL::Schema
  use Pipeloader
end
That's the whole adoption surface. Your types and resolvers stay exactly as they are — ordinary ActiveRecord:
class Types::Post < GraphQL::Schema::Object
  field :title, String, null: false
  field :author, Types::Author, null: false # resolves via post.author
  field :comments, [Types::Comment], null: false # resolves via post.comments
end
post.author, post.comments, a has_many, a .where(...) in a hand-written
resolver — any AR SELECT issued while building the response is intercepted and
pipelined. Because it hooks AR's query path (not the GraphQL field), nothing
leaks back to synchronous N+1, even from custom resolver code.
What it does
example/run.rb — plain resolvers, against a seeded database:
{ posts(limit: 50) { title author { name } comments { body commenter { name } } } }
resolved 50 posts with PLAIN AR resolvers
pipeline round-trips: 3
queries pipelined: 403
naive N+1 would be: ~594 round trips
Three round trips: posts → (authors + comments) → commenters. The to-one
author and the to-many comments are different shapes at the same level, yet
collapse into a single round trip.
Benchmark
The same 3-level query (posts → author + comments → commenter, 25 posts),
resolved four ways, with network latency injected by a local TCP proxy in
front of Postgres (example/latency_proxy.rb delays the request direction, so
each synchronous query pays the RTT once and a whole pipelined burst also pays
it only once). Min of 3 iterations; your numbers will vary.
| approach | RTT 0 | RTT 1 ms | RTT 5 ms | round-trips |
|---|---|---|---|---|
| naive (N+1) | 94 ms | 505 ms | 1972 ms | 290 |
| AR includes (hand-written) | 17 ms | 22 ms | 42 ms | 4 |
| GraphQL::Dataloader | 16 ms | 21 ms | 42 ms | 4 |
| pipeloader | 41 ms | 45 ms | 73 ms | 3 |
Reading it honestly:
- vs the N+1 you actually have: this is the headline. pipeloader turns 290 round trips into 3 with zero resolver code, so at a 5 ms hop it's ~27× faster than naive (1972 ms vs 73 ms). Most "there's an N+1 in here somewhere" code is the naive row.
- vs batching (includes / GraphQL::Dataloader): at low to moderate RTT, batching still wins, because its 4 IN queries do less work than pipeloader's ~400 prepared point queries. pipeloader prepares and caches statements per connection (so parse cost is amortized to about one parse per query shape), but it still runs 400 bind+executes and builds 400 results. Pipelining cuts round trips; batching cuts server work. pipeloader does fewer round trips (3 vs 4: it collapses the to-one author and to-many comments into one burst, where Dataloader runs them as two sequential sources), so it closes the gap as RTT rises and passes the batchers around ~25 ms RTT (cross-region). Same point-vs-batch tradeoff the Go experiments in this repo show.
- What pipeloader actually buys you: zero code, for any query shape. GraphQL::Dataloader needs a source plus a .load call per association; includes must be hand-written per query and kept in sync with the selection. pipeloader is the single use Pipeloader line plus ordinary resolvers.
Run it: ruby example/bench.rb (needs the seeded graphql_experiment DB).
Scaling with tree shape
That benchmark is a narrow tree (3 deep, 2 relations at its widest), which is close to the worst case for pipeloader. Its round-trip advantage over batching grows with tree width, because:
- pipeloader round trips = tree depth — one burst per level, any width.
- batching round trips = Σ (distinct target tables per level): each is its
  own IN query (a Dataloader source, or an includes preload).
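
As a rough check of the two formulas above, the sketch below counts round trips for the narrow benchmark tree. The Node struct, the helper names, and the table names (in particular that commenter resolves against a users table) are assumptions made for illustration; none of them are part of pipeloader.

```ruby
# Hypothetical model of a selection tree: each node names the table its
# relation targets and lists the relations nested under it.
Node = Struct.new(:table, :children)

# Group the tree into levels, keeping one entry per distinct table per level.
def tables_by_level(root)
  levels = []
  frontier = [root]
  until frontier.empty?
    levels << frontier.map(&:table).uniq
    frontier = frontier.flat_map(&:children)
  end
  levels
end

# Pipelining: one burst per level, whatever its width.
def pipelined_round_trips(root)
  tables_by_level(root).size
end

# Batching: one IN query per distinct target table per level.
def batched_round_trips(root)
  tables_by_level(root).sum(&:size)
end

# posts -> (author + comments) -> commenter; the table names are assumed.
NARROW = Node.new("posts", [
  Node.new("authors", []),
  Node.new("comments", [Node.new("users", [])])
])

puts pipelined_round_trips(NARROW)
puts batched_round_trips(NARROW)
```

For this tree the counts come out as 3 and 4, which agrees with the round-trips column of the narrow benchmark.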
A wide query — issues fanning out to assignee, creator, project, parent, and
comments, those nesting to team, lead, and authors (example/bench_wide.rb):
| approach | RTT 0 | RTT 1 ms | RTT 5 ms | round-trips |
|---|---|---|---|---|
| naive (N+1) | 63 ms | 278 ms | 1115 ms | 164 |
| AR includes | 13 ms | 29 ms | 91 ms | 11 |
| GraphQL::Dataloader | 9 ms | 20 ms | 57 ms | 7 |
| pipeloader | 28 ms | 34 ms | 51 ms | 3 |
pipeloader's round trips stay at 3 (the depth) while batching climbs to 7–11, so at a 5 ms hop pipeloader is the fastest of all: the point-vs-batch crossover dropped from ~25 ms (narrow) to under 5 ms (wide). The wider and deeper the tree, the lower the RTT at which pipelining wins, because pipelining is the only one of the three whose round trips depend on depth alone and not on how many relations each level touches.
How it works
- use GraphQL::Dataloader: runs resolution in fibers. This is what lets a synchronous-looking post.author yield instead of blocking, so sibling queries can gather before anything hits the wire.
- A monkey-patch on select_all: while a response is being built, AR's SELECT path hands the query to a Dataloader source instead of executing it. The active dataloader is stashed on the connection for the duration of the multiplex (and cleared at the end), so the patch finds it on self (the connection).
- The source pipelines: when the fibers all park, it prepares each distinct SQL once (cached per connection), then sends every gathered query as one libpq burst (enter_pipeline_mode … pipeline_sync), reads the results, and returns an ActiveRecord::Result per query so AR builds models normally.
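
The gather-then-burst behaviour described above can be pictured with a toy model in plain Ruby fibers. This is only a sketch under assumptions: BurstLoader, its methods, and the block standing in for the libpq pipeline are all hypothetical, and the real implementation goes through GraphQL::Dataloader rather than a hand-rolled loop.

```ruby
# Toy model of the gather-then-burst loop. BurstLoader and its methods are
# hypothetical; a plain block stands in for the libpq pipeline.
class BurstLoader
  DONE = Object.new.freeze # sentinel returned when a resolver fiber finishes

  attr_reader :round_trips

  # backend: receives an array of SQL strings, returns one result per string.
  def initialize(&backend)
    @backend = backend
    @round_trips = 0
  end

  # Called inside a resolver: park this fiber until the burst returns.
  def select(sql)
    Fiber.yield(sql)
  end

  # Run every resolver in its own fiber, flushing one burst per wave.
  def run(resolvers)
    parked = []
    resolvers.each do |resolver|
      fiber = Fiber.new { resolver.call; DONE }
      advance(fiber, parked)
    end

    until parked.empty?
      wave, parked = parked, []
      results = @backend.call(wave.map { |_fiber, sql| sql })
      @round_trips += 1
      wave.each_with_index { |(fiber, _sql), i| advance(fiber, parked, results[i]) }
    end
  end

  private

  # Resume a fiber; if it parks on another query, queue it for the next wave.
  def advance(fiber, parked, *value)
    yielded = fiber.resume(*value)
    parked << [fiber, yielded] unless yielded.equal?(DONE)
  end
end

bursts = []
loader = BurstLoader.new do |sqls|
  bursts << sqls
  sqls.map { |sql| "row(#{sql})" }
end

# Two synchronous-looking resolvers, each issuing two dependent queries.
resolvers = [1, 2].map do |id|
  lambda do
    loader.select("SELECT * FROM posts WHERE id = #{id}")
    loader.select("SELECT * FROM authors WHERE id = #{id}")
  end
end

loader.run(resolvers)
puts loader.round_trips
```

Four queries are issued but the backend block is called only twice, once per wave of parked fibers, which is the same collapse the pipeline performs per tree level.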
Field-exact projection (opt-in)
By default AR picks the columns (SELECT *), which keeps adoption zero-effort. If
you want the pipeline to fetch only the columns the query selected, opt in and
pipeloader narrows each SELECT using graphql-ruby's lookahead:
Pipeloader.field_exact = true # globally, before your types load, or…
class Types::Post < GraphQL::Schema::Object
  pipeloader_field_exact! # …per type
  field :title, String, null: false
  field :author, Types::Author, null: false
end
For { posts { title author { name } } } the posts SELECT becomes
SELECT id, title, author_id FROM … (PK + selected column + the FK needed to
resolve author), and the authors SELECT becomes SELECT id, name FROM … — for
both the root relation and the belongs_to.
It never breaks a field. A classifier narrows only when it can prove every
selected field reads a known column or association. The instant a selection is
opaque — a computed field, a custom resolver, anything it can't map to a column —
it bails to a whole-row fetch for that record, so a projected field can never
raise MissingAttributeError.
The selects: escape hatch. A computed field can declare the columns it
reads, so projection keeps them instead of bailing to a whole row:
field :excerpt, String, null: false, selects: %i[body]
def excerpt = object.body[0, 200]
Selecting excerpt now adds body to the projection. With no opt-in (the
default), selects: is accepted and ignored, and every SELECT is whole-row.
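
The narrow-or-bail rule and the selects: escape hatch can be sketched as a small pure function. Everything here is hypothetical (the function name, its keyword arguments, and the column and association maps); it illustrates the decision described above, not pipeloader's actual classifier.

```ruby
# Hypothetical classifier: returns the columns to SELECT, or nil to signal
# "fetch the whole row". None of these names come from pipeloader itself.
def projected_columns(selected, columns:, associations: {}, declared: {}, primary_key: "id")
  needed = [primary_key]
  selected.each do |field|
    if columns.include?(field)
      needed << field                 # field reads a same-named column
    elsif associations.key?(field)
      needed << associations[field]   # column needed to resolve the association
    elsif declared.key?(field)
      needed.concat(declared[field])  # columns a computed field declared via selects:
    else
      return nil                      # opaque field: bail to a whole-row fetch
    end
  end
  needed.uniq
end

# Assumed shape of the posts table and its associations.
POST_COLUMNS = %w[id title body author_id].freeze
POST_ASSOCIATIONS = { "author" => "author_id", "comments" => "id" }.freeze

p projected_columns(%w[title author], columns: POST_COLUMNS, associations: POST_ASSOCIATIONS)
p projected_columns(%w[title excerpt], columns: POST_COLUMNS, declared: { "excerpt" => %w[body] })
p projected_columns(%w[title word_count], columns: POST_COLUMNS)
```

The first call yields id, title, and author_id, the second adds body because excerpt declared it, and the third returns nil because word_count cannot be mapped to a column.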
Status & caveats — this is a proof of concept
- Whole rows by default; field-exact is opt-in. Off, AR picks the columns (maximum transparency); on, the pipeline projects to the selected columns and bails to whole rows on anything opaque.
- PostgreSQL pipelines; SQLite narrows only; anything else raises. Pipelining is libpq-specific, so on PostgreSQL queries are pipelined, on SQLite they run un-pipelined (the opt-in column projection still applies, useful for tests/dev), and any other adapter raises a RuntimeError at query time rather than silently misbehaving. Running SQLite un-pipelined is safe because SQLite is embedded: its queries are in-process calls with no network round trip, so there's nothing for a dataloader or a pipeline to collapse. N+1 there is just a series of cheap local calls, not the latency amplification that makes N+1 catastrophic against a networked database. So pipelining buys nothing on SQLite, and skipping it costs nothing.
- Reads only. It intercepts select_all (SELECTs); writes and non-SELECTs fall straight through, and queries inside an open transaction are skipped.
- Assumes thread-isolated connections (the ActiveRecord default): a request's resolver fibers all share one connection. Under :fiber isolation you'd stash per leased connection.
- Stats are process-global single-threaded demo instrumentation.
- Prepares and caches statements per connection, but doesn't re-prepare after a reconnect / DEALLOCATE the way AR does. Also not hardened for multiple databases, count/exists? (which route through other methods), or error recovery mid-pipeline.
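
The "reads only" caveat amounts to a guard in front of the intercepted method. The sketch below shows one way such a guard could look, using Module#prepend on a stand-in class. FakeConnection, InterceptReads, and active_loader are hypothetical names, the real select_all takes more arguments, and the source only says the patch is a monkey-patch, not that it uses prepend.

```ruby
# Toy stand-in for an adapter connection; the real select_all takes more arguments.
class FakeConnection
  attr_accessor :active_loader, :open_transactions

  def initialize
    @open_transactions = 0
  end

  def transaction_open?
    @open_transactions.positive?
  end

  def select_all(sql)
    "direct: #{sql}"
  end

  def execute(sql)
    "direct: #{sql}"
  end
end

# Hypothetical patch: only select_all is overridden, so writes never see it.
module InterceptReads
  def select_all(sql)
    loader = active_loader
    # No response being built, or inside a transaction: run the query normally.
    return super if loader.nil? || transaction_open?

    loader.call(sql)
  end
end

FakeConnection.prepend(InterceptReads)

conn = FakeConnection.new
outside = conn.select_all("SELECT 1")          # no loader stashed: falls through

conn.active_loader = ->(sql) { "queued: #{sql}" }
inside = conn.select_all("SELECT 1")           # loader stashed: intercepted
write = conn.execute("UPDATE posts SET title = 'x'") # writes are never intercepted

conn.open_transactions = 1
in_txn = conn.select_all("SELECT 1")           # open transaction: skipped

puts outside, inside, write, in_txn
```

Keeping the guard inside the patched method means non-GraphQL ActiveRecord code on the same connection takes the ordinary path whenever no loader is stashed.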
Running the example
# Needs a Postgres DB with posts/authors/comments/users tables. In this repo:
# go run ./cmd/gqlbench -reset # seeds the graphql_experiment DB
ruby example/run.rb # shows the round-trip collapse
ruby example/bench.rb # the latency benchmark (narrow tree)
ruby example/bench_wide.rb # the wide-tree benchmark
Requires activerecord, graphql, and pg (libpq ≥ 14 for pipelining).
Tests
rake test. Three suites, all parity-first — the pipelined result must be
byte-identical to plain ActiveRecord:
- test/pipeloader_test.rb: every query runs through both a plain schema and a use Pipeloader schema, asserting identical results across each relationship kind, nullable foreign keys, empty has-many, deduplication, ordering, type casting, aliases, variables, and multiplex. It also checks round-trip counts (= tree depth) and that the patch leaves writes, transactions, and non-GraphQL ActiveRecord untouched.
- test/field_exact_test.rb: the opt-in projection. Projected results match the whole-row schema, the emitted SQL is actually narrowed (and keeps the FK), the selects: escape hatch includes its columns, and opaque fields bail to a whole-row SELECT * instead of raising.
- test/adapter_test.rb: adapter handling. PostgreSQL pipelines, an unsupported adapter raises, and a real in-memory SQLite run (in a subprocess) proves projection works there with pipelining disabled.
Needs a reachable Postgres (the suites create pl_* fixture tables in
graphql_experiment).