pipeloader

Transparent libpq pipelining for graphql-ruby on ActiveRecord. During GraphQL response building, every ActiveRecord SELECT is routed through a libpq pipeline, so a nested query resolves in roughly one round trip per tree level — with plain resolvers and plain models. No Futures, no dataloader.load, no field changes.

Adopting it

One line:

class AppSchema < GraphQL::Schema
  use Pipeloader
end

That's the whole adoption surface. Your types and resolvers stay exactly as they are — ordinary ActiveRecord:

class Types::Post < GraphQL::Schema::Object
  field :title, String, null: false
  field :author, Types::Author, null: false       # resolves via post.author
  field :comments, [Types::Comment], null: false   # resolves via post.comments
end

post.author, post.comments, a has_many, a .where(...) in a hand-written resolver — any AR SELECT issued while building the response is intercepted and pipelined. Because it hooks AR's query path (not the GraphQL field), nothing leaks back to synchronous N+1, even from custom resolver code.
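To make "hooks the query path, not the field" concrete, here is a minimal plain-Ruby sketch. FakeConnection, Interception, and collector are hypothetical stand-ins rather than pipeloader's or ActiveRecord's real classes, and Module#prepend is only assumed as the patching technique:

```ruby
# Stand-in for an ActiveRecord connection; the real select_all takes more
# arguments and returns an ActiveRecord::Result.
class FakeConnection
  def select_all(sql)
    "executed: #{sql}"
  end
end

# Hypothetical patch: defer SELECTs only while a collector is stashed on
# the connection, and fall through to the original method otherwise.
module Interception
  attr_accessor :collector

  def select_all(sql)
    return super unless collector

    collector << sql
    "deferred: #{sql}"
  end
end

FakeConnection.prepend(Interception)

conn = FakeConnection.new
puts conn.select_all("SELECT 1")   # no collector: runs as usual
conn.collector = []
puts conn.select_all("SELECT 2")   # collector present: gathered instead
conn.collector = nil               # cleared once the response is built
```

Because the check lives on the connection's query method, any caller that issues a SELECT while the collector is set is gathered, which is why custom resolver code is covered too.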

What it does

example/run.rb — plain resolvers, against a seeded database:

{ posts(limit: 50) { title author { name } comments { body commenter { name } } } }

resolved 50 posts with PLAIN AR resolvers
pipeline round-trips: 3
queries pipelined:    403
naive N+1 would be:   ~594 round trips

Three round trips: posts → (authors + comments) → commenters. The to-one author and the to-many comments are different shapes at the same level, yet collapse into a single round trip.

Benchmark

The same 3-level query (posts → author + comments → commenter, 25 posts), resolved four ways, with network latency injected by a local TCP proxy in front of Postgres (example/latency_proxy.rb delays the request direction, so each synchronous query pays the RTT once and each pipelined burst pays it once). Times are the minimum of 3 iterations; your numbers will vary.

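| approach                   | RTT 0 ms | RTT 1 ms | RTT 5 ms | round trips |
|----------------------------|----------|----------|----------|-------------|
| naive (N+1)                | 94 ms    | 505 ms   | 1972 ms  | 290         |
| AR includes (hand-written) | 17 ms    | 22 ms    | 42 ms    | 4           |
| GraphQL::Dataloader        | 16 ms    | 21 ms    | 42 ms    | 4           |
| pipeloader                 | 41 ms    | 45 ms    | 73 ms    | 3           |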

Reading it honestly:

  • vs the N+1 you actually have: the headline. pipeloader turns 290 round trips into 3 with zero resolver code, so at a 5 ms hop it's ~27× faster than naive (1972 ms vs 73 ms). Most "there's an N+1 in here somewhere" code is the naive row.
  • vs batching (includes / GraphQL::Dataloader) — at low/moderate RTT, batching still wins: its 4 IN queries do less work than pipeloader's ~400 prepared point queries. pipeloader prepares + caches statements per connection (so parse cost is amortized to ~one parse per query shape), but it still runs 400 bind+executes and builds 400 results. Pipelining cuts round trips; batching cuts server work. pipeloader does fewer round trips (3 vs 4 — it collapses the to-one author and to-many comments into one burst, where Dataloader runs them as two sequential sources), so it closes the gap as RTT rises and passes the batchers around ~25 ms RTT (cross-region). Same point-vs-batch tradeoff the Go experiments in this repo show.
  • What pipeloader actually buys you: zero code, for any query shape. GraphQL::Dataloader needs a source plus a .load call per association; includes must be hand-written per query and kept in sync with the selection. pipeloader is use Pipeloader and ordinary resolvers.

Run it: ruby example/bench.rb (needs the seeded graphql_experiment DB).

Scaling with tree shape

That benchmark is a narrow tree (3 deep, 2 relations at its widest), which is close to the worst case for pipeloader. Its position against batching improves as the tree gets wider, because:

  • pipeloader round trips = tree depth — one burst per level, any width.
  • batching round trips = Σ (distinct target tables per level) — each is its own IN query (a Dataloader source, or an includes preload).
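The two formulas above can be checked against the narrow benchmark tree with a short sketch. The level layout is read off the query shape; the table names (in particular commenter mapping to users) and both helper names are assumptions made for illustration:

```ruby
# Each level lists the relations fetched at that depth as
# [relation, target_table] pairs. This models the narrow benchmark tree.
NARROW_TREE = [
  [[:posts, :posts]],
  [[:author, :authors], [:comments, :comments]],
  [[:commenter, :users]]
].freeze

# Pipelining: one burst per level, regardless of width.
def pipelined_round_trips(levels)
  levels.size
end

# Batching: one IN query per distinct target table at each level.
def batched_round_trips(levels)
  levels.sum { |relations| relations.map(&:last).uniq.size }
end

puts pipelined_round_trips(NARROW_TREE)  # 3
puts batched_round_trips(NARROW_TREE)    # 4
```

Adding relations to a level raises only the second number, which is the effect the wide benchmark measures.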

A wide query — issues fanning out to assignee, creator, project, parent, and comments, those nesting to team, lead, and authors (example/bench_wide.rb):

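| approach            | RTT 0 ms | RTT 1 ms | RTT 5 ms | round trips |
|---------------------|----------|----------|----------|-------------|
| naive (N+1)         | 63 ms    | 278 ms   | 1115 ms  | 164         |
| AR includes         | 13 ms    | 29 ms    | 91 ms    | 11          |
| GraphQL::Dataloader | 9 ms     | 20 ms    | 57 ms    | 7           |
| pipeloader          | 28 ms    | 34 ms    | 51 ms    | 3           |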

pipeloader's round trips stay at 3 (the depth) while batching climbs to 7–11, so at a 5 ms hop pipeloader is the fastest of all: the point-vs-batch crossover dropped from ~25 ms (narrow) to under 5 ms (wide). The wider and deeper the tree, the lower the RTT at which pipelining wins, because pipeloader's round trips grow only with depth, while both batching approaches also pay for every relation a level fans out to.
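The two crossover figures can be roughly reproduced with a linear model, total ≈ base cost + round trips × RTT, taking the base cost from the RTT 0 column of each table. This is a back-of-envelope sketch, not how the benchmark computes anything; Approach and crossover_rtt_ms are names made up for it:

```ruby
# Back-of-envelope model: total_ms ~ base_ms + round_trips * rtt_ms,
# with base_ms taken from the "RTT 0" column of the benchmark tables.
Approach = Struct.new(:name, :base_ms, :round_trips) do
  def total_ms(rtt_ms)
    base_ms + round_trips * rtt_ms
  end
end

# RTT at which `fewer` (higher base cost, fewer round trips) ties `more`.
def crossover_rtt_ms(fewer, more)
  (fewer.base_ms - more.base_ms).to_f / (more.round_trips - fewer.round_trips)
end

narrow = crossover_rtt_ms(Approach.new("pipeloader", 41, 3),
                          Approach.new("GraphQL::Dataloader", 16, 4))
wide   = crossover_rtt_ms(Approach.new("pipeloader", 28, 3),
                          Approach.new("GraphQL::Dataloader", 9, 7))

puts narrow  # 25.0
puts wide    # 4.75
```

The measured totals at non-zero RTT run higher than this model predicts (for the narrow tree at 5 ms it gives 56 ms for pipeloader against a measured 73 ms), so treat the crossover values as estimates rather than measurements.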

How it works

  1. use GraphQL::Dataloader — runs resolution in fibers. This is what lets a synchronous-looking post.author yield instead of blocking, so sibling queries can gather before anything hits the wire.
  2. A monkey-patch on select_all: while a response is being built, AR's SELECT path hands the query to a Dataloader source instead of executing it. The active dataloader is stashed on the connection for the duration of the multiplex (and cleared at the end), so the patch, which runs with the connection as self, finds it there.
  3. The source pipelines: when the fibers all park, it prepares each distinct SQL once (cached per connection), then sends every gathered query as one libpq burst (enter_pipeline_mode → pipeline_sync), reads the results, and returns an ActiveRecord::Result per query so AR builds models normally.
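The gather-then-flush idea behind these three steps can be sketched with bare Ruby fibers. This toy does not use graphql-ruby, ActiveRecord, or libpq: FakeWire stands in for the connection, Gatherer for the Dataloader source, and both names are hypothetical.

```ruby
require "fiber" # Fiber.current on Rubies older than 3.1

# Stand-in for the database connection: one call to #burst models one
# pipelined round trip carrying many queries.
class FakeWire
  attr_reader :round_trips

  def initialize(rows)
    @rows = rows
    @round_trips = 0
  end

  def burst(keys)
    @round_trips += 1
    keys.map { |key| @rows.fetch(key) }
  end
end

# Hypothetical gatherer: resolvers park on #query, and #run flushes
# everything queued so far each time all fibers are parked.
class Gatherer
  def initialize(wire)
    @wire = wire
    @pending = []
  end

  # Looks synchronous to the caller, but yields until the burst returns.
  def query(key)
    @pending << [Fiber.current, key]
    Fiber.yield
  end

  def run(jobs)
    jobs.each { |job| Fiber.new(&job).resume }
    until @pending.empty?
      batch = @pending
      @pending = []
      results = @wire.burst(batch.map(&:last))
      batch.each_with_index { |(fiber, _key), i| fiber.resume(results[i]) }
    end
  end
end

rows = {
  "post:1" => { author: "author:7" }, "post:2" => { author: "author:8" },
  "author:7" => { name: "Ada" }, "author:8" => { name: "Grace" }
}
wire = FakeWire.new(rows)
gatherer = Gatherer.new(wire)
names = []
jobs = %w[post:1 post:2].map do |post_key|
  proc do
    post = gatherer.query(post_key)          # level 1
    author = gatherer.query(post[:author])   # level 2
    names << author[:name]
  end
end
gatherer.run(jobs)
puts "#{names.inspect} in #{wire.round_trips} round trips"
```

Four queries resolve in two bursts, one per level, even though each job reads like straight-line blocking code.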

Field-exact projection (opt-in)

By default AR picks the columns (SELECT *), which keeps adoption zero-effort. If you want the pipeline to fetch only the columns the query selected, opt in and pipeloader narrows each SELECT using graphql-ruby's lookahead:

Pipeloader.field_exact = true            # globally, before your types load, or…

class Types::Post < GraphQL::Schema::Object
  pipeloader_field_exact!                # …per type
  field :title, String, null: false
  field :author, Types::Author, null: false
end

For { posts { title author { name } } } the posts SELECT becomes SELECT id, title, author_id FROM … (PK + selected column + the FK needed to resolve author), and the authors SELECT becomes SELECT id, name FROM … — for both the root relation and the belongs_to.

It never breaks a field. A classifier narrows only when it can prove every selected field reads a known column or association. The instant a selection is opaque — a computed field, a custom resolver, anything it can't map to a column — it bails to a whole-row fetch for that record, so a projected field can never raise MissingAttributeError.

The selects: escape hatch. A computed field can declare the columns it reads, so projection keeps them instead of bailing to a whole row:

field :excerpt, String, null: false, selects: %i[body]
def excerpt = object.body[0, 200]

Selecting excerpt now adds body to the projection. With no opt-in (the default), selects: is accepted and ignored, and every SELECT is whole-row.
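The narrow-or-bail rule can be illustrated with a small classifier. This is a sketch under assumptions, not pipeloader's implementation: POST_FIELDS, the score field, and the projection helper are hypothetical, and a nil entry stands for a field the classifier cannot map to columns.

```ruby
# Hypothetical field metadata: the columns each GraphQL field reads.
# nil marks an opaque field (computed, custom resolver, no selects:).
POST_FIELDS = {
  title:   %i[title],
  author:  %i[author_id],  # belongs_to: keep the foreign key
  excerpt: %i[body],       # declared through selects:
  score:   nil             # opaque
}.freeze

# Returns a column list for the SELECT, or "*" as soon as any selected
# field is opaque or unknown.
def projection(selected, field_columns, primary_key: :id)
  columns = [primary_key]
  selected.each do |field|
    reads = field_columns[field]
    return "*" if reads.nil?

    columns.concat(reads)
  end
  columns.uniq.join(", ")
end

puts projection(%i[title author], POST_FIELDS)   # id, title, author_id
puts projection(%i[title excerpt], POST_FIELDS)  # id, title, body
puts projection(%i[title score], POST_FIELDS)    # *
```

Bailing on the first unmappable field is what keeps a projected row from ever lacking a column a resolver goes on to read.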

Status & caveats — this is a proof of concept

  • Whole rows by default; field-exact is opt-in. Off, AR picks the columns (maximum transparency); on, the pipeline projects to the selected columns and bails to whole rows on anything opaque.
  • PostgreSQL pipelines; SQLite narrows only; anything else raises. Pipelining is libpq-specific, so on PostgreSQL queries are pipelined, on SQLite they run un-pipelined (the opt-in column projection still applies, useful for tests/dev), and any other adapter raises a RuntimeError at query time rather than silently misbehaving. Running SQLite un-pipelined is safe because SQLite is embedded — its queries are in-process calls with no network round trip, so there's nothing for a dataloader or a pipeline to collapse. N+1 there is just a series of cheap local calls, not the latency amplification that makes N+1 catastrophic against a networked database. So pipelining buys nothing on SQLite, and skipping it costs nothing.
  • Reads only. It intercepts select_all (SELECTs); writes and non-SELECTs fall straight through, and queries inside an open transaction are skipped.
  • Assumes thread-isolated connections (the ActiveRecord default): a request's resolver fibers all share one connection. Under :fiber isolation you'd stash per leased connection.
  • Stats are process-global, single-threaded demo instrumentation.
  • Prepares and caches statements per connection, but doesn't re-prepare after a reconnect / DEALLOCATE the way AR does. Also not hardened for multiple databases, count/exists? (which route through other methods), or error recovery mid-pipeline.
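The adapter rule in the caveats above amounts to a three-way dispatch. A minimal sketch, assuming the check is made on ActiveRecord's adapter name strings ("PostgreSQL", "SQLite"); execution_mode is a hypothetical helper, not part of pipeloader's API:

```ruby
# Hypothetical dispatch on the adapter name.
def execution_mode(adapter_name)
  case adapter_name
  when "PostgreSQL" then :pipelined  # libpq pipeline available
  when "SQLite"     then :direct     # in-process, nothing to collapse
  else raise "pipeloader: unsupported adapter #{adapter_name.inspect}"
  end
end

puts execution_mode("PostgreSQL")  # pipelined
puts execution_mode("SQLite")      # direct
begin
  execution_mode("Mysql2")
rescue RuntimeError => e
  puts e.message
end
```

Raising on anything unrecognized matches the stated preference for failing loudly over silently misbehaving.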

Running the example

# Needs a Postgres DB with posts/authors/comments/users tables. In this repo:
#   go run ./cmd/gqlbench -reset    # seeds the graphql_experiment DB
ruby example/run.rb        # shows the round-trip collapse
ruby example/bench.rb      # the latency benchmark (narrow tree)
ruby example/bench_wide.rb # the wide-tree benchmark

Requires activerecord, graphql, and pg (libpq ≥ 14 for pipelining).

Tests

rake test. Three suites, all parity-first — the pipelined result must be byte-identical to plain ActiveRecord:

  • test/pipeloader_test.rb — every query runs through both a plain schema and a use Pipeloader schema, asserting identical results across each relationship kind, nullable foreign keys, empty has-many, deduplication, ordering, type casting, aliases, variables, and multiplex. It also checks round-trip counts (= tree depth) and that the patch leaves writes, transactions, and non-GraphQL ActiveRecord untouched.
  • test/field_exact_test.rb — the opt-in projection: projected results match the whole-row schema, the emitted SQL is actually narrowed (and keeps the FK), the selects: escape hatch includes its columns, and opaque fields bail to a whole-row SELECT * instead of raising.
  • test/adapter_test.rb — adapter handling: PostgreSQL pipelines, an unsupported adapter raises, and a real in-memory SQLite run (in a subprocess) proves projection works there with pipelining disabled.

Needs a reachable Postgres (the suites create pl_* fixture tables in graphql_experiment).