pgoutput-parser
A high-performance, Ractor-safe PostgreSQL pgoutput logical replication protocol parser written in pure Ruby.
pgoutput-parser parses PostgreSQL logical replication CopyData payloads into immutable protocol message objects. It focuses on the pgoutput wire format: transaction boundaries, relation metadata, DML message structure, tuple payload markers, and raw tuple bytes.
It intentionally does not convert PostgreSQL values into application-specific Ruby objects. That belongs to a higher-level decoder layer, such as a future pgoutput-decoder gem.
Requirements
- Ruby 4+
- PostgreSQL 10+
Features
- Pure Ruby implementation
- Ruby 4+
- Ractor-safe parsed messages
- Immutable protocol message objects
- PostgreSQL logical replication protocol support
- Relation metadata tracking
- Binary-safe tuple parsing
- RBS type signatures included
- YARD documentation included
- No runtime dependencies
The generated documentation also includes a project glossary: docs/glossary.md.
Why Another pgoutput Library?
This gem focuses exclusively on protocol parsing.
It intentionally separates:
- Protocol parsing (
pgoutput-parser) - Type decoding (
pgoutput-decoder) - Replication transport/client management
This keeps the parser small, predictable, dependency-free, and faithful to PostgreSQL's wire format.
Supported MVP Scope
Supports the core non-streaming pgoutput logical replication messages:
- Begin (
B) - Message (
M) - Origin (
O) - Relation (
R) - Type (
Y) - Insert (
I) - Update (
U) - Delete (
D) - Truncate (
T) - Commit (
C)
The currently supported message formats are stable across PostgreSQL 10 through PostgreSQL 18.
TupleData supports all base column markers:
| Tuple Value Tag | Meaning |
|---|---|
n |
NULL |
u |
Unchanged TOAST value |
t |
Text-formatted raw value |
b |
Binary-formatted raw value |
Planned Support
Future releases may add support for:
- Stream Start (
S) - Stream Stop (
E) - Stream Commit (
c) - Stream Abort (
A) - Two-Phase Commit messages
What This Gem Does
PostgreSQL CopyData payload
│
▼
pgoutput-parser
│
▼
Immutable protocol messages
The parser understands:
- Message tags and binary field sizes
- Transaction begin metadata
- Transaction commit metadata
- Relation metadata
- Column flags
- Column names
- PostgreSQL type OIDs
- PostgreSQL type modifiers
- Insert tuples
- Update old-key tuples
- Update old full tuples
- Update new tuples
- Delete old-key tuples
- Delete old full tuples
- Tuple value markers (
n,u,t,b)
What This Gem Does Not Do
The parser does not perform application-level type decoding.
It does not convert:
- UUID
- JSONB
- Timestamp
- Numeric
- Array
- Range
- PostGIS
- Custom PostgreSQL types
Example:
value.raw
# => "2026-05-31 12:34:56+00"
The raw value is preserved exactly as received.
A higher-level decoder may later interpret it.
Non-goals
This project intentionally does not:
- Manage replication slots
- Open replication connections
- Maintain WAL positions
- Reconnect to PostgreSQL
- Decode PostgreSQL types
- Integrate with ActiveRecord
- Publish events
- Build CDC pipelines
Its sole responsibility is parsing pgoutput protocol messages.
Installation
Add this line to your Gemfile:
gem "pgoutput-parser"
Then run:
bundle install
Require the library:
require "pgoutput"
Quick Start
require "pgoutput"
stream = Pgoutput::RelationTracker.new
stream.process(relation_payload)
insert = stream.process(insert_payload)
insert.relation_id
# => 42
insert.tuple.first.raw
# => "7"
insert.tuple.first.oid
# => 23
Binary Tuple Values
When PostgreSQL publishes tuple values using binary format (b), the parser preserves the raw bytes exactly as received.
value.raw
# => "\x00\x00\x00\x07".b
The parser does not interpret binary values.
Update Messages
PostgreSQL Update messages may contain:
- No old tuple
- An old key tuple (
K) - An old full tuple (
O)
They always contain a new tuple (N).
update = stream.process(update_payload)
update.old_key_tuple
# => [Pgoutput::Messages::TupleValue, ...] or nil
update.old_tuple
# => [Pgoutput::Messages::TupleValue, ...] or nil
update.new_tuple
# => [Pgoutput::Messages::TupleValue, ...]
Update Tuple Example
update = stream.process(update_payload)
update.old_key_tuple
update.old_tuple
update.new_tuple
Delete Messages
PostgreSQL Delete messages contain either:
- An old key tuple (
K) - An old full tuple (
O)
delete = stream.process(delete_payload)
delete.old_key_tuple
# => [Pgoutput::Messages::TupleValue, ...] or nil
delete.old_tuple
# => [Pgoutput::Messages::TupleValue, ...] or nil
Relation Metadata Tracking
RelationTracker keeps a local relation cache so tuple values can be associated with PostgreSQL column OIDs defined by preceding Relation (R) messages.
The tracker accepts an optional relation_cache: argument. The default is a
plain Hash, but callers can inject Ratomic::Map for a Ractor-safe cache in
experimental or parallel setups.
For a deeper guide, including stream-order behavior, tuple arity validation, and
Ratomic::Map usage, see docs/relation_tracker.md.
No type conversion is performed.
Only protocol metadata is attached.
stream.process(relation_payload)
= stream.process(update_payload)
.new_tuple.map(&:oid)
# => [23, 25, 16]
The relation tracker itself is stateful and maintains relation metadata encountered in the replication stream.
If a DML tuple's value count does not match the cached relation column count, RelationTracker raises
Pgoutput::TupleArityError. This keeps malformed payloads or mismatched stream state from being silently
annotated with incomplete column metadata.
Ractor Safety
= stream.process(update_payload)
Ractor.shareable?()
# => true
Passing parsed messages to a Ractor:
= stream.process(update_payload)
result = Ractor.new() do |update|
update.new_tuple.map(&:raw)
end.take
Architecture
PostgreSQL
│
▼
CopyData payload
│
▼
Pgoutput::BinaryParser
│
▼
Parsed protocol message
│
▼
Pgoutput::RelationTracker
│
▼
Protocol message with relation metadata
│
▼
Ractor-safe protocol message
Public API
Pgoutput::BinaryParser
Parses a single pgoutput payload without stream state.
= Pgoutput::BinaryParser.new(payload).parse
Pgoutput::RelationTracker
Parses messages in stream order and remembers relation metadata.
stream = Pgoutput::RelationTracker.new
stream.process(relation_payload)
= stream.process(insert_payload)
Optional Usage
RelationTracker is optional.
If relation metadata tracking is not required, payloads can be parsed directly:
=
Pgoutput::BinaryParser
.new(payload)
.parse
RelationTracker Lifecycle
A RelationTracker should be created per logical replication stream.
stream = Pgoutput::RelationTracker.new
The tracker maintains relation metadata discovered during the stream and therefore should not be reused across unrelated replication sessions.
Message Classes
Pgoutput::Messages::Begin
Pgoutput::Messages::Message
Pgoutput::Messages::Origin
Pgoutput::Messages::Relation
Pgoutput::Messages::Type
Pgoutput::Messages::Column
Pgoutput::Messages::TupleValue
Pgoutput::Messages::Insert
Pgoutput::Messages::Update
Pgoutput::Messages::Delete
Pgoutput::Messages::Truncate
Pgoutput::Messages::Commit
Type Signatures
RBS signatures are included:
sig/pgoutput.rbs
Run Steep:
bundle exec steep check
Testing
Run all tests:
bundle exec rake test
Run with coverage:
COVERAGE=true bundle exec rake test
Benchmarking
Run the parser throughput benchmark:
ruby benchmark/parser_throughput.rb
The benchmark reports single-process parser throughput, relation-tracker throughput, and Ractor-parallel throughput. It is intended to show both the single-thread baseline and the Ruby 4 Ractor path this parser is designed to support. Relation-tracker scenarios can also compare the default Hash relation cache with an optional Ratomic::Map cache.
Tune the run with environment variables:
| Variable | Default | Description |
|---|---|---|
PGOUTPUT_BENCH_ITERATIONS |
100000 |
Iterations per selected scenario. |
PGOUTPUT_BENCH_WARMUP |
1000 |
Warmup iterations before timing. |
PGOUTPUT_BENCH_RACTORS |
2 or CPU count, whichever is lower |
Number of Ractor workers for Ractor scenarios. |
PGOUTPUT_BENCH_SCENARIOS |
all |
Comma-separated scenarios: binary, tracker_dml, tracker_mixed, ractor_binary, ractor_tracker, or all. |
PGOUTPUT_BENCH_RELATION_CACHE |
hash |
Comma-separated relation-cache backends for tracker scenarios: hash, ratomic, or all. |
Examples:
PGOUTPUT_BENCH_ITERATIONS=10000 ruby benchmark/parser_throughput.rb
PGOUTPUT_BENCH_SCENARIOS=binary,tracker_mixed ruby benchmark/parser_throughput.rb
PGOUTPUT_BENCH_RACTORS=4 PGOUTPUT_BENCH_SCENARIOS=ractor_binary,ractor_tracker ruby benchmark/parser_throughput.rb
PGOUTPUT_BENCH_RELATION_CACHE=all PGOUTPUT_BENCH_SCENARIOS=tracker_mixed,ractor_tracker ruby benchmark/parser_throughput.rb
Sample Ruby 4 output:
pgoutput-parser throughput
iterations=1000 warmup=10 ractors=2 scenarios=tracker_mixed,ractor_tracker relation_cache=hash,ratomic ruby=4.0.5
RelationTracker hash 7000 messages in 0.163s 42891 msg/s
RelationTracker ratomic 7000 messages in 0.131s 53579 msg/s
Ractor RelationTracker hash 14000 messages in 0.197s 71097 msg/s
Ractor RelationTracker ratomic 14000 messages in 0.146s 96190 msg/s
Interpret the Ractor rows as aggregate throughput across workers. They are not a replacement for the single-process rows; they demonstrate the parser's shareable-message design under parallel execution.
Development
Generate YARD documentation:
bundle exec yard doc
Ecosystem Direction
This gem is the protocol layer.
pgoutput-parser
│
▼
Protocol messages
│
▼
pgoutput-decoder
│
▼
Application objects
pgoutput-parser should remain small, dependency-free, binary-safe, and faithful to PostgreSQL's wire format.
License
MIT.