# omq-lz4

Status: 0.1.0 — first landable release. See `RFC.md` for the wire-format spec and `CHANGELOG.md` for what's included.
LZ4-compressed TCP transport for OMQ, complementary to omq-zstd.
Pick `lz4+tcp://` instead of `tcp://` or `zstd+tcp://` when you want
cheap per-message compression with a small per-connection footprint.
## When to pick `lz4+tcp://` over `zstd+tcp://`
LZ4 has no entropy stage (no Huffman, no FSE) and keeps only ~16 KiB of encoder state per connection: it trades a worse compression ratio for ~4–8× faster encodes and a much smaller per-connection footprint (~16 KiB vs ~256 KiB — see the table below).
| | `zstd+tcp://` | `lz4+tcp://` |
|---|---|---|
| Encode, 1 KiB, no dict | ~3 µs | ~0.4 µs |
| Encode, 1 KiB, with dict | ~3.5 µs | ~0.5 µs |
| Memory per connection | ~256 KiB | ~16 KiB + dict |
| Ratio, 1 KiB JSON no dict | ~45% | ~65% |
| Ratio, 1 KiB JSON with dict | ~20% | ~35% |
| Auto-trained dictionaries | ✓ | — (user-supplied only) |
Pick omq-lz4 for CPU- or memory-scarce deployments (edge gateways,
IoT concentrators, high-fanout scenarios where per-connection state
matters more than ratio). Pick omq-zstd for bandwidth-bound
deployments where CPU is cheap.
## Install

```ruby
# Gemfile
gem "omq-lz4"
```

or

```sh
gem install omq-lz4
```
## Usage

```ruby
require "omq"
require "omq/lz4"

pull = OMQ::PULL.new
push = OMQ::PUSH.new

uri = pull.bind("lz4+tcp://127.0.0.1:0")
push.connect(uri.to_s)

push << ["hello, compressed world"]
pull.receive # => ["hello, compressed world"]
```
Both peers must use `lz4+tcp://`. A `tcp://` peer cannot talk to an
`lz4+tcp://` peer — they speak different transports.
## Dictionary compression
Small messages don't compress well on their own. A shared dictionary
gives 2–5× better ratios on payloads with a common prefix. Supply a
user-trained dictionary (LZ4 has no auto-training — use omq-zstd
for that):
```ruby
dict = File.binread("schema.dict")
push.connect("lz4+tcp://127.0.0.1:5555", dict: dict)
```
The sender ships the dictionary to the receiver in-band, prefixed
with the dictionary sentinel (`4C 5A 34 44`, "LZ4D" in ASCII), on
the first outgoing message. The receiver installs the dictionary
and decompresses subsequent messages against it. Dictionary size
is capped at 8 KiB — tighter than omq-zstd's 64 KiB cap — so
constrained peers can accept shipments without allocating tens of
KiB of scratch.
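The shipment can be sketched as follows. The helper names and the length-prefix layout here are illustrative assumptions — the authoritative wire format lives in `RFC.md` — but the sentinel bytes and the 8 KiB cap are the ones documented above:

```ruby
DICT_SENTINEL = "LZ4D".b   # 4C 5A 34 44, as documented above
MAX_DICT_SIZE = 8 * 1024   # 8 KiB shipment cap

# Sender side: frame a dictionary onto the first outgoing message.
# (Hypothetical helper -- the gem does this internally.)
def ship_dictionary(dict)
  raise ArgumentError, "dictionary exceeds 8 KiB cap" if dict.bytesize > MAX_DICT_SIZE
  DICT_SENTINEL + [dict.bytesize].pack("N") + dict
end

# Receiver side: detect the sentinel, enforce the cap, extract the dict.
# Returns nil for frames that don't carry a dictionary.
def read_dictionary(frame)
  return nil unless frame.byteslice(0, 4) == DICT_SENTINEL
  size = frame.byteslice(4, 4).unpack1("N")
  raise "over-budget dictionary shipment" if size > MAX_DICT_SIZE
  frame.byteslice(8, size)
end
```
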
## Compression thresholds
To avoid pessimizing tiny frames, the sender skips compression below:
| Mode | Threshold |
|---|---|
| No dictionary | 512 B |
| With dictionary | 32 B |
Below the threshold the part is sent uncompressed (4-byte zero sentinel + plaintext).
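The rule reduces to a one-line predicate. The constants come from the table above; the method name is illustrative, not the gem's API:

```ruby
NO_DICT_THRESHOLD = 512  # bytes; below this, raw LZ4 rarely wins
DICT_THRESHOLD    = 32   # bytes; a dictionary pays off much earlier

# Should this message part be compressed, or sent as plaintext
# behind the 4-byte zero sentinel?
def compress_part?(payload, dict: nil)
  payload.bytesize >= (dict ? DICT_THRESHOLD : NO_DICT_THRESHOLD)
end
```
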
## Security limits
The receiver bounds decompression by the socket's `max_message_size`
(the same knob you'd use on a plain `tcp://` socket): it caps the
total decompressed size of all parts in a single message. When a peer
attempts to send an over-budget message, the receiver drops the
connection, and `OMQ::SocketDeadError` surfaces on the next receive.

Independent of that, the dictionary itself is capped at 8 KiB; a larger shipment also drops the connection.
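The budget check is cumulative across parts, not per part — a sketch, with names that are assumptions rather than the gem's internals:

```ruby
# Returns false as soon as the running total of decompressed part
# sizes exceeds the socket's max_message_size -- the point at which
# the real receiver would drop the connection.
def within_budget?(decompressed_part_sizes, max_message_size)
  total = 0
  decompressed_part_sizes.each do |size|
    total += size
    return false if total > max_message_size
  end
  true
end
```

Note that two parts can each fit the budget individually while their sum does not — that is exactly the case the cumulative check catches.
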
See the [plan roadmap](../OMQ-LZ4.plan) for history and open questions.
## Performance
Measured on x86_64 (scalar), Ruby 4.0 + YJIT, on dict-friendly input (a repeated Lorem ipsum prefix).

`OMQ::LZ4::Codec` (pure encode/decode, no I/O):
| Input size | No dict encode | Dict encode | No dict decode | Dict decode |
|---|---|---|---|---|
| 64 B | ~0.9 µs | ~1.0 µs | ~0.4 µs | ~0.6 µs |
| 256 B | ~1.1 µs | ~0.8 µs | ~0.4 µs | ~0.5 µs |
| 1 KiB | ~1.5 µs | ~0.9 µs | ~0.9 µs | ~1.0 µs |
| 16 KiB | ~3.2 µs | ~2.4 µs | ~3.9 µs | ~3.0 µs |
| 1 MiB | ~89 µs | ~87 µs | ~173 µs | ~303 µs |
End-to-end PUSH → PULL over `lz4+tcp://` (loopback):
| Message size | Throughput |
|---|---|
| 64 B | ~67k msg/s |
| 256 B | ~94k msg/s |
| 1 KiB | ~92k msg/s |
Run the benchmarks yourself:
```sh
OMQ_DEV=1 bundle exec ruby --yjit bench/codec_micro.rb
OMQ_DEV=1 bundle exec ruby --yjit bench/transport_throughput.rb
OMQ_DEV=1 bundle exec ruby --yjit bench/head_to_head.rb   # lz4 vs zstd
```
## Head-to-head vs omq-zstd and plain tcp
End-to-end PUSH → PULL throughput, Ruby 4.0 + YJIT. Input: UUID-sprinkled Lorem ipsum — a fresh UUID between each Lorem paragraph. This approximates realistic workloads where a schema repeats but values vary (event logs, protobuf records, JSON events), so a fraction of every message is necessarily incompressible.

The link between PUSH and PULL is loopback, rate-shaped with
`tc netem rate Xmbit` on dev `lo` to simulate bandwidth-limited
networks. `zstd+tcp` is shown at level -3 (default, fast) and
level 3 (tighter ratio, more CPU).

The table below reports plaintext MiB/s (application-level throughput) and wire MiB/s (bytes on the socket) at 128 KiB payload, across three bandwidth regimes.
| Link | Metric | tcp | lz4+tcp | zstd -3 | zstd 3 |
|---|---|---|---|---|---|
| 100 Mbit | plain | 11.8 | 105 | 114 | 197 |
| (cap ≈ 12 MiB/s) | wire | 11.8 | 12 | 12 | 12 |
| | speedup | 1.00× | 8.89× | 9.70× | 16.74× |
| 1 Gbit | plain | 117 | 794 | 900 | 603 |
| (cap ≈ 125 MiB/s) | wire | 117 | 93 | 94 | 36 |
| | speedup | 1.00× | 6.81× | 7.73× | 5.17× |
| Unlimited loopback | plain | 1 064 | 869 | 972 | 626 |
| (kernel-copy-bound) | wire | 1 064 | 99 | 101 | 37 |
| | speedup | 1.00× | 0.82× | 0.91× | 0.59× |
Three regimes are visible:

- 100 Mbit — all compressed transports saturate the wire at
  ~12 MiB/s. Plaintext throughput = wire cap × (1 / compression ratio),
  so the tighter the ratio, the bigger the win: `zstd 3`'s ~6% wire
  ratio translates to a ~17× throughput multiplier over plain `tcp`.
- 1 Gbit — compressed transports shift from wire-saturated to
  CPU-limited. `zstd -3` reaches ~75% of the wire cap; `zstd 3` only
  29% (deeply CPU-bound). Both beat plain `tcp` (which is pinned at
  the wire cap) by 6–8×. `zstd 3`'s tighter wire no longer helps —
  there is no wire saturation to trade CPU for.
- Unlimited loopback — no wire cap; all three are CPU-limited. Plain
  `tcp` pays no compression CPU, so skip compression on loopback.
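The wire-saturated arithmetic is worth making explicit. Back-of-envelope only, with numbers read off the 100 Mbit row of the table above:

```ruby
# In the wire-saturated regime every transport moves roughly the wire
# cap in *bytes on the socket*, so application-level throughput is
# wire_cap / compression_ratio.
wire_cap = 11.8   # MiB/s: measured plain-tcp throughput at ~100 Mbit
ratio    = 0.06   # zstd 3 on this input: wire bytes ≈ 6% of plaintext

predicted = wire_cap / ratio      # ≈ 197 MiB/s, matching the measured 197
speedup   = predicted / wire_cap  # ≈ 16.7×, matching the table's 16.74×
```
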
Rate-shape your own link to reproduce:
```sh
sudo tc qdisc add dev lo root netem rate 100mbit   # or 1gbit, 10mbit, etc.
OMQ_DEV=1 bundle exec ruby --yjit bench/head_to_head.rb
sudo tc qdisc del dev lo root
```
Or use a veth pair in a network namespace so shaping doesn't
touch your host's real loopback (see tc-netem(8), ip-netns(8)).
Full sweeps (8 sizes from 256 B to 512 KiB) for each regime live
in the `bench/head_to_head.rb` output — run it yourself. The
headline numbers above are stable across repeats, but the smallest
and largest sizes vary a bit run-to-run.
Takeaway:

- Pick `lz4+tcp://` for bandwidth-limited links (any real network —
  even 1 Gbit LAN): 6–9× throughput multiplier over plain `tcp`,
  minimal memory (~16 KiB/connection), modest CPU. Ties or beats
  `zstd -3` at 1 Gbit; loses the ratio race to `zstd 3` at 100 Mbit
  and below.
- Pick `zstd+tcp://` (level ≥ 3) when the wire is the precious
  resource (≤ 100 Mbit links, WAN, or you're paying for egress): a
  ~17× throughput multiplier at 100 Mbit for 128 KiB messages is hard
  to argue with.
- Pick plain `tcp://` when the link is not the bottleneck (localhost
  IPC, loopback, datacenter-fast inter-host connections where the
  bandwidth ceiling is above the CPU's compress/decompress speed —
  typically 10+ Gbit), or when the payload is already high-entropy
  (encrypted, already compressed, random binary) and compression only
  adds overhead.
## Development

```sh
OMQ_DEV=1 bundle install
OMQ_DEV=1 bundle exec rake test
```