rlz4
Ractor-safe LZ4 bindings for Ruby, built as a Rust extension on top of
lz4_flex via magnus.
Why?
The existing Ruby LZ4 gems are broken under Ractor.
rlz4 marks the extension Ractor-safe at load time and uses only owned,
thread-safe state, so it can be called from any Ractor.
Install
# Gemfile
gem "rlz4"
Building requires a Rust toolchain (stable).
Usage
Frame format (module functions)
require "rlz4"
compressed = RLZ4.compress("hello world" * 100)
decompressed = RLZ4.decompress(compressed)
# Wire format is standard LZ4 frame (magic number 04 22 4D 18),
# interoperable with any other LZ4 frame implementation.
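As a quick illustration of the magic number mentioned above, the following sketch checks whether a byte string starts with the LZ4 frame magic. The constant and the `lz4_frame?` helper are hypothetical names used for illustration and are not part of rlz4; the check looks only at the first four bytes and does not validate the rest of the frame.

```ruby
# LZ4 frame magic number as it appears on the wire: 04 22 4D 18.
LZ4_FRAME_MAGIC = "\x04\x22\x4D\x18".b

# Hypothetical helper (not part of rlz4): true if the bytes begin with
# the LZ4 frame magic. The input is compared as binary data.
def lz4_frame?(bytes)
  bytes.b.start_with?(LZ4_FRAME_MAGIC)
end
```

A check like this can be useful for rejecting obviously wrong input early, before handing it to RLZ4.decompress.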
Invalid input raises RLZ4::DecompressError (a StandardError subclass):
begin
  RLZ4.decompress("not a valid lz4 frame")
rescue RLZ4::DecompressError => e
  warn e.message
end
Dictionary compression
For workloads where many small messages share a common prefix (e.g. ZMQ
messages with a fixed header), a shared dictionary can substantially improve the
compression ratio. RLZ4::Dictionary#compress emits a real LZ4 frame
(magic 04 22 4D 18) with the FLG.DictID bit set and the dictionary's
Dict_ID written into the FrameDescriptor — interoperable with the
reference lz4 CLI given the same dictionary file (lz4 -d -D dict.bin).
dict = RLZ4::Dictionary.new("schema=v1 type=message field1=")
compressed = dict.compress("schema=v1 type=message field1=payload")
decompressed = dict.decompress(compressed)
dict.size # => 30
dict.id # => u32 Dict_ID
RLZ4::Dictionary is immutable after construction and can be shared across
Ractors.
Dictionary IDs
Dictionary#id is a u32 derived from the first four bytes of
sha256(dict_bytes), interpreted little-endian. The LZ4 frame spec defines Dict_ID as
an application-defined field with no reserved ranges and no central
registrar, so the full u32 space is usable.
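The derivation described above can be reproduced in plain Ruby, which may help when another implementation needs to compute the same id. This is a sketch under the assumption that the id is exactly the first four bytes of the SHA-256 digest (a u32 needs four bytes) read as an unsigned little-endian integer; `derive_dict_id` is a hypothetical name and not part of rlz4.

```ruby
require "digest"

# Hypothetical helper (not part of rlz4): take the first four bytes of
# SHA-256(dict_bytes) and read them as an unsigned 32-bit little-endian
# integer ("V" in Ruby's pack/unpack notation).
def derive_dict_id(dict_bytes)
  Digest::SHA256.digest(dict_bytes).byteslice(0, 4).unpack1("V")
end

derive_dict_id("abc") # => 3205920954 (0xbf1678ba)
```

If the assumption holds, the result for a given byte string should equal Dictionary#id for a dictionary built from the same bytes.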
The id is on the wire: Dictionary#compress sets FLG.DictID = 1
and writes the id into the FrameDescriptor. On decode, rlz4 parses
the incoming frame's Dict_ID and asserts it matches
Dictionary#id before touching the payload. Receivers that maintain
multiple dictionaries can therefore route incoming frames to the
right one purely by parsing the frame header — no out-of-band id
channel needed.
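Routing by frame header can be sketched without decompressing anything. In the LZ4 frame format the header consists of the 4-byte magic, the FLG byte, the BD byte, an optional 8-byte Content Size (present when FLG bit 3 is set), an optional 4-byte little-endian Dict_ID (present when FLG bit 0 is set), and a header checksum byte. The helper below is a hypothetical illustration, not part of rlz4, and it does not verify the header checksum.

```ruby
LZ4_MAGIC        = "\x04\x22\x4D\x18".b
FLG_CONTENT_SIZE = 0x08 # FLG bit 3: an 8-byte Content Size field follows BD
FLG_DICT_ID      = 0x01 # FLG bit 0: a 4-byte Dict_ID field is present

# Hypothetical helper (not part of rlz4): returns the Dict_ID stored in an
# LZ4 frame header, or nil when the frame carries no Dict_ID.
def frame_dict_id(frame)
  bytes = frame.b
  raise ArgumentError, "not an LZ4 frame" unless bytes.start_with?(LZ4_MAGIC)

  flg = bytes.getbyte(4)
  raise ArgumentError, "truncated frame header" if flg.nil?
  return nil if (flg & FLG_DICT_ID).zero?

  offset = 6                                    # magic (4) + FLG (1) + BD (1)
  offset += 8 unless (flg & FLG_CONTENT_SIZE).zero?
  field = bytes.byteslice(offset, 4)
  raise ArgumentError, "truncated frame header" if field.nil? || field.bytesize < 4

  field.unpack1("V")                            # u32, little-endian
end
```

A receiver could keep a Hash from Dictionary#id to RLZ4::Dictionary and look up the result of `frame_dict_id` before calling decompress on the matching dictionary.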
LZ4 dictionaries are always raw bytes (unlike Zstd, there is no
dict-file header format), so there is no header to parse an id out
of. If you need sender and receiver to agree on an id without
shipping it out-of-band, deriving it deterministically from the
dict bytes — which is what Dictionary.new does — is the simplest
option.
Dictionary training from a sample corpus is not supported: LZ4
has no equivalent of Zstd's ZDICT_trainFromBuffer. Dictionaries
are supplied by the caller as raw bytes (typically a hand-picked
prefix or a representative message).
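One simple way to pick a prefix by hand is to take the longest byte prefix shared by a few sample messages. The sketch below is only an illustration of that idea, not a training algorithm and not part of rlz4; `common_prefix` is a hypothetical helper name.

```ruby
# Hypothetical helper (not part of rlz4): longest byte prefix shared by
# every sample message.
def common_prefix(messages)
  return "".b if messages.empty?

  # The prefix shared by all strings equals the prefix shared by the
  # lexicographically smallest and largest of them.
  lo, hi = messages.map(&:b).minmax
  len = 0
  len += 1 while len < lo.bytesize && lo.getbyte(len) == hi.getbyte(len)
  lo.byteslice(0, len)
end

samples = [
  "schema=v1 type=message field1=payload",
  "schema=v1 type=message field1=other",
  "schema=v1 type=message field1=",
]
dict_bytes = common_prefix(samples) # => "schema=v1 type=message field1="
```

The resulting bytes can then be passed to RLZ4::Dictionary.new. A representative full message may work better than a strict common prefix when messages share content beyond the header.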
Ractors
Both the module functions and RLZ4::Dictionary can be used from any
Ractor. Example from the test suite:
ractors = 4.times.map do |i|
  Ractor.new(i) do |idx|
    pt = "ractor #{idx} payload " * 1000
    1000.times do
      ct = RLZ4.compress(pt)
      raise "mismatch" unless RLZ4.decompress(ct) == pt
    end
    :ok
  end
end
ractors.map(&:value) # => [:ok, :ok, :ok, :ok]
Non-goals
- High-compression mode (LZ4_HC).
- Streaming / chunked compression.
- Preservation of string encoding on decompress (output is always binary).
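
Because decompressed output is always binary, callers who need the original encoding have to reapply it themselves. A minimal sketch of one way to do that, assuming the caller knows the encoding out-of-band: `restore_encoding` is a hypothetical helper, not part of rlz4, and the binary string here merely stands in for the result of a decompress call.

```ruby
# Hypothetical helper (not part of rlz4): relabel binary bytes with a known
# encoding and fail loudly if the bytes are not valid in that encoding.
def restore_encoding(binary, encoding)
  text = binary.dup.force_encoding(encoding)
  raise ArgumentError, "bytes are not valid #{encoding}" unless text.valid_encoding?
  text
end

original = "h\u00e9llo w\u00f6rld"   # UTF-8 text with non-ASCII characters
binary   = original.b                # stands in for decompressed output
restored = restore_encoding(binary, Encoding::UTF_8)
```

force_encoding only changes the encoding label and never transcodes the bytes, so the validity check is what guards against applying the wrong encoding.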
License
MIT