rlz4

Ractor-safe LZ4 bindings for Ruby, built as a Rust extension on top of lz4_flex via magnus.

Why?

The existing Ruby LZ4 gems are broken under Ractor.

rlz4 marks the extension Ractor-safe at load time and uses only owned, thread-safe state, so it can be called from any Ractor.

Install

# Gemfile
gem "rlz4"

Building requires a Rust toolchain (stable).

API

Three classes plus one utility module function:

| Name | Purpose | Wire format |
|---|---|---|
| RLZ4::Dictionary | Value type: dict bytes + 4-byte id | |
| RLZ4::FrameCodec | Optionally dict-bound frame codec | LZ4 frame (04 22 4D 18), interoperable with lz4 CLI |
| RLZ4::BlockCodec | Optionally dict-bound block codec, reusable scratch | Raw LZ4 block, no framing |
| RLZ4.compress_bound(n) | Worst-case output size for input size n | |

Invalid input on decompress raises RLZ4::DecompressError (a StandardError subclass).

RLZ4::Dictionary

Pure value type — just the dict bytes plus a 4-byte id. Built on Data.define, so it's immutable, has value equality, and is shareable across Ractors. The id defaults to sha256(bytes)[0, 4] interpreted little-endian (the derivation LZ4 frame FLG.DictID uses); override with id: if you need a coordinated value.

dict = RLZ4::Dictionary.new(bytes: "schema=v1 type=message field1=")
dict.bytes  # => "schema=v1..." frozen binary
dict.id     # => u32
dict.size   # => 30

# With a caller-supplied id (e.g. from an out-of-band protocol):
custom = RLZ4::Dictionary.new(bytes: raw, id: 0xDEAD_BEEF)
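
The default id derivation described above can be reproduced with Ruby's standard library. This is a minimal sketch, assuming the id is exactly the first four bytes of the SHA-256 digest read as an unsigned little-endian 32-bit integer; `default_dict_id` is a hypothetical helper name, not part of rlz4.

```ruby
require "digest"

# Hypothetical helper (not an rlz4 API): take the first four bytes of
# SHA-256(bytes) and read them as a little-endian unsigned 32-bit integer.
def default_dict_id(bytes)
  Digest::SHA256.digest(bytes.b)[0, 4].unpack1("V")
end

id = default_dict_id("schema=v1 type=message field1=")
puts format("0x%08X", id)
```

Computing the id independently like this may be useful when a peer written in another language has to agree on the same value without loading rlz4.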

RLZ4::FrameCodec — frame-format LZ4

Emits a real LZ4 frame (magic 04 22 4D 18), interoperable with the lz4 CLI. With a dictionary, sets FLG.DictID and writes Dict_ID into the FrameDescriptor — a receiver routing by id can pick the right dict from a set purely by parsing the frame header.

Stateless (no scratch), so FrameCodec instances are shareable across Ractors.

codec = RLZ4::FrameCodec.new                           # no dict
codec = RLZ4::FrameCodec.new(dict: dict)               # Dictionary value
codec = RLZ4::FrameCodec.new(dict: "raw bytes here")   # String shortcut

ct = codec.compress("hello world" * 100)
pt = codec.decompress(ct)

codec.has_dict?  # => true / false
codec.id         # => u32 id when dict-bound, nil otherwise
codec.size       # => dict size when dict-bound, 0 otherwise

Dict id mismatch on decompress raises RLZ4::DecompressError before touching the payload — no silently corrupt output.
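
The routing-by-id idea from this section can be sketched in plain Ruby by reading Dict_ID straight from the frame header. The sketch follows the LZ4 frame format's FrameDescriptor layout (FLG, BD, optional 8-byte Content Size, optional 4-byte Dict_ID, header checksum). `frame_dict_id` and `LZ4_FRAME_MAGIC` are hypothetical names, not part of rlz4, and the header checksum is deliberately not verified here.

```ruby
LZ4_FRAME_MAGIC = "\x04\x22\x4D\x18".b

# Hypothetical helper (not an rlz4 API): return the Dict_ID stored in an
# LZ4 frame header, or nil when the FLG.DictID bit is clear.
def frame_dict_id(frame)
  frame = frame.b
  raise ArgumentError, "not an LZ4 frame" unless frame.start_with?(LZ4_FRAME_MAGIC)
  raise ArgumentError, "truncated frame header" if frame.bytesize < 7

  flg    = frame.getbyte(4)
  offset = 6                          # magic (4) + FLG (1) + BD (1)
  offset += 8 if (flg & 0x08) != 0    # optional 8-byte Content Size
  return nil if (flg & 0x01) == 0     # FLG.DictID not set

  id = frame.byteslice(offset, 4)
  raise ArgumentError, "truncated frame header" if id.nil? || id.bytesize < 4
  id.unpack1("V")                     # Dict_ID is little-endian
end

# Hand-built header for illustration: FLG = 0x61 (version 01, independent
# blocks, DictID set), BD = 0x40, Dict_ID = 0xDEADBEEF, placeholder checksum.
header = LZ4_FRAME_MAGIC + [0x61, 0x40].pack("C2") + [0xDEAD_BEEF].pack("V") + "\x00".b
puts format("0x%08X", frame_dict_id(header)) # prints 0xDEADBEEF
```

A receiver could use the returned id as a key into its own table of codecs, one FrameCodec per known dictionary, and reject frames whose id it does not recognise.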

RLZ4::BlockCodec — block-format LZ4

For hot paths that compress many small messages and want to amortise allocation. Emits a raw LZ4 block — no frame header, no end-mark, no checksum. Not interoperable with the reference lz4 CLI; meant for callers who carry their own framing (e.g. ZMTP transports).

Wraps a reusable 16 KiB scratch hash table. With a dictionary, also carries a pristine dict-loaded table and restores it into the scratch via a single 16 KiB memcpy before each compress call — so dict initialisation is paid once at construction, not per call.

codec = RLZ4::BlockCodec.new                           # no dict
codec = RLZ4::BlockCodec.new(dict: dict)               # Dictionary value
codec = RLZ4::BlockCodec.new(dict: "raw bytes here")   # String shortcut

ct = codec.compress("hello world" * 100)
pt = codec.decompress(ct, decompressed_size: 1100)

#decompress requires decompressed_size: because raw LZ4 blocks carry no length prefix. The decoder refuses to write past that value, even on crafted malformed input, and raises RLZ4::DecompressError on any overrun.

Use RLZ4.compress_bound(n) to pre-size output buffers.
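
Because a raw block carries no length, the caller's own framing has to transport the plaintext size. One possible approach, shown as a sketch only, is a 4-byte little-endian size prefix; `wrap_block` and `unwrap_block` are hypothetical helpers, not part of rlz4, and the prefix layout is an assumption rather than anything rlz4 defines.

```ruby
# Hypothetical framing helpers (not rlz4 APIs): prefix each raw block with
# the plaintext size as a 4-byte little-endian unsigned integer.
def wrap_block(block, plain_size)
  [plain_size].pack("V") + block.b
end

# Split a framed message back into [plaintext_size, raw_block].
def unwrap_block(message)
  message = message.b
  raise ArgumentError, "message shorter than 4-byte size prefix" if message.bytesize < 4
  [message.unpack1("V"), message.byteslice(4..)]
end

# Intended use with the BlockCodec calls documented above:
#   msg         = wrap_block(codec.compress(pt), pt.bytesize)
#   size, block = unwrap_block(msg)
#   pt          = codec.decompress(block, decompressed_size: size)
```

A receiver would probably also want to cap the decoded size against a protocol maximum before passing it to decompress, since the prefix comes from the wire.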

BlockCodec holds a RefCell internally and is thread-local, so an instance must not cross Ractor boundaries. Allocate one per Ractor. The block format has no on-wire Dict_ID field; a dict mismatch produces garbage plaintext (not an error). Detect at a higher layer (checksum, schema validation, etc.).
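
As one illustration of higher-layer detection, the sketch below carries a CRC-32 of the plaintext next to the block and checks it after decompression. The helper names, the `ChecksumMismatch` class, and the message layout are all hypothetical; only `Zlib.crc32` from Ruby's standard library is assumed.

```ruby
require "zlib"

# Hypothetical error class for this sketch (not an rlz4 class).
class ChecksumMismatch < StandardError; end

# Prefix the raw block with CRC-32(plaintext), little-endian.
def seal(plaintext, block)
  [Zlib.crc32(plaintext)].pack("V") + block.b
end

# Split a sealed message back into [expected_crc, raw_block].
def split_sealed(message)
  message = message.b
  raise ArgumentError, "message shorter than 4-byte checksum" if message.bytesize < 4
  [message.unpack1("V"), message.byteslice(4..)]
end

# Return the plaintext if it matches the transmitted CRC, otherwise raise.
def verify!(plaintext, expected_crc)
  return plaintext if Zlib.crc32(plaintext) == expected_crc

  raise ChecksumMismatch, "wrong dictionary or corrupted block"
end
```

The check runs on the decompressed output, so it catches a wrong dictionary as well as in-flight corruption, at the cost of four extra bytes per message.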

Ractor safety

Dictionary and FrameCodec can be used from any Ractor. Example:

ractors = 4.times.map do |i|
  Ractor.new(i) do |idx|
    codec = RLZ4::FrameCodec.new
    pt    = "ractor #{idx} payload " * 1000
    1000.times do
      ct = codec.compress(pt)
      raise "mismatch" unless codec.decompress(ct) == pt
    end
    :ok
  end
end
ractors.map(&:value) # => [:ok, :ok, :ok, :ok]

BlockCodec must not cross Ractor boundaries — allocate one per Ractor.

Non-goals

  • High-compression mode (LZ4_HC).
  • Streaming / chunked compression.
  • Preservation of string encoding on decompress (output is always binary).
  • Dictionary training from a sample corpus. LZ4 has no equivalent of Zstd's ZDICT_trainFromBuffer. Dictionaries are caller-supplied raw bytes.

License

MIT