rlz4


Ractor-safe LZ4 bindings for Ruby, built as a Rust extension on top of lz4_flex via magnus.

Why?

The existing Ruby LZ4 gems are broken under Ractor:

rlz4 marks the extension Ractor-safe at load time and uses only owned, thread-safe state, so it can be called from any Ractor.

Install

# Gemfile
gem "rlz4"

Building requires a Rust toolchain (stable).

Usage

Frame format (module functions)

require "rlz4"

compressed   = RLZ4.compress("hello world" * 100)
decompressed = RLZ4.decompress(compressed)

# The wire format is the standard LZ4 frame format (magic number 04 22 4D 18),
# interoperable with any other LZ4 frame implementation.
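The magic number quoted above is the 32-bit value 0x184D2204 stored little-endian, which is why it appears on the wire as 04 22 4D 18. As a minimal sketch, a cheap pre-check for "does this look like an LZ4 frame?" could compare the first four bytes. The helper name lz4_frame? is hypothetical and not part of rlz4, and a matching magic number does not guarantee the rest of the frame is valid; RLZ4.decompress remains the real check.

```ruby
# Hypothetical helper, not part of rlz4: returns true if the data starts with
# the LZ4 frame magic number 0x184D2204 (bytes 04 22 4D 18 on the wire).
LZ4_FRAME_MAGIC = 0x184D2204

def lz4_frame?(data)
  data = data.b                         # compare raw bytes, ignoring encoding
  return false if data.bytesize < 4
  data.unpack1("V") == LZ4_FRAME_MAGIC  # "V" = 32-bit unsigned, little-endian
end

# Packing the magic number little-endian yields exactly the quoted bytes.
[LZ4_FRAME_MAGIC].pack("V").bytes.map { |b| format("%02X", b) }.join(" ")
# => "04 22 4D 18"
```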

Invalid input raises RLZ4::DecompressError (a StandardError subclass):

begin
  RLZ4.decompress("not a valid lz4 frame")
rescue RLZ4::DecompressError => e
  warn e.message
end

Dictionary compression

For workloads where many small messages share a common prefix (e.g. ZMQ messages with a fixed header), a shared dictionary can substantially improve the compression ratio. RLZ4::Dictionary#compress emits a standard LZ4 frame (magic 04 22 4D 18) with the FLG.DictID bit set and the dictionary's Dict_ID written into the FrameDescriptor. These frames can be decoded by the reference lz4 CLI given the same dictionary file (lz4 -d -D dict.bin).

dict = RLZ4::Dictionary.new("schema=v1 type=message field1=")

compressed   = dict.compress("schema=v1 type=message field1=payload")
decompressed = dict.decompress(compressed)

dict.size  # => 30
dict.id    # => u32 Dict_ID

RLZ4::Dictionary is immutable after construction and can be shared across Ractors.

Dictionary IDs

Dictionary#id is a u32 derived from the first four bytes of SHA-256(dict_bytes) (sha256(dict_bytes)[0..4] in Rust slice notation), interpreted as little-endian. The LZ4 frame spec defines Dict_ID as an application-defined field with no reserved ranges and no central registrar, so the full u32 space is usable.
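The derivation above can be reproduced in plain Ruby, for example to precompute ids on a machine that does not have the gem installed. This is a sketch based only on the description in this section: dict_id_for is a hypothetical name, and it has not been checked byte-for-byte against the gem's Dictionary#id.

```ruby
require "digest"

# Hypothetical helper, not part of rlz4: first four bytes of
# SHA-256(dict_bytes), read as an unsigned little-endian 32-bit integer.
def dict_id_for(dict_bytes)
  Digest::SHA256.digest(dict_bytes.b).byteslice(0, 4).unpack1("V")
end

id = dict_id_for("schema=v1 type=message field1=")
# id is an Integer in 0...2**32; the same bytes always give the same id.
```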

The id is on the wire: Dictionary#compress sets FLG.DictID = 1 and writes the id into the FrameDescriptor. On decode, rlz4 parses the incoming frame's Dict_ID and checks that it matches Dictionary#id before touching the payload. Receivers that maintain multiple dictionaries can therefore route each incoming frame to the right dictionary purely by parsing the frame header, with no out-of-band id channel needed.
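Routing by header means reading Dict_ID out of the frame descriptor. The sketch below follows the layout in the LZ4 frame format specification (magic, FLG, BD, optional 8-byte Content Size, optional 4-byte Dict_ID, then the HC checksum byte). The helper frame_dict_id is hypothetical and not part of rlz4's API, and it does not verify the HC byte (that needs xxHash32, which is not in Ruby's standard library).

```ruby
# Hypothetical helper, not part of rlz4: returns the Dict_ID carried in an
# LZ4 frame header, or nil when FLG.DictID is not set. HC is NOT verified.
LZ4_MAGIC = 0x184D2204

def frame_dict_id(frame)
  bytes = frame.b
  # Smallest header: magic (4) + FLG (1) + BD (1) + HC (1) = 7 bytes.
  unless bytes.bytesize >= 7 && bytes.unpack1("V") == LZ4_MAGIC
    raise ArgumentError, "not an LZ4 frame"
  end

  flg = bytes.getbyte(4)
  return nil if (flg & 0x01).zero?       # FLG bit 0: Dict_ID present?

  offset = 6                             # skip magic, FLG and BD
  offset += 8 unless (flg & 0x08).zero?  # FLG bit 3: 8-byte Content Size present
  raise ArgumentError, "truncated frame header" if bytes.bytesize < offset + 4

  bytes.byteslice(offset, 4).unpack1("V") # Dict_ID, little-endian u32
end
```

With rlz4 loaded, a receiver could then keep a Hash keyed by Dictionary#id and look up the entry for frame_dict_id(frame) before calling decompress on it.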

LZ4 dictionaries are always raw bytes (unlike Zstd, there is no dictionary-file header format), so there is no header to parse an id out of. If the sender and receiver need to agree on an id without shipping it out-of-band, deriving it deterministically from the dictionary bytes, as Dictionary.new does, is the simplest option.

Dictionary training from a sample corpus is not supported: LZ4 has no equivalent of Zstd's ZDICT_trainFromBuffer. Dictionaries are supplied by the caller as raw bytes (typically a hand-picked prefix or a representative message).
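Since dictionaries are supplied by hand, one simple way to pick "a hand-picked prefix" is to take the longest prefix that a set of sample messages share. This is a sketch of that heuristic only, not a substitute for real dictionary training; common_prefix is a hypothetical helper, not part of rlz4.

```ruby
# Hypothetical helper, not part of rlz4: longest byte prefix shared by every
# sample. A simple heuristic for choosing dictionary bytes, not training.
def common_prefix(samples)
  return "".b if samples.empty?

  first = samples.first.b
  len = first.bytesize
  samples.each do |sample|
    sample = sample.b
    len = [len, sample.bytesize].min
    i = 0
    i += 1 while i < len && sample.getbyte(i) == first.getbyte(i)
    len = i                              # shrink to the matching length
  end
  first.byteslice(0, len)
end
```

The result could then be passed to RLZ4::Dictionary.new; whether it helps depends on how much of each real message the prefix actually covers.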

Ractors

Both the module functions and RLZ4::Dictionary can be used from any Ractor. Example from the test suite:

ractors = 4.times.map do |i|
  Ractor.new(i) do |idx|
    pt = "ractor #{idx} payload " * 1000
    1000.times do
      ct = RLZ4.compress(pt)
      raise "mismatch" unless RLZ4.decompress(ct) == pt
    end
    :ok
  end
end
ractors.map(&:value) # => [:ok, :ok, :ok, :ok]

Non-goals

  • High-compression mode (LZ4_HC).
  • Streaming / chunked compression.
  • Preservation of string encoding on decompress (output is always binary).
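Because decompressed output is always binary, callers that know the original encoding must re-tag the string themselves. A minimal sketch, assuming the caller knows the expected encoding (restore_encoding is a hypothetical helper, not part of rlz4): force_encoding changes only the encoding label, not the bytes, so the validity check catches data that was not in that encoding to begin with.

```ruby
# Hypothetical helper, not part of rlz4: re-tags a binary string with a known
# encoding and rejects bytes that are invalid in that encoding.
def restore_encoding(binary, encoding = Encoding::UTF_8)
  str = binary.dup.force_encoding(encoding) # dup: leave the input untouched
  raise ArgumentError, "bytes are not valid #{encoding}" unless str.valid_encoding?
  str
end
```

For example, restore_encoding(RLZ4.decompress(compressed)) would return a UTF-8 string when the original payload was UTF-8.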

License

MIT