rzstd
Ractor-safe Zstandard bindings for Ruby with persistent contexts.
rzstd provides a stateful FrameCodec that reuses ZSTD_CCtx /
ZSTD_DCtx state across calls instead of allocating a fresh ~256 KB
context every time, which is what makes it viable for small-message
workloads where the upstream zstd-ruby gem loses to LZ4 purely on
context-allocation overhead.
API
Two classes plus one utility module function — API shape mirrors
rlz4 0.4.x:
| API | Purpose |
|---|---|
| RZstd::Dictionary | Value type: dict bytes + 4-byte id |
| RZstd::FrameCodec | Stateful frame-format codec, optional dict |
| RZstd.get_frame_content_size(bytes) | Header parse, no decode |
Errors:
- RZstd::DecompressError < StandardError: malformed frame, wrong dict, checksum mismatch.
- RZstd::MissingContentSizeError < DecompressError: max_output_size: requested but the frame header omits Frame_Content_Size.
- RZstd::OutputSizeLimitError < DecompressError: frame's declared Frame_Content_Size exceeds the caller's limit.
RZstd::Dictionary
Pure value type — just dict bytes plus a 4-byte id. Built on
Data.define, so it's immutable, has value equality, and is
shareable across Ractors.
# Raw-content dict: id synthesised from sha256(bytes) mapped into
# the public 32_768..(2**31 - 1) range.
d = RZstd::Dictionary.new(bytes: "schema=v1 type=message field1=")
# ZDICT-format dict (produced by `zstd --train` or Dictionary.train):
# id is read from the header, matching what zstd writes into every
# compressed frame via FLG.DictID.
d = RZstd::Dictionary.new(bytes: File.binread("schema.dict"))
d.bytes # => frozen binary bytes
d.id # => u32
d.size # => dict size
# Override the id (e.g. from an out-of-band registrar):
d = RZstd::Dictionary.new(bytes: raw, id: 0xDEAD_BEEF)
Training
# ZDICT_trainFromBuffer: 100 KiB total samples and ≥ 10 samples
# recommended. Returns a ZDICT-format Dictionary.
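# Placeholder samples for illustration; substitute real messages.
samples = 1000.times.map { |i| "schema=v1 type=message field1=value#{i}" }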
dict = RZstd::Dictionary.train(samples, capacity: 64 * 1024)
dict.bytes[0, 4] # => "\x37\xA4\x30\xEC" (ZDICT magic)
dict.id # => the id zstd put in the header; same as on the wire
Dictionary IDs — the long version
Dictionary#id follows the Zstandard spec's Dictionary_ID semantics:
- ZDICT-format dicts (the output of Dictionary.train, or any bytes starting with the ZDICT magic 0xEC30A437 LE): the id is read straight out of header bytes [4..7]. This is the same id zstd writes into every compressed frame header via ZSTD_c_dictIDFlag (on by default), so Dictionary#id and the on-wire frame Dictionary_ID always agree. Receivers can therefore route incoming frames to the right dictionary purely by parsing the frame header; no side channel is required.
- Raw-content dicts (opaque bytes with no ZDICT header): the spec requires the on-wire frame Dictionary_ID to be 0, so rzstd synthesises a local id from sha256(bytes) mapped into the public range 32_768..(2**31 - 1), avoiding both reserved ranges (0..32_767, reserved for a future registrar, and >= 2**31). This id is useful as an in-process handle; it is not on the wire, so peers that need to agree on raw-content dicts must share them out-of-band.
Public constants RZstd::Dictionary::USER_DICT_ID_MIN /
USER_DICT_ID_MAX / USER_DICT_ID_SIZE expose this unreserved range
(32_768..(2**31 - 1)) for callers that generate their own ids.
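The two id rules above can be sketched in plain Ruby, without the gem. This is an illustration only: the constant and method names are hypothetical, and the exact way rzstd folds the SHA-256 digest into the range is not documented here, so the modulo mapping below is an assumption.

```ruby
require "digest"

# Hypothetical stand-ins for RZstd::Dictionary::USER_DICT_ID_MIN / _MAX.
USER_ID_MIN = 32_768
USER_ID_MAX = 2**31 - 1
ZDICT_MAGIC = "\x37\xA4\x30\xEC".b # 0xEC30A437 stored little-endian

# Returns the id stored in a ZDICT header, or nil for raw-content bytes.
def zdict_header_id(bytes)
  bytes = bytes.b
  return nil unless bytes.bytesize >= 8 && bytes.byteslice(0, 4) == ZDICT_MAGIC

  bytes.byteslice(4, 4).unpack1("V") # u32, little-endian
end

# Assumed mapping: take the first 4 digest bytes and fold them into the
# unreserved range. The real gem may derive the id differently.
def synthesised_id(bytes)
  span = USER_ID_MAX - USER_ID_MIN + 1
  USER_ID_MIN + (Digest::SHA256.digest(bytes.b).unpack1("N") % span)
end

# ZDICT-format bytes use the header id; anything else gets a local id.
def dictionary_id(bytes)
  zdict_header_id(bytes) || synthesised_id(bytes)
end
```

Whatever the mapping, the property that matters is that the synthesised id is deterministic and never lands in either reserved range.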
RZstd::FrameCodec
Stateful frame-format codec. Holds a CCtx and a DCtx across calls,
avoiding the ~256 KB per-call allocation overhead that bites the
upstream zstd-ruby gem on small messages.
# No-dict codec, default level (3).
codec = RZstd::FrameCodec.new
ct = codec.compress("the quick brown fox" * 10)
pt = codec.decompress(ct)
# Explicit level (negative = Zstd's fast strategy):
codec = RZstd::FrameCodec.new(level: -3)
Dict-bound
Pass a Dictionary (or raw bytes as a shortcut):
codec = RZstd::FrameCodec.new(dict: dict, level: -3)
codec = RZstd::FrameCodec.new(dict: "bytes", level: -3)
codec.has_dict? # => true
codec.id # => u32 (the dict's id)
codec.level # => -3
codec.size # => dict size in bytes
Wrong-dict decoding is caught by the content checksum the encoder
enables — a peer using the wrong dictionary raises
RZstd::DecompressError instead of returning corrupt bytes.
Bounded decompression
# max_output_size: enforces an upper bound on the declared
# Frame_Content_Size before allocating the output buffer or
# invoking the decoder.
codec.decompress(bytes, max_output_size: 1_048_576)
Missing Frame_Content_Size when max_output_size: is set raises
MissingContentSizeError. Declared size over the limit raises
OutputSizeLimitError.
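The order of those checks can be sketched as a standalone function. The Sketch module and check_output_size are hypothetical names used only for illustration; the error classes are local stand-ins that mirror the hierarchy listed under Errors, not the gem's own classes.

```ruby
module Sketch
  # Local stand-ins mirroring the documented error hierarchy.
  class DecompressError < StandardError; end
  class MissingContentSizeError < DecompressError; end
  class OutputSizeLimitError < DecompressError; end

  # declared_size: Frame_Content_Size from the header, or nil when omitted.
  # Returns declared_size once it is known to be acceptable; raises before
  # any output buffer would be allocated.
  def self.check_output_size(declared_size, max_output_size: nil)
    return declared_size if max_output_size.nil? # no bound requested

    if declared_size.nil?
      raise MissingContentSizeError, "frame header omits Frame_Content_Size"
    end
    if declared_size > max_output_size
      raise OutputSizeLimitError,
            "declared #{declared_size} bytes exceeds limit of #{max_output_size}"
    end

    declared_size
  end
end
```

Because both errors descend from DecompressError, a caller that does not care which bound check failed can rescue the parent class alone.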
Frame header utility
RZstd.get_frame_content_size(bytes) # => Integer, or nil if header omits FCS
Useful for a receiver that wants to inspect a frame's declared size
before calling #decompress (e.g. for routing, accounting, or
pre-sizing).
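For readers curious what a header-only parse involves, here is a minimal pure-Ruby sketch based on the Zstandard frame format. It is not the gem's implementation: parse_frame_header and read_le are hypothetical names, and the sketch ignores skippable frames and the reserved descriptor bit.

```ruby
ZSTD_MAGIC = "\x28\xB5\x2F\xFD".b # 0xFD2FB528 stored little-endian

# Reads `size` bytes at `pos` as an unsigned little-endian integer.
def read_le(bytes, pos, size)
  bytes.byteslice(pos, size).bytes.each_with_index.sum { |b, i| b << (8 * i) }
end

# Parses only the frame header; the compressed payload is never touched.
# Returns { dict_id:, content_size: }. dict_id is 0 when the frame records
# no dictionary id; content_size is nil when Frame_Content_Size is omitted.
def parse_frame_header(frame)
  frame = frame.b
  unless frame.bytesize >= 5 && frame.byteslice(0, 4) == ZSTD_MAGIC
    raise ArgumentError, "not a Zstandard frame"
  end

  descriptor     = frame.getbyte(4)
  fcs_flag       = descriptor >> 6       # bits 7-6
  single_segment = descriptor[5] == 1    # bit 5
  did_flag       = descriptor & 0b11     # bits 1-0

  pos = 5
  pos += 1 unless single_segment         # skip Window_Descriptor

  did_size = [0, 1, 2, 4][did_flag]
  fcs_size = [single_segment ? 1 : 0, 2, 4, 8][fcs_flag]
  raise ArgumentError, "truncated header" if frame.bytesize < pos + did_size + fcs_size

  dict_id = read_le(frame, pos, did_size)
  pos += did_size

  content_size = fcs_size.zero? ? nil : read_le(frame, pos, fcs_size)
  content_size += 256 if fcs_size == 2   # the 2-byte form stores size - 256

  { dict_id: dict_id, content_size: content_size }
end
```

The same header pass yields the frame's Dictionary_ID, which is what lets a receiver pick the matching dictionary before decoding.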
Ractor safety
Module functions, Dictionary values, and FrameCodec instances are
all shareable across Ractors. FrameCodec serializes compress /
decompress calls on its internal Mutexes — for parallel throughput,
allocate one FrameCodec per Ractor.
ractors = 4.times.map do |i|
  Ractor.new(i) do |idx|
    codec = RZstd::FrameCodec.new
    pt = "ractor #{idx} payload " * 1000
    1000.times do
      ct = codec.compress(pt)
      raise "mismatch" unless codec.decompress(ct) == pt
    end
    :ok
  end
end
ractors.map(&:value) # => [:ok, :ok, :ok, :ok]
Non-goals
- Streaming / chunked compression.
- Preservation of string encoding on decompress (output is always binary).
License
MIT