Class: RZstd::Dictionary
- Inherits:
-
Object
- Object
- RZstd::Dictionary
- Defined in:
- lib/rzstd.rb
Constant Summary
- ZDICT_MAGIC =
"\x37\xA4\x30\xEC".b.freeze
- USER_DICT_ID_MIN =
Public Dict_ID range per the Zstandard spec. IDs `0..32_767` are reserved for a future registrar, and `>= 2**31` is reserved. Only `32_768..(2**31 - 1)` is available for private/auto-generated dicts.
32_768
- USER_DICT_ID_MAX =
(2**31) - 1
- USER_DICT_ID_SIZE =
USER_DICT_ID_MAX - USER_DICT_ID_MIN + 1
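As a quick check of the arithmetic behind these constants, the sketch below restates them in a standalone module so it runs without the gem. The module name `DictIdRange` and the helper `user_id?` are hypothetical and are not part of RZstd.

```ruby
# Standalone restatement of the documented constants (assumed names below
# are illustrative only; RZstd itself defines these on RZstd::Dictionary).
module DictIdRange
  USER_DICT_ID_MIN  = 32_768
  USER_DICT_ID_MAX  = (2**31) - 1
  USER_DICT_ID_SIZE = USER_DICT_ID_MAX - USER_DICT_ID_MIN + 1

  # Hypothetical helper: true when +id+ lies in the user range.
  def self.user_id?(id)
    id.between?(USER_DICT_ID_MIN, USER_DICT_ID_MAX)
  end
end

DictIdRange::USER_DICT_ID_SIZE   # => 2_147_450_880 (2**31 - 32_768)
DictIdRange.user_id?(32_767)     # => false (reserved low range)
DictIdRange.user_id?(32_768)     # => true
DictIdRange.user_id?(2**31)      # => false (reserved high range)
```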
Class Method Summary
- .new(bytes, level: DEFAULT_LEVEL) ⇒ Object
  Public constructor.
- .train(samples, capacity: 64 * 1024) ⇒ String
  Trains a raw-content dictionary from a corpus of sample frames.
Instance Method Summary
Class Method Details
.new(bytes, level: DEFAULT_LEVEL) ⇒ Object
Public constructor. Resolves the Zstd `Dict_ID`:
- If `bytes` begins with the ZDICT magic (`0xEC30A437`, little-endian), the id is read from bytes `[4..7]` of the dictionary header. This is the same id zstd writes into every compressed frame header via `ZSTD_c_dictIDFlag` (enabled by default), so on-wire frames and `Dictionary#id` agree.
- Otherwise the dict is raw content: zstd writes a frame `dictID` of 0, and this wrapper falls back to the first four bytes of `sha256(bytes)`, read as a little-endian integer and mapped into the public range `32_768..(2**31 - 1)`, purely as an out-of-band identifier for the Ruby side. Wrong-dict decoding of raw dicts is caught by the content checksum the encoder enables.
# File 'lib/rzstd.rb', line 54
def self.new(bytes, level: DEFAULT_LEVEL)
  id = if bytes.byteslice(0, 4) == ZDICT_MAGIC
    bytes.byteslice(4, 4).unpack1("V")
  else
    raw = Digest::SHA256.digest(bytes).byteslice(0, 4).unpack1("V")
    USER_DICT_ID_MIN + (raw % USER_DICT_ID_SIZE)
  end

  _native_new(bytes, id, Integer(level))
end
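The id resolution can be tried without the native extension. The following is a minimal sketch that mirrors the logic of the listing; the module name `DictIdSketch`, the method `resolve`, and the fabricated 8-byte header are assumptions made for illustration. The extra `.b` call is also an addition of this sketch, so that the magic comparison is done on binary strings regardless of the input's encoding.

```ruby
require "digest"

# Hypothetical stand-in for the id logic of Dictionary.new (no native code).
module DictIdSketch
  ZDICT_MAGIC       = "\x37\xA4\x30\xEC".b.freeze
  USER_DICT_ID_MIN  = 32_768
  USER_DICT_ID_MAX  = (2**31) - 1
  USER_DICT_ID_SIZE = USER_DICT_ID_MAX - USER_DICT_ID_MIN + 1

  def self.resolve(bytes)
    bytes = bytes.b # sketch-only: compare as binary, whatever the input encoding
    if bytes.byteslice(0, 4) == ZDICT_MAGIC
      # Header path: bytes 4..7 hold the Dict_ID as a little-endian uint32.
      bytes.byteslice(4, 4).unpack1("V")
    else
      # Raw-content path: fold the first 4 digest bytes into the user range.
      raw = Digest::SHA256.digest(bytes).byteslice(0, 4).unpack1("V")
      USER_DICT_ID_MIN + (raw % USER_DICT_ID_SIZE)
    end
  end
end

# Fabricated header for illustration: magic followed by Dict_ID 1234 (LE).
header = DictIdSketch::ZDICT_MAGIC + [1234].pack("V")
DictIdSketch.resolve(header + "payload".b)  # => 1234
DictIdSketch.resolve("just raw bytes")      # an id within 32_768..(2**31 - 1)
```

Because the fallback is a pure function of the bytes, the same raw dictionary always resolves to the same id, which is what makes it usable as an out-of-band identifier.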
.train(samples, capacity: 64 * 1024) ⇒ String
Trains a raw-content dictionary from a corpus of sample frames. Wraps `ZDICT_trainFromBuffer`. Returns the trained dictionary as a binary String, ready to feed back into `Dictionary.new`.
ZDICT recommends roughly 100 KiB of samples in total and at least 10 samples; under-provisioned inputs raise an error.
# File 'lib/rzstd.rb', line 76
def self.train(samples, capacity: 64 * 1024)
  sizes = samples.map { |s| s.bytesize }
  buffer = String.new(capacity: sizes.sum, encoding: Encoding::BINARY)
  samples.each { |s| buffer << s.b }
  _native_train(buffer, sizes, Integer(capacity))
end
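`ZDICT_trainFromBuffer` takes one flat buffer of concatenated samples plus an array of per-sample sizes, which is the layout the listing builds before calling the native method. The sketch below reproduces only that packing step in plain Ruby; the module `TrainBufferSketch` and its `pack`/`unpack` methods are hypothetical names, and `unpack` is included just to show that the sizes are enough to recover the sample boundaries.

```ruby
# Hypothetical helpers illustrating the flat-buffer layout (no native code).
module TrainBufferSketch
  # Concatenate samples into one binary buffer and record each byte size.
  def self.pack(samples)
    sizes  = samples.map(&:bytesize)
    buffer = String.new(capacity: sizes.sum, encoding: Encoding::BINARY)
    samples.each { |s| buffer << s.b }
    [buffer, sizes]
  end

  # Slice the flat buffer back into samples using the recorded sizes.
  def self.unpack(buffer, sizes)
    offset = 0
    sizes.map do |size|
      sample = buffer.byteslice(offset, size)
      offset += size
      sample
    end
  end
end

buffer, sizes = TrainBufferSketch.pack(["héllo", "zstd", ""])
sizes            # => [6, 4, 0]  ("é" is two bytes in UTF-8)
buffer.bytesize  # => 10
buffer.encoding  # => #<Encoding:ASCII-8BIT> (shown as BINARY on newer Rubies)
```

Sizes are byte counts, not character counts, so multibyte samples need `bytesize` rather than `length`.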
Instance Method Details
#decompress(bytes, max_output_size: nil) ⇒ Object
# File 'lib/rzstd.rb', line 84
def decompress(bytes, max_output_size: nil)
  _native_decompress(bytes, Integer(max_output_size || 0))
end
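The only Ruby-side work in `#decompress` is coercing `max_output_size` before it reaches the native method. The sketch below isolates that coercion; `normalize_max_output_size` is a hypothetical helper name, and how the native layer interprets a value of 0 is not shown in this listing, so no meaning is assumed for it here.

```ruby
# Hypothetical helper mirroring the argument coercion in #decompress.
def normalize_max_output_size(max_output_size)
  # nil falls back to 0; everything else goes through Kernel#Integer,
  # which raises on values it cannot convert.
  Integer(max_output_size || 0)
end

normalize_max_output_size(nil)     # => 0
normalize_max_output_size(4096)    # => 4096
normalize_max_output_size("4096")  # => 4096

begin
  normalize_max_output_size("4 KiB")
rescue ArgumentError => e
  e.class  # => ArgumentError (non-numeric strings are rejected)
end
```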