Class: RZstd::Dictionary
- Inherits: Data
- Ancestors: Object, Data, RZstd::Dictionary
- Defined in: lib/rzstd/dictionary.rb
Overview
Pure value type for a Zstd dictionary: raw bytes plus a 4-byte id. Built on `Data.define`, so it is immutable, gets `==` / `#hash` / `#deconstruct` for free, and is shareable across Ractors.
The id defaults to:
- the `Dict_ID` field from the ZDICT header, if the bytes begin with the ZDICT magic (`37 A4 30 EC`). This matches the id zstd writes into every compressed frame header via `ZSTD_c_dictIDFlag`;
- `sha256(bytes)[0, 4]` interpreted as a little-endian integer and reduced modulo the range size into the public `32_768..(2**31 - 1)` range, for raw-content dictionaries (which carry a frame `Dict_ID` of 0 and therefore can't use id-based mismatch detection).
Callers can override the default via `id:` (e.g. with a value coordinated out of band).
Trained dictionaries are produced by `Dictionary.train(samples, capacity:)` and are ZDICT-format.
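The default-id rule described above can be sketched as a standalone function. This is a minimal illustration that mirrors the documented constructor logic; the `DictIdSketch` module and its `default_id` method are hypothetical names, not part of RZstd, and the sample inputs are made up.

```ruby
require "digest"

# Hypothetical stand-in module (not part of RZstd) that mirrors the
# default-id rule documented for RZstd::Dictionary.
module DictIdSketch
  ZDICT_MAGIC       = "\x37\xA4\x30\xEC".b.freeze
  USER_DICT_ID_MIN  = 32_768
  USER_DICT_ID_MAX  = (2**31) - 1
  USER_DICT_ID_SIZE = USER_DICT_ID_MAX - USER_DICT_ID_MIN + 1

  module_function

  def default_id(bytes)
    b = bytes.b
    if b.byteslice(0, 4) == ZDICT_MAGIC
      # ZDICT-format: Dict_ID is the little-endian u32 right after the magic.
      b.byteslice(4, 4).unpack1("V")
    else
      # Raw-content: hash the bytes and fold into the user id range.
      raw = Digest::SHA256.digest(b).byteslice(0, 4).unpack1("V")
      USER_DICT_ID_MIN + (raw % USER_DICT_ID_SIZE)
    end
  end
end

# A fake ZDICT-style header: magic, then id 123_456 as little-endian u32.
zdict_like = DictIdSketch::ZDICT_MAGIC + [123_456].pack("V") + "payload".b
zdict_id   = DictIdSketch.default_id(zdict_like)   # => 123456

# Raw content gets a deterministic id inside the user range.
raw_id = DictIdSketch.default_id("plain shared prefix")
```

Because the raw-content id is derived from a hash of the bytes, two processes that load the same raw dictionary arrive at the same id without coordinating.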
Constant Summary
- ZDICT_MAGIC =
"\x37\xA4\x30\xEC".b.freeze
- USER_DICT_ID_MIN =
32_768
- USER_DICT_ID_MAX =
(2**31) - 1
- USER_DICT_ID_SIZE =
USER_DICT_ID_MAX - USER_DICT_ID_MIN + 1
Instance Attribute Summary
-
#bytes ⇒ Object
readonly
Returns the value of attribute bytes.
-
#id ⇒ Object
readonly
Returns the value of attribute id.
Class Method Summary
-
.train(samples, capacity: 64 * 1024) ⇒ Dictionary
Trains a dictionary from a corpus of sample frames.
Instance Method Summary
-
#initialize(bytes:, id: nil) ⇒ Dictionary
constructor
A new instance of Dictionary.
- #size ⇒ Object
Constructor Details
#initialize(bytes:, id: nil) ⇒ Dictionary
Returns a new instance of Dictionary.
    # File 'lib/rzstd/dictionary.rb', line 32
    def initialize(bytes:, id: nil)
      b = bytes.b
      id ||= if b.byteslice(0, 4) == ZDICT_MAGIC
        b.byteslice(4, 4).unpack1("V")
      else
        raw = Digest::SHA256.digest(b).byteslice(0, 4).unpack1("V")
        USER_DICT_ID_MIN + (raw % USER_DICT_ID_SIZE)
      end
      super(bytes: b.freeze, id: id)
    end
Instance Attribute Details
#bytes ⇒ Object (readonly)
Returns the value of attribute bytes.
    # File 'lib/rzstd/dictionary.rb', line 23
    def bytes
      @bytes
    end
#id ⇒ Object (readonly)
Returns the value of attribute id.
    # File 'lib/rzstd/dictionary.rb', line 23
    def id
      @id
    end
Class Method Details
.train(samples, capacity: 64 * 1024) ⇒ Dictionary
Trains a dictionary from a corpus of sample frames. Wraps `ZDICT_trainFromBuffer`. Returns a fresh Dictionary value (ZDICT-format, with its own dict_id in the header).
ZDICT recommends roughly 100 KiB total samples and at least 10 samples; under-provisioned inputs raise.
    # File 'lib/rzstd/dictionary.rb', line 59
    def self.train(samples, capacity: 64 * 1024)
      sizes = samples.map(&:bytesize)
      buffer = String.new(capacity: sizes.sum, encoding: Encoding::BINARY)
      samples.each { |s| buffer << s.b }
      bytes = RZstd._native_train(buffer, sizes, Integer(capacity))
      new(bytes: bytes)
    end
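`ZDICT_trainFromBuffer` takes its samples as one flat buffer plus an array of per-sample byte sizes, which is why `.train` concatenates the samples before calling into native code. The sketch below shows that packing step on its own, without the native call. `pack_samples` and `unpack_samples` are hypothetical helper names used only for illustration, and the sample strings are made up.

```ruby
# Hypothetical helper: flatten samples into the (buffer, sizes) pair,
# the same way the documented .train method does before the native call.
def pack_samples(samples)
  sizes  = samples.map(&:bytesize)
  buffer = String.new(capacity: sizes.sum, encoding: Encoding::BINARY)
  samples.each { |s| buffer << s.b }
  [buffer, sizes]
end

# Hypothetical inverse, to show that the sizes array alone is enough
# to recover each sample's boundaries inside the flat buffer.
def unpack_samples(buffer, sizes)
  offset = 0
  sizes.map do |n|
    chunk = buffer.byteslice(offset, n)
    offset += n
    chunk
  end
end

samples = ["alpha", "béta", "gamma!"]
buffer, sizes = pack_samples(samples)

sizes            # => [5, 5, 6]  (byte sizes, so "béta" counts as 5)
buffer.bytesize  # => 16
round_trip = unpack_samples(buffer, sizes)
```

Sizes are measured in bytes rather than characters, so multibyte samples keep correct boundaries once everything is treated as binary.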
Instance Method Details
#size ⇒ Object
    # File 'lib/rzstd/dictionary.rb', line 44
    def size
      bytes.bytesize
    end