Class: Ucode::Repo::CodepointWriter
- Inherits:
- Object
- Object
- Ucode::Repo::CodepointWriter
- Includes:
- AtomicWrites
- Defined in:
- lib/ucode/repo/codepoint_writer.rb
Overview
Writes one `index.json` per codepoint under `output/blocks/<id>/<cp>/`.
Streaming, threaded, idempotent, and atomic:
- **Streaming**: callers pass an Enumerator; the writer pulls one
codepoint at a time and never holds the full 160k set in memory.
- **Threaded**: a fixed-size worker pool drains a shared queue.
Each codepoint maps to a unique path → no per-file contention.
- **Idempotent**: existing files are byte-compared to the new
payload before writing; identical content is a no-op. Safe to
re-run on the full dataset.
- **Atomic**: writes go to `<path>.tmp`, then rename. A crash
mid-write leaves either the old file or no file, never a
truncated one.
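The idempotent and atomic behaviour described above can be sketched as follows. This is a minimal illustration, not the actual `AtomicWrites` module: `AtomicWritesSketch`, the `example_block` directory name, and the JSON payload are hypothetical, and the boolean return value of `write_atomic` is an assumption inferred from how `#write` uses it.

```ruby
require "pathname"
require "tmpdir"

# Hypothetical stand-in for the AtomicWrites helpers; the real module may differ.
module AtomicWritesSketch
  module_function

  # True when the file already exists and its bytes equal the payload.
  def same_content?(path, payload)
    path.file? && path.binread == payload.b
  end

  # Returns true if the file was (re)written, false if it was skipped
  # because identical content is already on disk (assumed contract).
  def write_atomic(path, payload)
    return false if same_content?(path, payload)

    path.dirname.mkpath
    tmp = Pathname.new("#{path}.tmp")
    tmp.binwrite(payload)
    # Rename replaces the target in one step on POSIX systems when both
    # paths live on the same filesystem.
    File.rename(tmp.to_s, path.to_s)
    true
  end
end

outcomes = Dir.mktmpdir do |dir|
  # Directory names here are made up for the example.
  path = Pathname.new(dir).join("blocks", "example_block", "0041", "index.json")
  payload = %({"cp": "0041"}\n)
  first = AtomicWritesSketch.write_atomic(path, payload)  # file is created
  second = AtomicWritesSketch.write_atomic(path, payload) # identical, skipped
  [first, second, path.file?, File.exist?("#{path}.tmp")]
end
p outcomes # [true, false, true, false]
```

Comparing bytes before writing is what makes a full re-run cheap: unchanged files are never rewritten, so the second call leaves the existing file alone.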
Instance Method Summary
- #initialize(output_root, parallel_workers: 8) ⇒ CodepointWriter (constructor)
A new instance of CodepointWriter.
- #write(codepoint) ⇒ Pathname?
Write one codepoint synchronously.
- #write_each(enum) ⇒ Integer
Drain an Enumerator through the worker pool.
Methods included from AtomicWrites
#same_content?, #to_pretty_json, #write_atomic
Constructor Details
#initialize(output_root, parallel_workers: 8) ⇒ CodepointWriter
Returns a new instance of CodepointWriter.
# File 'lib/ucode/repo/codepoint_writer.rb', line 31
def initialize(output_root, parallel_workers: 8)
  @output_root = Pathname.new(output_root)
  @parallel_workers = parallel_workers
end
Instance Method Details
#write(codepoint) ⇒ Pathname?
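Write one codepoint synchronously. Returns the written path, or nil when the codepoint has no `block_id` or when `write_atomic` returns a falsy value.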
# File 'lib/ucode/repo/codepoint_writer.rb', line 40
def write(codepoint)
  return nil if codepoint.block_id.nil?

  path = Paths.codepoint_json_path(@output_root, codepoint.block_id, codepoint.id)
  payload = serialize(codepoint)
  return nil unless write_atomic(path, payload)

  path
end
#write_each(enum) ⇒ Integer
Drain an Enumerator through the worker pool. Returns the total count of codepoints seen (whether or not each one was written).
# File 'lib/ucode/repo/codepoint_writer.rb', line 54
def write_each(enum)
  return drain_inline(enum) if @parallel_workers <= 1

  drain_threaded(enum)
end
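The source of `drain_threaded` is not shown on this page. As a rough sketch of what a fixed-size worker pool draining a shared queue might look like, the example below uses Ruby's standard `SizedQueue`. The method name `drain_threaded_sketch`, the `workers:` keyword, and the block-based handler are all hypothetical, and the real implementation may be structured differently.

```ruby
# Hypothetical sketch: pull items from an Enumerator one at a time and hand
# them to a fixed number of worker threads through a bounded queue.
def drain_threaded_sketch(enum, workers: 4, &handler)
  # A bounded queue keeps the producer from running far ahead of the
  # workers, so only a handful of items are in memory at once.
  queue = SizedQueue.new(workers * 2)

  pool = Array.new(workers) do
    Thread.new do
      # pop returns nil once the queue is closed and empty, ending the loop.
      # (This sketch therefore assumes the enumerator never yields nil.)
      while (item = queue.pop)
        handler.call(item)
      end
    end
  end

  seen = 0
  enum.each do |item|
    queue.push(item) # blocks while the queue is full
    seen += 1
  end

  queue.close        # signal the workers that no more items are coming
  pool.each(&:join)  # wait for in-flight items to finish
  seen               # count of items seen, whether or not each was written
end

processed = Queue.new
count = drain_threaded_sketch((1..100).each, workers: 4) { |n| processed << n }
puts count          # 100
puts processed.size # 100
```

The sketch omits error handling: if every worker raised, the producer could block on a full queue, so a production version would need to rescue inside the worker loop or otherwise surface failures.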