Class: Ucode::Repo::CodepointWriter
- Inherits: Object
- Includes: AtomicWrites
- Defined in: lib/ucode/repo/codepoint_writer.rb
Overview
Writes one index.json per codepoint under output/blocks/<id>/<cp>/.
Streaming + threaded + idempotent + atomic:
- **Streaming**: callers pass an Enumerator; the writer pulls one
codepoint at a time, never the full 160k set in memory.
- **Threaded**: a fixed-size worker pool drains a shared queue.
Each codepoint maps to a unique path → no per-file contention.
- **Idempotent**: existing files are byte-compared to the new
payload before writing; identical content is a no-op. Safe to
re-run on the full dataset.
- **Atomic**: writes go to `<path>.tmp`, then rename. A crash
mid-write leaves either the old file or no file, never a
truncated one.
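The idempotent and atomic bullets above can be sketched roughly as follows. This is a minimal illustration, not the library's `AtomicWrites#write_atomic`: the helper name `write_atomic_sketch`, its boolean return value, and the example block and codepoint ids are all assumptions made for the example.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper for illustration; not the library's implementation.
# Returns true when the file was (re)written, false when the bytes on disk
# already matched the payload.
def write_atomic_sketch(path, payload)
  # Idempotent: byte-compare first and skip the write when nothing changed.
  return false if File.exist?(path) && File.binread(path) == payload.b

  FileUtils.mkdir_p(File.dirname(path))
  tmp = "#{path}.tmp"
  File.binwrite(tmp, payload)
  # Atomic: rename replaces the target in one step on POSIX filesystems,
  # so readers see either the old file or the new one.
  File.rename(tmp, path)
  true
end

Dir.mktmpdir do |root|
  # "basic-latin" and "0041" are made-up ids for the example.
  path = File.join(root, "blocks", "basic-latin", "0041", "index.json")
  first  = write_atomic_sketch(path, %({"id":"0041"}\n))
  second = write_atomic_sketch(path, %({"id":"0041"}\n))
  puts "first=#{first} second=#{second}"
end
```

The second call finds identical bytes on disk and returns without touching the file, which is what makes a full re-run cheap.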
When a Glyphs::Resolver is supplied via resolver:, each
write also resolves the codepoint's glyph, writes glyph.svg
alongside index.json (same atomic + idempotent semantics), and
records the resolver tier + provenance on the codepoint's glyph
attribute so it lands in the serialized JSON. When resolver: is
nil (default), the writer is glyph-agnostic and only writes
index.json — preserving backward compatibility.
Instance Method Summary
- #initialize(output_root, parallel_workers: 8, resolver: nil, observer: nil) ⇒ CodepointWriter (constructor)
A new instance of CodepointWriter.
- #write(codepoint) ⇒ Pathname?
Write one codepoint synchronously.
- #write_each(enum) ⇒ Integer
Drain an Enumerator through the worker pool.
Methods included from AtomicWrites
#same_content?, #to_pretty_json, #write_atomic
Constructor Details
#initialize(output_root, parallel_workers: 8, resolver: nil, observer: nil) ⇒ CodepointWriter
Returns a new instance of CodepointWriter.
# File 'lib/ucode/repo/codepoint_writer.rb', line 51
def initialize(output_root, parallel_workers: 8, resolver: nil, observer: nil)
  @output_root = Pathname.new(output_root)
  @parallel_workers = parallel_workers
  @resolver = resolver
  @observer = observer
end
Instance Method Details
#write(codepoint) ⇒ Pathname?
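Write one codepoint synchronously. Returns the path of index.json, or nil when the codepoint has no block_id or when write_atomic reports that nothing was written.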
# File 'lib/ucode/repo/codepoint_writer.rb', line 63
def write(codepoint)
  result = codepoint.block_id.nil? ? nil : resolve_glyph(codepoint)
  @observer&.call(codepoint, result)
  return nil if codepoint.block_id.nil?

  path = Paths.codepoint_json_path(@output_root, codepoint.block_id, codepoint.id)
  payload = serialize(codepoint)
  return nil unless write_atomic(path, payload)

  path
end
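In the listing above the observer is notified before the block_id guard, so it also sees codepoints that are then skipped. The sketch below mirrors only that ordering. `CodepointStub`, `write_sketch`, the lambda observer, and the treatment of the resolver as a plain callable are stand-ins invented for illustration; the real `resolve_glyph` and `serialize` are private and not shown on this page.

```ruby
# Stand-in for illustration only; the real codepoint object is richer.
CodepointStub = Struct.new(:id, :block_id)

# Any object responding to #call(codepoint, result) can serve as an observer.
tally = Hash.new(0)
observer = lambda do |codepoint, result|
  tally[codepoint.block_id.nil? ? :skipped : :seen] += 1
  tally[:with_glyph] += 1 unless result.nil?
end

# Mirrors the ordering in #write: notify first, then apply the guard.
# Assumes (hypothetically) that a nil resolver yields a nil result.
def write_sketch(codepoint, observer:, resolver: nil)
  result = codepoint.block_id.nil? ? nil : resolver&.call(codepoint)
  observer&.call(codepoint, result)
  return nil if codepoint.block_id.nil?

  # Relative layout taken from the Overview: blocks/<id>/<cp>/index.json
  "blocks/#{codepoint.block_id}/#{codepoint.id}/index.json"
end

write_sketch(CodepointStub.new("0041", "basic-latin"), observer: observer)
write_sketch(CodepointStub.new("FFFF", nil), observer: observer)
p tally
```

An observer used for progress reporting would therefore count every codepoint pulled from the enumerator, not only those that end up on disk.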
#write_each(enum) ⇒ Integer
Drain an Enumerator through the worker pool. Returns the total count of codepoints seen (whether or not each one was written).
# File 'lib/ucode/repo/codepoint_writer.rb', line 79
def write_each(enum)
  return drain_inline(enum) if @parallel_workers <= 1

  drain_threaded(enum)
end
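The private `drain_threaded` helper is not shown on this page. One plausible shape for "a fixed-size worker pool drains a shared queue" is sketched below, assuming a bounded `SizedQueue` so the producer pulls from the Enumerator only as fast as workers consume. The name `drain_threaded_sketch` and the queue sizing are assumptions, not the library's code.

```ruby
# Hypothetical sketch of a streaming worker pool; not the library's code.
# Returns the number of items seen, as #write_each is documented to do.
def drain_threaded_sketch(enum, workers: 4, &block)
  # Bounded queue: the producer blocks once workers fall behind, so the
  # full input is never held in memory at once.
  queue = SizedQueue.new(workers * 2)
  seen = 0

  pool = Array.new(workers) do
    Thread.new do
      # Queue#pop returns nil once the queue is closed and empty.
      while (item = queue.pop)
        block.call(item)
      end
    end
  end

  enum.each do |item|
    queue.push(item)
    seen += 1
  end
  queue.close
  pool.each(&:join)
  seen
end

handled = Queue.new
count = drain_threaded_sketch((1..100).each, workers: 4) { |n| handled << n }
puts "seen=#{count} handled=#{handled.size}"
```

A production version would also need to handle a worker raising mid-run, since a dead pool would leave the producer blocked on a full queue; the sketch omits that for brevity.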