Class: Ucode::Repo::CodepointWriter

Inherits:
Object
Includes:
AtomicWrites
Defined in:
lib/ucode/repo/codepoint_writer.rb

Overview

Writes one index.json per codepoint under output/blocks/<id>/<cp>/.

Streaming + threaded + idempotent + atomic:

- **Streaming**: callers pass an Enumerator; the writer pulls one
codepoint at a time, never the full 160k set in memory.
- **Threaded**: a fixed-size worker pool drains a shared queue.
Each codepoint maps to a unique path → no per-file contention.
- **Idempotent**: existing files are byte-compared to the new
payload before writing; identical content is a no-op. Safe to
re-run on the full dataset.
- **Atomic**: writes go to `<path>.tmp`, then rename. A crash
mid-write leaves either the old file or no file, never a
truncated one.
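
The idempotent and atomic bullets above can be sketched as follows. This is a minimal illustration, not the AtomicWrites implementation: `write_if_changed` is a hypothetical helper name, and its true/false return convention is an assumption.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper, not the library's AtomicWrites code.
# Returns true when the file was (re)written, false when the write was a no-op.
def write_if_changed(path, payload)
  # Idempotent: skip the write when the bytes on disk already match.
  return false if File.exist?(path) && File.binread(path) == payload.b

  FileUtils.mkdir_p(File.dirname(path))
  tmp = "#{path}.tmp"
  File.binwrite(tmp, payload)
  # Atomic: rename swaps the finished temp file into place in one step,
  # so readers see either the old content or the new content.
  File.rename(tmp, path)
  true
end
```

Calling the helper twice with the same payload writes once and then does nothing, which is the property that makes a re-run over the full dataset cheap.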

When a Glyphs::Resolver is supplied via resolver:, each write also:

- resolves the codepoint's glyph;
- writes glyph.svg alongside index.json (same atomic + idempotent semantics);
- records the resolver tier + provenance on the codepoint's glyph attribute so it lands in the serialized JSON.

When resolver: is nil (the default), the writer is glyph-agnostic and only writes index.json, which preserves backward compatibility.

Instance Method Summary

Methods included from AtomicWrites

#same_content?, #to_pretty_json, #write_atomic

Constructor Details

#initialize(output_root, parallel_workers: 8, resolver: nil, observer: nil) ⇒ CodepointWriter

Returns a new instance of CodepointWriter.

Parameters:

  • output_root (String, Pathname)
  • parallel_workers (Integer) (defaults to: 8)

    size of the worker pool. Set to 1 (or less) to run synchronously — useful in tests.

  • resolver (Ucode::Glyphs::Resolver, nil) (defaults to: nil)

    when non-nil, each write resolves the codepoint's glyph via this resolver and writes glyph.svg next to index.json. Sources inside the resolver must be safe for concurrent access — the worker pool calls into them from multiple threads.

  • observer (#call, nil) (defaults to: nil)

    when non-nil, invoked as observer.call(codepoint, result) once per write, after the resolve attempt and before the JSON write. result is the Glyphs::Source::Result when a tier produced a glyph, or nil when no resolver is configured, no tier matched, or the codepoint has no block_id (the resolve step is skipped in that case, but the observer is still called). Used by BuildReportAccumulator to tally per-tier stats. The observer must be thread-safe.



# File 'lib/ucode/repo/codepoint_writer.rb', line 51

def initialize(output_root, parallel_workers: 8, resolver: nil,
               observer: nil)
  @output_root = Pathname.new(output_root)
  @parallel_workers = parallel_workers
  @resolver = resolver
  @observer = observer
end
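
Because the observer is called from worker threads, any state it keeps needs a lock. The sketch below shows one way to satisfy the `observer.call(codepoint, result)` contract. `TierTally` and `FakeResult` are hypothetical names, and the `tier` accessor on the result is an assumption; this is not BuildReportAccumulator.

```ruby
# Hypothetical observer: counts results per tier under a Mutex so that
# several worker threads can call it at the same time.
class TierTally
  attr_reader :counts

  def initialize
    @mutex = Mutex.new
    @counts = Hash.new(0)
  end

  # Same shape as observer.call(codepoint, result).
  # `result.tier` is an assumed accessor; nil results are counted as :none.
  def call(_codepoint, result)
    key = result.nil? ? :none : result.tier
    @mutex.synchronize { @counts[key] += 1 }
  end
end

# Stand-in for Glyphs::Source::Result, for illustration only.
FakeResult = Struct.new(:tier)

tally = TierTally.new
threads = Array.new(4) do
  Thread.new do
    100.times { |i| tally.call(nil, i.even? ? FakeResult.new(:font) : nil) }
  end
end
threads.each(&:join)
```

An object like this could then be passed as `observer: tally` when constructing the writer.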

Instance Method Details

#write(codepoint) ⇒ Pathname?

Write one codepoint synchronously.

Parameters:

  • codepoint

    the codepoint to write; the method reads its block_id and id

Returns:

  • (Pathname, nil)

    the path written, or nil if skipped (missing block_id or content-identical to existing file)



# File 'lib/ucode/repo/codepoint_writer.rb', line 63

def write(codepoint)
  result = codepoint.block_id.nil? ? nil : resolve_glyph(codepoint)
  @observer&.call(codepoint, result)
  return nil if codepoint.block_id.nil?

  path = Paths.codepoint_json_path(@output_root, codepoint.block_id, codepoint.id)
  payload = serialize(codepoint)
  return nil unless write_atomic(path, payload)

  path
end

#write_each(enum) ⇒ Integer

Drain an Enumerator through the worker pool. Returns the total count of codepoints seen (whether or not each one was written).

Parameters:

  • enum (Enumerator)

    the source of codepoints, pulled one item at a time

Returns:

  • (Integer)


# File 'lib/ucode/repo/codepoint_writer.rb', line 79

def write_each(enum)
  return drain_inline(enum) if @parallel_workers <= 1

  drain_threaded(enum)
end
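
The bodies of drain_inline and drain_threaded are not shown on this page. As a rough sketch of the threaded path described in the overview (a fixed-size pool draining a shared queue), something like the following would work. `drain_with_pool` is a hypothetical name, and the queue size and shutdown strategy are assumptions.

```ruby
# Hypothetical stand-in for a threaded drain; not the library's code.
# Assumes items are never nil or false, since nil marks the end of the queue.
def drain_with_pool(enum, workers:, &handler)
  # Bounded queue: the producer blocks when workers fall behind, so the
  # full input is never held in memory at once.
  queue = SizedQueue.new(workers * 2)

  pool = Array.new(workers) do
    Thread.new do
      # pop returns nil once the queue is closed and empty.
      while (item = queue.pop)
        handler.call(item)
      end
    end
  end

  count = 0
  enum.each do |item|
    queue << item
    count += 1
  end

  queue.close
  pool.each(&:join)
  count # every item seen, whether or not the handler wrote anything
end

seen = Queue.new
total = drain_with_pool((1..1000).each, workers: 4) { |n| seen << n }
```

The return value counts items pulled from the enumerator, mirroring the documented behaviour of write_each. The sketch does not handle a handler that raises; a production version would need to decide how worker failures propagate.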