Class: Ucode::Coordinator
- Inherits: Object
- Defined in: lib/ucode/coordinator.rb, lib/ucode/coordinator/indices.rb
Overview
Orchestrates the UCD + Unihan parsers and produces per-codepoint CodePoint records for a downstream sink (a writer, an aggregator, a database builder).
**Streaming architecture**:
1. Indices pass — load every range/point file into memory, keyed
by codepoint (hash) or sorted by `range_first` (bsearch).
Peak memory is ~10 MB of indices, NOT 160k CodePoints held at once.
2. Stream pass — `UnicodeData.each_record` drives the main loop.
For each yielded CodePoint, the Coordinator merges in data from
the indices, then yields to the sink. CodePoints are GC'd
after the sink processes them.
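The two passes can be sketched in plain Ruby. Everything in this sketch is hypothetical: `Record`, `RangeEntry`, `RangeIndex`, `each_enriched`, the `:script` / `:definition` fields and the sample rows stand in for the real Ucode parsers and CodePoint attributes, which this page does not document. It shows the shape of the design: a point index is a Hash keyed by codepoint, a range index is an Array sorted by `range_first` and searched with `bsearch_index`, and each streamed record is enriched, handed to the sink, and not retained.

```ruby
# Hypothetical, trimmed-down record; the real CodePoint has many more fields.
Record = Struct.new(:cp, :name, :script, :definition)

# Hypothetical range entry, e.g. one "first..last ; value" line of a range file.
RangeEntry = Struct.new(:range_first, :range_last, :value)

# Range index: entries kept sorted by range_first so lookups are O(log n).
class RangeIndex
  def initialize(entries)
    @entries = entries.sort_by(&:range_first)
  end

  # Value of the range containing cp, or nil when cp falls in a gap.
  def lookup(cp)
    # Index of the first entry starting after cp (size if none does).
    after = @entries.bsearch_index { |e| e.range_first > cp } || @entries.size
    return nil if after.zero?

    entry = @entries[after - 1] # last entry starting at or before cp
    cp <= entry.range_last ? entry.value : nil
  end
end

# Pass 2: one enriched record per streamed row; nothing is retained here.
def each_enriched(rows, scripts, definitions)
  rows.each do |cp, name|
    yield Record.new(cp, name, scripts.lookup(cp), definitions[cp])
  end
end

# Pass 1: build the (small) indices once, from simplified sample data.
scripts = RangeIndex.new([
  RangeEntry.new(0x4E00, 0x9FFF, "Han"),
  RangeEntry.new(0x0041, 0x005A, "Latin"),
  RangeEntry.new(0x0391, 0x03A9, "Greek")
])
definitions = { 0x4E00 => "one" } # point index keyed by codepoint

# Stand-in for UnicodeData.each_record: [codepoint, name] rows.
rows = [[0x0041, "LATIN CAPITAL LETTER A"],
        [0x0391, "GREEK CAPITAL LETTER ALPHA"],
        [0x4E00, "CJK UNIFIED IDEOGRAPH-4E00"]]

seen = []
each_enriched(rows, scripts, definitions) do |rec|
  seen << [rec.cp, rec.script, rec.definition] # the "sink"
end
```

Only `scripts` and `definitions` live for the whole run; each `Record` becomes garbage as soon as the sink block returns.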
Every data file is OPTIONAL: if a file is missing (partial fetch, incremental run), the corresponding indices stay empty and the matching CodePoint fields keep their defaults. This makes the Coordinator resilient to partial fixtures and lets users run against a subset of the data files.
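A minimal sketch of the "missing file means empty index" rule, assuming a simple `CODEPOINT ; VALUE` line format. `load_point_index` and the file names are hypothetical and not part of Ucode's API; the point is that the loader returns `{}` instead of raising, so every later lookup misses and the field keeps its default.

```ruby
require "tmpdir"

# Hypothetical loader for a "CODEPOINT ; VALUE" data file. A missing file
# yields an empty index instead of raising, so later lookups simply miss.
def load_point_index(path)
  return {} unless File.exist?(path)

  File.foreach(path).with_object({}) do |line, index|
    line = line.sub(/#.*/, "").strip # drop trailing comments and whitespace
    next if line.empty?

    cp, value = line.split(";", 2).map(&:strip)
    index[cp.to_i(16)] = value
  end
end

found = missing = nil
Dir.mktmpdir do |dir|
  present = File.join(dir, "Sample.txt")
  File.write(present, "# sample data\n0041 ; Lu\n0061 ; Ll\n")
  found   = load_point_index(present)
  missing = load_point_index(File.join(dir, "NotFetched.txt"))
end

# Enriching from the empty index leaves the field at its default (nil here).
default_for_a = missing.fetch(0x41, nil)
```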
Defined Under Namespace
Classes: Indices
Instance Attribute Summary
- #config ⇒ Object (readonly)
  Returns the value of attribute config.
Instance Method Summary
- #build(ucd_dir:, unihan_dir:, &block) ⇒ Object
  Stream-driven build.
- #each_codepoint(ucd_dir:, unihan_dir:) ⇒ Object
  Iterates one enriched CodePoint per assigned codepoint.
- #each_codepoint_with_indices(ucd_dir:, unihan_dir:) ⇒ Object
  Like #each_codepoint but yields `(indices, cp)` so callers that need the indices for a post-stream flush (e.g. ParseCommand) can reuse them instead of re-building.
- #indices_for(ucd_dir:, unihan_dir:) ⇒ Object
  Build (and return) the Coordinator::Indices for the given UCD + Unihan dirs.
- #initialize(config = Ucode.configuration) ⇒ Coordinator (constructor)
  A new instance of Coordinator.
Constructor Details
#initialize(config = Ucode.configuration) ⇒ Coordinator
Returns a new instance of Coordinator.
```ruby
# File 'lib/ucode/coordinator.rb', line 36
def initialize(config = Ucode.configuration)
  @config = config
end
```
Instance Attribute Details
#config ⇒ Object (readonly)
Returns the value of attribute config.
```ruby
# File 'lib/ucode/coordinator.rb', line 34
def config
  @config
end
```
Instance Method Details
#build(ucd_dir:, unihan_dir:, &block) ⇒ Object
Stream-driven build. Calls `block` once per assigned codepoint.
```ruby
# File 'lib/ucode/coordinator.rb', line 41
def build(ucd_dir:, unihan_dir:, &block)
  each_codepoint(ucd_dir: ucd_dir, unihan_dir: unihan_dir, &block)
end
```
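Since #build only forwards its block to #each_codepoint, any sink can plug in as a block, including a bound method via `&obj.method(:name)`. The sketch below assumes a hypothetical `ToyCoordinator` that mirrors the delegation but yields canned `[codepoint, general_category]` pairs instead of reading UCD files, and a hypothetical `CategoryCounter` aggregator as the sink.

```ruby
# Hypothetical stand-in mirroring Coordinator#build's delegation; the real
# class reads UCD/Unihan files, this one yields canned [cp, category] pairs.
class ToyCoordinator
  RECORDS = [[0x41, "Lu"], [0x61, "Ll"], [0x31, "Nd"], [0x42, "Lu"]].freeze

  def build(ucd_dir:, unihan_dir:, &block)
    each_codepoint(ucd_dir: ucd_dir, unihan_dir: unihan_dir, &block)
  end

  def each_codepoint(ucd_dir:, unihan_dir:)
    return enum_for(:each_codepoint, ucd_dir: ucd_dir, unihan_dir: unihan_dir) unless block_given?

    RECORDS.each { |rec| yield rec }
    nil
  end
end

# Hypothetical aggregating sink: counts records per general category.
class CategoryCounter
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def write(record)
    _cp, category = record
    @counts[category] += 1
  end
end

sink = CategoryCounter.new
result = ToyCoordinator.new.build(ucd_dir: "ucd", unihan_dir: "unihan", &sink.method(:write))
```

The sink holds only its running totals; it never needs the full list of records.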
#each_codepoint(ucd_dir:, unihan_dir:) ⇒ Object
Iterates one enriched CodePoint per assigned codepoint. Returns an Enumerator when called without a block; the indices are not built until that Enumerator is iterated.
```ruby
# File 'lib/ucode/coordinator.rb', line 47
def each_codepoint(ucd_dir:, unihan_dir:)
  return enum_for(:each_codepoint, ucd_dir: ucd_dir, unihan_dir: unihan_dir) unless block_given?

  indices = build_indices(ucd_dir, unihan_dir)
  each_with_indices(ucd_dir: ucd_dir, unihan_dir: unihan_dir, indices: indices) do |cp|
    yield cp
  end
  nil
end
```
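The `enum_for` guard means the blockless form does no work up front, and `Enumerable` methods such as `first(n)` stop the stream early. This is a sketch with a hypothetical `ToyStream` that copies the method's control flow and replaces `build_indices` and `UnicodeData.each_record` with counters, so the deferral is observable.

```ruby
# Toy class reproducing the #each_codepoint control flow with counters, so
# the deferral provided by enum_for is observable. ToyStream is hypothetical.
class ToyStream
  attr_reader :index_builds, :records_read

  def initialize
    @index_builds = 0
    @records_read = 0
  end

  def each_codepoint(ucd_dir:, unihan_dir:)
    return enum_for(:each_codepoint, ucd_dir: ucd_dir, unihan_dir: unihan_dir) unless block_given?

    @index_builds += 1            # stands in for build_indices
    (0x41..0x5A).each do |cp|     # stands in for UnicodeData.each_record
      @records_read += 1
      yield cp
    end
    nil
  end
end

stream = ToyStream.new
enum = stream.each_codepoint(ucd_dir: "ucd", unihan_dir: "unihan")
builds_before = stream.index_builds # nothing has run yet
first_two = enum.first(2)           # runs the method, stops after two yields
```

After `first(2)` the indices have been built exactly once and only two records were read, which is what makes the blockless form convenient for sampling or tests.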
#each_codepoint_with_indices(ucd_dir:, unihan_dir:) ⇒ Object
Like #each_codepoint but yields `(indices, cp)` so callers that need the indices for a post-stream flush (e.g. ParseCommand) can reuse them instead of re-building. Returns an Enumerator when no block is given.
```ruby
# File 'lib/ucode/coordinator.rb', line 62
def each_codepoint_with_indices(ucd_dir:, unihan_dir:)
  unless block_given?
    return enum_for(:each_codepoint_with_indices,
                    ucd_dir: ucd_dir, unihan_dir: unihan_dir)
  end

  indices = build_indices(ucd_dir, unihan_dir)
  each_with_indices(ucd_dir: ucd_dir, unihan_dir: unihan_dir, indices: indices) do |cp|
    yield indices, cp
  end
  nil
end
```
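A sketch of the post-stream flush this method enables. `ToyIndexedStream` is hypothetical and its "indices" hold only a `:blocks` table; the caller keeps a reference to the yielded indices and, after the loop, walks the same object to report every block, including blocks that received no records, without a second index pass.

```ruby
# Toy class mirroring #each_codepoint_with_indices. Its "indices" hold only a
# hypothetical :blocks table; the real Coordinator::Indices is much richer.
class ToyIndexedStream
  attr_reader :index_builds

  def initialize
    @index_builds = 0
  end

  def each_codepoint_with_indices(ucd_dir:, unihan_dir:)
    unless block_given?
      return enum_for(:each_codepoint_with_indices, ucd_dir: ucd_dir, unihan_dir: unihan_dir)
    end

    indices = build_indices
    [0x41, 0x3B1, 0x42].each { |cp| yield indices, cp }
    nil
  end

  private

  def build_indices
    @index_builds += 1
    { blocks: { (0x00..0x7F) => "Basic Latin", (0x370..0x3FF) => "Greek and Coptic" } }
  end
end

coordinator = ToyIndexedStream.new
kept_indices = nil
per_block = Hash.new(0)

coordinator.each_codepoint_with_indices(ucd_dir: "ucd", unihan_dir: "unihan") do |indices, cp|
  kept_indices ||= indices # same object on every yield; keep it for the flush
  name = indices[:blocks].find { |range, _| range.cover?(cp) }&.last
  per_block[name] += 1
end

# Post-stream flush: walk the SAME indices to report every block.
summary = kept_indices[:blocks].values.map { |name| [name, per_block[name]] }
```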
#indices_for(ucd_dir:, unihan_dir:) ⇒ Object
Build (and return) the Coordinator::Indices for the given UCD + Unihan dirs. Useful when the caller needs the indices separately from the streaming pass (e.g. AggregateWriter#flush).
```ruby
# File 'lib/ucode/coordinator.rb', line 78
def indices_for(ucd_dir:, unihan_dir:)
  build_indices(ucd_dir, unihan_dir)
end
```
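Per the source listings on this page, #indices_for and #each_codepoint each call `build_indices`, so a caller that uses both pays for two index passes, while #each_codepoint_with_indices pays for one. The sketch below makes that cost visible with a hypothetical `ToyBuildCounter` whose `build_indices` only increments a counter; the method bodies otherwise follow the shapes shown above.

```ruby
# Hypothetical stand-in that counts index passes; not the real Coordinator.
class ToyBuildCounter
  attr_reader :index_builds

  def initialize
    @index_builds = 0
  end

  def indices_for(ucd_dir:, unihan_dir:)
    build_indices(ucd_dir, unihan_dir)
  end

  def each_codepoint(ucd_dir:, unihan_dir:)
    return enum_for(:each_codepoint, ucd_dir: ucd_dir, unihan_dir: unihan_dir) unless block_given?

    indices = build_indices(ucd_dir, unihan_dir)
    indices[:codepoints].each { |cp| yield cp }
    nil
  end

  def each_codepoint_with_indices(ucd_dir:, unihan_dir:)
    return enum_for(:each_codepoint_with_indices, ucd_dir: ucd_dir, unihan_dir: unihan_dir) unless block_given?

    indices = build_indices(ucd_dir, unihan_dir)
    indices[:codepoints].each { |cp| yield indices, cp }
    nil
  end

  private

  # Stand-in for the expensive indices pass; only counts how often it runs.
  def build_indices(_ucd_dir, _unihan_dir)
    @index_builds += 1
    { codepoints: [0x41, 0x42] }
  end
end

dirs = { ucd_dir: "ucd", unihan_dir: "unihan" }

separate = ToyBuildCounter.new
idx = separate.indices_for(**dirs)
separate.each_codepoint(**dirs) { |_cp| } # second, independent index pass

combined = ToyBuildCounter.new
last_idx = nil
combined.each_codepoint_with_indices(**dirs) { |indices, _cp| last_idx = indices }
```

#indices_for remains the natural choice when a step needs the indices without streaming at all; when a step streams and then flushes, #each_codepoint_with_indices avoids the duplicate pass.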