Class: Ucode::IndexBuilder

Inherits:
Object
Defined in:
lib/ucode/index_builder.rb

Overview

Streaming accumulator that turns a sequence of CodePoint records into per-property Index instances whose entries are sorted and coalesced.

Lifecycle:

builder = IndexBuilder.new
Coordinator.new.each_codepoint(...) { |cp| builder.add(cp) }
builder.blocks_index   # => Index
builder.scripts_index  # => Index

The Coordinator yields cps in ascending cp order, so each per-name cp array is already sorted as it is built. The final pass coalesces runs of consecutive cps (each one greater than the previous by exactly 1) into RangeEntry runs.
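The coalescing step described above can be sketched in a few lines of Ruby. This is a minimal illustration, not the library's code: `coalesce_runs` is a hypothetical helper, and it returns plain `Range` objects rather than RangeEntry instances. It relies on the input already being sorted, which is exactly what the Coordinator's ascending order guarantees.

```ruby
# Hypothetical helper (not part of Ucode): collapse an ascending array of
# integer code points into inclusive ranges, starting a new run wherever
# two neighbours differ by more than 1.
def coalesce_runs(sorted_cps)
  sorted_cps
    .slice_when { |prev, curr| curr != prev + 1 } # split at each gap
    .map { |run| run.first..run.last }            # each run becomes an inclusive Range
end

coalesce_runs([0x41, 0x42, 0x43, 0x61, 0x62]) # => [65..67, 97..98]
```

Because the arrays arrive sorted, a single linear pass is enough; no sort is needed before coalescing.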

**Coalescing caveat**: ranges are derived from ASSIGNED cps only. If a block has unassigned cps in the middle, its range fragments around them. For lookup_block(cp) on an assigned cp, the answer is correct; for an unassigned cp, the lookup returns nil. This is a deliberate trade-off for streaming memory bounds. The canonical block ranges live in `Coordinator#indices.blocks`, not in the streamed cps.
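The caveat can be demonstrated with a toy block that has a single hole. Everything here is hypothetical: `Entry` stands in for RangeEntry (whose real fields are not shown on this page), the block name "Demo" is invented, and `lookup` is an assumed binary-search lookup, not the actual lookup_block implementation.

```ruby
# Hypothetical stand-in for RangeEntry; the real class's fields are not shown here.
Entry = Struct.new(:range, :name)

# Assigned cps for an imaginary block "Demo" with an unassigned hole at 0x103.
assigned = [0x100, 0x101, 0x102, 0x104, 0x105]
entries = assigned
  .slice_when { |a, b| b != a + 1 }
  .map { |run| Entry.new(run.first..run.last, "Demo") }
# entries now holds two runs: 0x100..0x102 and 0x104..0x105

# Find-any binary search over entries sorted by range start. The block returns
# 1 when the target lies to the right of e, -1 when it lies to the left,
# and 0 when cp falls inside e's range.
def lookup(entries, cp)
  entries.bsearch do |e|
    if cp < e.range.first then -1
    elsif cp > e.range.last then 1
    else 0
    end
  end&.name
end

lookup(entries, 0x101) # => "Demo"
lookup(entries, 0x103) # => nil (the unassigned hole)
```

The fragment boundaries fall exactly at the hole, so assigned cps still resolve while the hole itself misses every run.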

Instance Method Summary

Constructor Details

#initialize ⇒ IndexBuilder

Returns a new instance of IndexBuilder.



# File 'lib/ucode/index_builder.rb', line 29

def initialize
  @cps_by_block = Hash.new { |h, k| h[k] = [] }
  @cps_by_script = Hash.new { |h, k| h[k] = [] }
end
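Both accumulators use Ruby's `Hash.new` with a default block, which stores a fresh array under a missing key on first access. The snippet below shows that standard Ruby behavior in isolation and contrasts it with `Hash.new([])`, which would silently share one array across all keys. The string key "Basic Latin" is only illustrative; the actual type of `block_id` is not shown on this page.

```ruby
# Same idiom as the constructor: the default block assigns a new array to
# the missing key, so `<<` mutates a value that is kept in the hash.
cps_by_block = Hash.new { |h, k| h[k] = [] }
cps_by_block["Basic Latin"] << 0x41
cps_by_block["Basic Latin"] << 0x42
# cps_by_block now maps "Basic Latin" to [65, 66]

# Contrast: Hash.new([]) returns ONE shared default array and never stores the key.
shared = Hash.new([])
shared["Basic Latin"] << 0x41
shared.empty?      # => true (no key was stored)
shared["Anything"] # => [65] (the shared default array was mutated)
```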

Instance Method Details

#add(cp) ⇒ void

This method returns an undefined value.

Fold one CodePoint into the per-property accumulators. Each property is skipped when the cp has no value for it (no block_id / no script_code), e.g. an unassigned cp surfaced through UnicodeData, or a cp outside any fixture range.

Parameters:

cp (CodePoint): the code point record to fold in.

# File 'lib/ucode/index_builder.rb', line 39

def add(cp)
  push_named(@cps_by_block, cp.block_id, cp.cp)
  push_named(@cps_by_script, cp.script_code, cp.cp)
end
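The private helper `push_named` is called here but its body is not shown on this page. The sketch below is a hypothetical reconstruction consistent with the documented no-op behavior; the `CodePoint` Struct is likewise a stand-in with only the three fields `#add` reads, and the second record simply has both properties set to nil.

```ruby
# Hypothetical reconstruction of the private helper; the real body is not shown.
# Skip when the property value is nil, otherwise append cp under that name.
def push_named(bucket, name, cp)
  return if name.nil?
  bucket[name] << cp
end

# Stand-in for the real CodePoint record (only the fields #add reads).
CodePoint = Struct.new(:cp, :block_id, :script_code)

by_block  = Hash.new { |h, k| h[k] = [] }
by_script = Hash.new { |h, k| h[k] = [] }

[CodePoint.new(0x41, "Basic Latin", "Latn"),
 CodePoint.new(0xE0080, nil, nil)].each do |cp| # second record: no block_id, no script_code
  push_named(by_block, cp.block_id, cp.cp)
  push_named(by_script, cp.script_code, cp.cp)
end
# by_block holds only "Basic Latin" => [65]; the nil-valued record left no trace.
```

Because the two `push_named` calls are independent, a record with a block_id but no script_code would still land in the block accumulator.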

#blocks_index ⇒ Index

Returns:

(Index): the block index, built by coalescing each block's accumulated cps into RangeEntry runs.

# File 'lib/ucode/index_builder.rb', line 45

def blocks_index
  Index.new(to_entries(@cps_by_block))
end
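`to_entries` is private and not shown on this page. One plausible shape, sketched under stated assumptions, is to coalesce each name's already-sorted cps into runs and then order all runs by their first cp so the resulting Index can be binary-searched. The `RangeEntry` Struct and its `first_cp` / `last_cp` / `name` fields are hypothetical stand-ins for the real class.

```ruby
# Hypothetical stand-in; the real RangeEntry's fields are not shown on this page.
RangeEntry = Struct.new(:first_cp, :last_cp, :name)

# One plausible to_entries: coalesce each name's sorted cps into runs,
# then sort all runs by starting cp so lookups can binary-search them.
def to_entries(cps_by_name)
  runs = cps_by_name.flat_map do |name, cps|
    cps.slice_when { |a, b| b != a + 1 }
       .map { |run| RangeEntry.new(run.first, run.last, name) }
  end
  runs.sort_by(&:first_cp)
end

entries = to_entries({ "Latin-1 Supplement" => [0xC0, 0xC1],
                       "Basic Latin" => [0x41, 0x42, 0x43] })
entries.map { |e| [e.first_cp, e.last_cp, e.name] }
# => [[65, 67, "Basic Latin"], [192, 193, "Latin-1 Supplement"]]
```

The final sort matters because hash insertion order reflects when a name was first seen, not where its runs start.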

#scripts_index ⇒ Index

Returns:

(Index): the script index, built by coalescing each script's accumulated cps into RangeEntry runs.

# File 'lib/ucode/index_builder.rb', line 50

def scripts_index
  Index.new(to_entries(@cps_by_script))
end