Class: Ucode::Parsers::Blocks
Overview
Parses Blocks.txt — one block range per line.
Format (UAX #44):
XXXX..XXXX; Block Name
The id is the block name with runs of whitespace collapsed to a
single underscore. The name is preserved verbatim. Per the
project rules (CLAUDE.md), block names are NOT otherwise slugified.
plane_number is derived from the high bits of range_first.
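The rules above can be sketched for a single data line. This is a minimal illustration, not the library's code: `BlockSketch` and `parse_block_line` are hypothetical names, and the `>> 16` shift is an assumption about what "high bits" means here, based on each Unicode plane spanning 0x10000 code points.

```ruby
# Hypothetical stand-in for Models::Block; the real model may differ.
BlockSketch = Struct.new(:id, :name, :range_first, :range_last, :plane_number)

# Parses one Blocks.txt data line, e.g. "0000..007F; Basic Latin".
# Returns nil for blank lines, comment lines, and lines without a name.
def parse_block_line(line)
  data = line.sub(/#.*/, "").strip # drop comments and surrounding whitespace
  return nil if data.empty?

  range_field, name = data.split(";", 2).map(&:strip)
  return nil if name.nil? || name.empty?

  first_hex, last_hex = range_field.split("..", 2)
  range_first = first_hex.to_i(16)
  range_last = (last_hex || first_hex).to_i(16)

  BlockSketch.new(
    name.gsub(/\s+/, "_"), # id: each whitespace run becomes one underscore
    name,                  # name: kept verbatim
    range_first,
    range_last,
    range_first >> 16      # plane (assumed): bits above the low 16
  )
end

block = parse_block_line("20000..2A6DF; CJK Unified Ideographs Extension B")
puts block.id           # CJK_Unified_Ideographs_Extension_B
puts block.plane_number # 2
```

Hyphens and letter case pass through unchanged, which matches the statement that names are not otherwise slugified.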
Class Method Summary
- .each_record(path) ⇒ Object
  Yields one Block per non-comment line.
- .find_by_id(path, id) ⇒ Models::Block?
  Resolves a block by its identifier (the underscored form of the block name, e.g. "Basic_Latin", "Egyptian_Hieroglyphs_Extended-B").
- .find_by_id!(path, id) ⇒ Models::Block
  Same as Blocks.find_by_id but raises UnknownBlockError on miss.
Methods inherited from Base
each_line, parse_codepoint_or_range, parse_field, parse_hex_cp
Class Method Details
.each_record(path) ⇒ Object
Yields one Block per non-comment line. When called without a block, returns an Enumerator (built with enum_for) that produces records on demand; when called with a block, returns nil.
    # File 'lib/ucode/parsers/blocks.rb', line 22

    def each_record(path)
      return enum_for(:each_record, path) unless block_given?

      each_line(path) do |line|
        fields = line.fields
        next if fields.length < 2

        range_field = fields[0]
        name = fields[1]
        next if name.nil? || name.empty?

        range = parse_codepoint_or_range(range_field)
        yield build_block(range, name)
      end
      nil
    end
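The `enum_for` guard in the listing above is a general Ruby idiom, shown here in isolation. `each_data_line` is a hypothetical reader over an in-memory `StringIO`, not part of Ucode; it only illustrates how one method can serve both the block form and the Enumerator form.

```ruby
require "stringio"

# Hypothetical reader: yields each non-blank, non-comment line of an IO.
def each_data_line(io)
  # Without a block, hand back an Enumerator that calls this method later.
  return enum_for(:each_data_line, io) unless block_given?

  io.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?("#")

    yield line
  end
  nil
end

sample = <<~TXT
  # Blocks-style sample
  0000..007F; Basic Latin
  0080..00FF; Latin-1 Supplement
TXT

# Block form: every data line is yielded, and the call returns nil.
each_data_line(StringIO.new(sample)) { |line| puts line }

# Enumerator form: nothing is read until a value is requested.
enum = each_data_line(StringIO.new(sample))
puts enum.first # 0000..007F; Basic Latin
```

Note that `enum_for` returns a plain `Enumerator`, not an `Enumerator::Lazy`; chaining `.lazy` is needed if `map` or `select` should also be deferred.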
.find_by_id(path, id) ⇒ Models::Block?
Resolves a block by its identifier (the underscored form of
the block name, e.g. "Basic_Latin", "Egyptian_Hieroglyphs_Extended-B").
Streams Blocks.txt once and short-circuits on first match —
callers don't need to walk the whole ~340-block file.
    # File 'lib/ucode/parsers/blocks.rb', line 49

    def find_by_id(path, id)
      return nil if id.nil? || id.empty?

      each_record(path) do |block|
        return block if block.id == id
      end
      nil
    end
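The short-circuit comes from `return` inside the block, which exits the enclosing method and abandons the rest of the iteration. A small sketch makes this observable; `each_id` and `find_id` are hypothetical helpers that mirror the shape of the listing above without touching any file.

```ruby
# Hypothetical record source: logs every id it produces so the
# early exit can be observed.
def each_id(ids, seen)
  return enum_for(:each_id, ids, seen) unless block_given?

  ids.each do |id|
    seen << id
    yield id
  end
  nil
end

# Same shape as find_by_id: guard, stream, return on first match.
def find_id(ids, wanted, seen)
  return nil if wanted.nil? || wanted.empty?

  each_id(ids, seen) do |id|
    return id if id == wanted # exits find_id and ends the iteration
  end
  nil
end

ids = %w[Basic_Latin Latin-1_Supplement Latin_Extended-A Latin_Extended-B]
seen = []
find_id(ids, "Latin-1_Supplement", seen)
puts seen.length # 2: the last two ids were never produced
```

A miss still walks the whole input, so the saving applies only to successful lookups.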
.find_by_id!(path, id) ⇒ Models::Block
Same as find_by_id but raises UnknownBlockError on miss. Use this in callers that can't recover from a missing block (CLI commands, extractors that need a block to proceed).
    # File 'lib/ucode/parsers/blocks.rb', line 66

    def find_by_id!(path, id)
      find_by_id(path, id) or raise Ucode::UnknownBlockError.new(
        "unknown Unicode block: #{id.inspect}",
        context: { block_id: id, blocks_txt: path.to_s },
      )
    end
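For callers, the practical difference between the two finders is how a miss is handled. The sketch below reproduces the pattern with stand-ins: `UnknownBlockSketchError`, `lookup`, and `lookup!` are hypothetical, and the error's `context:` keyword is only inferred from the call site above, not from the real `Ucode::UnknownBlockError` definition.

```ruby
# Hypothetical stand-in for Ucode::UnknownBlockError.
class UnknownBlockSketchError < StandardError
  attr_reader :context

  def initialize(message, context: {})
    super(message)
    @context = context
  end
end

# Nil-returning lookup, analogous to find_by_id.
def lookup(table, id)
  table[id]
end

# Raising lookup, analogous to find_by_id!.
def lookup!(table, id)
  lookup(table, id) or raise UnknownBlockSketchError.new(
    "unknown Unicode block: #{id.inspect}",
    context: { block_id: id },
  )
end

table = { "Basic_Latin" => (0x0000..0x007F) }

begin
  lookup!(table, "Basic Latin") # space instead of underscore, so a miss
rescue UnknownBlockSketchError => e
  warn "#{e.message} (context: #{e.context})"
end
```

Carrying the offending id in a structured `context` lets a CLI print a precise message without parsing the error string.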