Class: Ucode::Parsers::Blocks

Inherits:
Base
  • Object
Defined in:
lib/ucode/parsers/blocks.rb

Overview

Parses Blocks.txt — one block range per line.

Format (UAX #44):

XXXX..XXXX; Block Name

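The id is the block name with each run of whitespace collapsed to a single underscore. The name is preserved verbatim. Per the project rules (CLAUDE.md), block names are NOT otherwise slugified, so case and punctuation such as hyphens survive in the id (for example, "Egyptian_Hieroglyphs_Extended-B").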
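
The format and id rule above can be sketched in a few lines. This is an illustration only, not the parser's actual code: `BLOCK_LINE` and `parse_block_line` are hypothetical names, and the real class delegates to helpers inherited from Base.

```ruby
# Hypothetical helper, not part of Ucode: splits one Blocks.txt data line
# into its range and name, then derives the id as described above.
BLOCK_LINE = /\A\s*(\h+)\.\.(\h+)\s*;\s*(.+?)\s*\z/

def parse_block_line(line)
  data = line.sub(/#.*/, "")           # drop comments
  match = BLOCK_LINE.match(data)
  return nil unless match              # blank or comment-only line

  name = match[3]
  {
    range: (match[1].to_i(16)..match[2].to_i(16)),
    name: name,                        # preserved verbatim
    id: name.gsub(/\s+/, "_"),         # whitespace runs become one underscore
  }
end

basic_latin = parse_block_line("0000..007F; Basic Latin")
latin1 = parse_block_line("0080..00FF; Latin-1 Supplement")
```

Here `basic_latin[:id]` is "Basic_Latin" and `latin1[:id]` is "Latin-1_Supplement": the hyphen is kept because only whitespace is rewritten.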

plane_number is derived from the high bits of range_first.
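
As a rough sketch of that derivation, assuming "high bits" means everything above the low 16 bits (each Unicode plane spans 65,536 code points), the plane falls out of a right shift. The free-standing `plane_number` method below is hypothetical; the source does not show how the library computes it.

```ruby
# Assumption: the plane is the code point with its low 16 bits shifted off.
def plane_number(range_first)
  range_first >> 16
end

bmp_plane = plane_number(0x0041)    # a code point in the Basic Multilingual Plane
smp_plane = plane_number(0x1F600)   # first code point of the Emoticons block
last_plane = plane_number(0x10FFFF) # highest Unicode code point
```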

Class Method Summary

Methods inherited from Base

each_line, parse_codepoint_or_range, parse_field, parse_hex_cp

Class Method Details

.each_record(path) ⇒ Object

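Yields one Block per non-comment line, skipping any line with fewer than two fields or an empty name. Returns nil when called with a block. Called without a block, it returns an Enumerator (via enum_for), so lines are read and parsed only as records are consumed.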



# File 'lib/ucode/parsers/blocks.rb', line 22

def each_record(path)
  return enum_for(:each_record, path) unless block_given?

  each_line(path) do |line|
    fields = line.fields
    next if fields.length < 2

    range_field = fields[0]
    name = fields[1]
    next if name.nil? || name.empty?

    range = parse_codepoint_or_range(range_field)
    yield build_block(range, name)
  end

  nil
end
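
The block-or-Enumerator convention used by each_record can be tried in isolation. The sketch below is a stand-in, not the library's code: `BlockLines.each_data_line` is a hypothetical method that works on an in-memory array instead of a file path.

```ruby
module BlockLines
  # Hypothetical stand-in for each_record: yields each non-blank,
  # non-comment line, or returns an Enumerator when no block is given.
  def self.each_data_line(lines)
    return enum_for(:each_data_line, lines) unless block_given?

    lines.each do |line|
      text = line.strip
      next if text.empty? || text.start_with?("#")

      yield text
    end

    nil
  end
end

sample = [
  "# Blocks.txt",
  "",
  "0000..007F; Basic Latin",
  "0080..00FF; Latin-1 Supplement",
]

# Block form: the method itself returns nil.
seen = []
result = BlockLines.each_data_line(sample) { |text| seen << text }

# Enumerator form: Enumerable methods such as first become available,
# and iteration stops as soon as the caller has what it needs.
first_line = BlockLines.each_data_line(sample).first
```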

.find_by_id(path, id) ⇒ Models::Block?

Resolves a block by its identifier (the underscored form of the block name, e.g. "Basic_Latin", "Egyptian_Hieroglyphs_Extended-B"). Streams Blocks.txt once and short-circuits on first match — callers don't need to walk the whole ~340-block file.

Parameters:

  • path (Pathname, String)

    path to a Blocks.txt

  • id (String)

    block identifier (matches Models::Block#id)

Returns:

  • (Models::Block, nil)

    the block, or nil when no block has the given id



# File 'lib/ucode/parsers/blocks.rb', line 49

def find_by_id(path, id)
  return nil if id.nil? || id.empty?

  each_record(path) do |block|
    return block if block.id == id
  end
  nil
end
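
To see the short-circuit at work, the following sketch counts how many records are visited before a match. `SketchBlock` and `SketchLookup` are hypothetical stand-ins for Models::Block and this parser; they iterate over an array, not over Blocks.txt.

```ruby
SketchBlock = Struct.new(:id, :name) # hypothetical stand-in for Models::Block

class SketchLookup
  attr_reader :visited

  def initialize(blocks)
    @blocks = blocks
    @visited = 0
  end

  # Yields each block and counts how many have been handed out.
  def each_block
    @blocks.each do |block|
      @visited += 1
      yield block
    end
    nil
  end

  # Same shape as find_by_id: return inside the block exits the
  # whole method, so later records are never visited.
  def find_by_id(id)
    return nil if id.nil? || id.empty?

    each_block { |block| return block if block.id == id }
    nil
  end
end

lookup = SketchLookup.new([
  SketchBlock.new("Basic_Latin", "Basic Latin"),
  SketchBlock.new("Latin-1_Supplement", "Latin-1 Supplement"),
  SketchBlock.new("Latin_Extended-A", "Latin Extended-A"),
])

found = lookup.find_by_id("Latin-1_Supplement")
visited_after_hit = lookup.visited
```

In this run the third record is never reached: the match is the second record, so `visited_after_hit` is 2.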

.find_by_id!(path, id) ⇒ Models::Block

Same as find_by_id but raises UnknownBlockError on miss. Use this in callers that can't recover from a missing block (CLI commands, extractors that need a block to proceed).

Parameters:

  • path (Pathname, String)

    path to a Blocks.txt

  • id (String)

    block identifier

Returns:

  • (Models::Block)

    the block with the given id

Raises:

  • (UnknownBlockError)

    when no block has the given id



# File 'lib/ucode/parsers/blocks.rb', line 66

def find_by_id!(path, id)
  find_by_id(path, id) or
    raise Ucode::UnknownBlockError.new(
      "unknown Unicode block: #{id.inspect}",
      context: { block_id: id, blocks_txt: path.to_s },
    )
end
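
A caller that wants to handle the miss can rescue the error and read its context. The sketch below assumes an error class that accepts a `context:` keyword and exposes it through a reader; `SketchUnknownBlockError`, `KNOWN_BLOCKS`, `lookup`, and `lookup!` are hypothetical, since the definition of Ucode::UnknownBlockError is not shown here.

```ruby
# Hypothetical error class modelled on the call site above.
class SketchUnknownBlockError < StandardError
  attr_reader :context

  def initialize(message, context: {})
    super(message)
    @context = context
  end
end

KNOWN_BLOCKS = { "Basic_Latin" => "Basic Latin" }.freeze

# Non-raising lookup: nil on a miss.
def lookup(id)
  KNOWN_BLOCKS[id]
end

# Raising variant: same result on a hit, an error with context on a miss.
def lookup!(id)
  lookup(id) or
    raise SketchUnknownBlockError.new(
      "unknown Unicode block: #{id.inspect}",
      context: { block_id: id },
    )
end

hit = lookup!("Basic_Latin")

begin
  lookup!("Klingon")
rescue SketchUnknownBlockError => e
  miss_message = e.message
  miss_context = e.context
end
```

The `or raise` form keeps the happy path on one line; because `or` binds loosely, the raise runs only when the lookup returns nil or false.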