Class: Ucode::Parsers::ScriptExtensions

Inherits:
Base
  • Object
Defined in:
lib/ucode/parsers/script_extensions.rb

Overview

Parses `ScriptExtensions.txt`, which lists the additional scripts associated with each codepoint.

Format (UAX #44):

XXXX..XXXX ; Latn Grek Cyrl  # trailing comment

A codepoint can be associated with many scripts. The parser yields one Tuple per (codepoint, script_code) pair; the Coordinator merges these into CodePoint#script_extensions.

`script_code` is the ISO 15924 4-letter code already present in the source file (e.g. `Latn`, `Grek`). No alias resolution is needed.
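
The expansion and merge described above can be sketched in isolation. This is a minimal illustration, not the library's implementation: `Pair`, `expand_line`, and `merge_pairs` are hypothetical names, the sample line is made up rather than taken from the real data file, and the merge step only approximates what the Coordinator might do.

```ruby
# Hypothetical stand-in for the parser's Tuple; the real class may differ.
Pair = Struct.new(:cp, :script_code, keyword_init: true)

# Expand one data line into (codepoint, script_code) pairs.
def expand_line(line)
  data = line.sub(/#.*/, "").strip          # drop the trailing comment
  return [] if data.empty?

  range_field, codes_field = data.split(";").map(&:strip)
  return [] if codes_field.nil? || codes_field.empty?

  first, last = range_field.split("..").map { |hex| hex.to_i(16) }
  last ||= first                            # single codepoint, no range

  codes = codes_field.split(/\s+/)          # codes are used as-is, no alias lookup
  (first..last).flat_map do |cp|
    codes.map { |code| Pair.new(cp: cp, script_code: code) }
  end
end

# Merge pairs into a codepoint => [script codes] table (assumed merge shape).
def merge_pairs(pairs)
  pairs.each_with_object(Hash.new { |h, k| h[k] = [] }) do |pair, table|
    table[pair.cp] << pair.script_code
  end
end

# Made-up sample line in the documented format.
pairs  = expand_line("0041..0043 ; Latn Grek Cyrl  # made-up sample")
merged = merge_pairs(pairs)
```

A range of three codepoints with three script codes expands to nine pairs, which the merge step folds back into one list of codes per codepoint.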

Defined Under Namespace

Classes: Tuple

Class Method Summary

Methods inherited from Base

each_line, parse_codepoint_or_range, parse_field, parse_hex_cp

Class Method Details

.each_record(path) ⇒ Object

Yields one Tuple per (codepoint, script_code) pair and returns nil. When called without a block, returns an Enumerator that produces the same pairs on demand.



# File 'lib/ucode/parsers/script_extensions.rb', line 29

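def each_record(path)
  # Without a block, return an Enumerator over the same records.
  return enum_for(:each_record, path) unless block_given?

  each_line(path) do |line|
    fields = line.fields
    # Skip lines that lack a script-code field.
    next if fields.length < 2

    codes_field = fields[1]
    next if codes_field.nil? || codes_field.empty?

    # Field 0 is a codepoint or range; field 1 is a whitespace-separated list of codes.
    range = parse_codepoint_or_range(fields[0])
    codes = codes_field.split(/\s+/)

    # Emit one Tuple per (codepoint, script_code) pair.
    each_cp(range) do |cp|
      codes.each do |code|
        yield Tuple.new(cp: cp, script_code: code)
      end
    end
  end

  nil
end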
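
The block and blockless calling conventions can be tried without the library using a self-contained stand-in. This is only a sketch of the same `enum_for` pattern: `each_pair` and `SAMPLE` are hypothetical, it reads from an IO instead of a file path, it yields plain arrays instead of Tuple objects, and it handles single codepoints only.

```ruby
require "stringio"

# Hypothetical stand-in for each_record: same block/blockless contract,
# but reads from any IO and yields [codepoint, script_code] arrays.
def each_pair(io)
  return enum_for(:each_pair, io) unless block_given?

  io.each_line do |line|
    cp_field, codes_field = line.sub(/#.*/, "").split(";").map(&:strip)
    next if codes_field.nil? || codes_field.empty?

    codes_field.split(/\s+/).each { |code| yield [cp_field.to_i(16), code] }
  end

  nil
end

# Made-up sample input, not real ScriptExtensions.txt data.
SAMPLE = "0041 ; Latn Grek\n# comment only\n0042 ; Cyrl\n"

# Block form: yields each pair and returns nil.
collected = []
result = each_pair(StringIO.new(SAMPLE)) { |cp, code| collected << [cp, code] }

# Blockless form: returns an Enumerator, so Enumerable methods can be chained
# and iteration stops as soon as enough pairs have been produced.
first_two = each_pair(StringIO.new(SAMPLE)).first(2)
```

Returning an Enumerator in the blockless case lets callers use `first`, `select`, or `lazy` without reading the whole input up front.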