Class: Ucode::Parsers::NamesList

Inherits: Base (which derives from Object)
Defined in: lib/ucode/parsers/names_list.rb

Overview

Parses `NamesList.txt`, the human-curated, annotated names file that Unicode uses to render the name pages of its Code Charts.

Format (per the file’s own header):

cp; Name            ← header line at column 0 → new NamesListEntry
  → U+XXXX note    ← indented annotation lines
  × U+XXXX U+YYYY note
  ≡ U+XXXX note
  = alias text
  * footnote text

The following lines are dropped:

`# comment`         ← file-level comment
`% instruction`     ← dropped (instructional)
`~ heading`         ← dropped (table-of-contents)
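A header line can be recognized and split with one anchored pattern. The sketch below assumes a header is a 4 to 6 digit hex codepoint at column 0, a semicolon, optional spaces, and the name. `HeaderSketch`, `HEADER_RE`, and `parse_header_sketch` are hypothetical names; they are not the parser's own `header_line?` / `build_header` helpers, whose bodies are not shown here.

```ruby
# Hypothetical stand-in for NamesListEntry; the real class may hold more fields.
HeaderSketch = Struct.new(:codepoint, :name)

# Assumed header shape: hex codepoint at column 0, ";", optional spaces, name.
HEADER_RE = /\A(\h{4,6});\s*(\S.*)\z/

# Returns a HeaderSketch for a header line, or nil for anything else
# (indented annotations, comments, blank lines).
def parse_header_sketch(line)
  m = HEADER_RE.match(line)
  return nil unless m

  HeaderSketch.new(m[1].to_i(16), m[2].rstrip)
end
```

Anchoring at `\A` is what enforces the "column 0" rule: any leading whitespace makes the match fail, so indented annotation lines can never be mistaken for headers.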

Annotation lines attach to the most recent header. Every other line (blank lines, comments, the dropped lines above, and indented lines that appear before the first header) is silently ignored.

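Implemented as a small state machine: a single current NamesListEntry is held in a local variable; each header line flushes (yields) the previous entry and starts a new one, and each annotation line appends to the current entry. A regex alone cannot express this scoping, because an annotation's meaning depends on whichever header preceded it, possibly many lines earlier.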
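The single-pass, one-current-entry structure can be sketched over an in-memory string. This is a minimal stand-in for `each_record`, not its implementation: it assumes a header is a hex codepoint, a semicolon, and a name at column 0, treats any line beginning with a space or tab as an annotation, and stores annotations as raw stripped text instead of parsed records. `Entry`, `HEADER`, and `each_entry_sketch` are hypothetical.

```ruby
# Hypothetical entry type: codepoint, name, and raw annotation lines.
Entry = Struct.new(:codepoint, :name, :annotations)

# Assumed header shape: hex codepoint at column 0, ";", then the name.
HEADER = /\A(\h{4,6});\s*(\S.*)\z/

# Single pass over the text, holding one "current" entry at a time.
def each_entry_sketch(text)
  return enum_for(:each_entry_sketch, text) unless block_given?

  entry = nil
  text.each_line do |raw|
    line = raw.chomp
    if (m = HEADER.match(line))
      yield entry if entry # a new header flushes the previous entry
      entry = Entry.new(m[1].to_i(16), m[2].rstrip, [])
    elsif line.start_with?(" ", "\t") && entry
      body = line.strip
      # "%" (instructional) and "~" (heading) lines are dropped.
      entry.annotations << body unless body.empty? || body.start_with?("%", "~")
    end
    # Anything else (blank, "#" comment, pre-header text) is skipped.
  end
  yield entry if entry # flush the final entry
  nil
end
```

Indented lines before the first header are discarded because `entry` is still `nil`, mirroring the `&& entry` guard in `each_record`; the trailing `yield entry if entry` is what keeps the last entry from being lost.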

Constant Summary

MARKER_CROSS_REFERENCE = "→".freeze
MARKER_SAMPLE_SEQUENCE = "×".freeze
MARKER_COMPAT_EQUIV = "≡".freeze
MARKER_ALIAS = "=".freeze
MARKER_FOOTNOTE = "*".freeze
MARKER_INSTRUCTIONAL = "%".freeze
MARKER_HEADING = "~".freeze
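Dispatching an annotation on its leading marker reduces to a hash lookup. The sketch assumes `→` and `≡` as the glyphs for `MARKER_CROSS_REFERENCE` and `MARKER_COMPAT_EQUIV` (matching the format listing in the overview), assumes the marker is the first non-blank character of the line, and uses hypothetical names (`MARKER_KINDS`, `DROPPED_MARKERS`, `classify_annotation_sketch`); the real `parse_annotation` is not shown on this page.

```ruby
# Marker glyph => annotation kind. "→" and "≡" are assumed for
# MARKER_CROSS_REFERENCE and MARKER_COMPAT_EQUIV.
MARKER_KINDS = {
  "→" => :cross_reference, # MARKER_CROSS_REFERENCE
  "×" => :sample_sequence, # MARKER_SAMPLE_SEQUENCE
  "≡" => :compat_equiv,    # MARKER_COMPAT_EQUIV
  "=" => :alias,           # MARKER_ALIAS
  "*" => :footnote         # MARKER_FOOTNOTE
}.freeze

# Dropped markers: MARKER_INSTRUCTIONAL and MARKER_HEADING.
DROPPED_MARKERS = ["%", "~"].freeze

# Returns [kind, text] for an annotation line, or nil when the line is
# blank, dropped, or carries an unrecognized marker.
def classify_annotation_sketch(line)
  body = line.lstrip
  marker = body[0] # first character (not byte), so multi-byte glyphs work
  return nil if marker.nil? || DROPPED_MARKERS.include?(marker)

  kind = MARKER_KINDS[marker]
  return nil unless kind

  [kind, body[1..-1].strip]
end
```

Returning `nil` for unknown or dropped markers lets a caller write `attach_annotation(entry, parsed) if parsed`, which is the same shape the `each_record` source uses.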

Class Method Summary

Methods inherited from Base

each_line, parse_codepoint_or_range, parse_field, parse_hex_cp

Class Method Details

.each_record(path) {|entry| ... } ⇒ Enumerator, nil

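Yields one NamesListEntry per codepoint header, reading the file line by line with File.foreach. Returns an Enumerator (built with enum_for) when no block is given; otherwise returns nil. If a line cannot be parsed, the MalformedLineError is re-raised with :file and :line added to its context.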
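Without a block, `each_record` returns `enum_for(:each_record, path)`, which is a plain `Enumerator` rather than an `Enumerator::Lazy`. It still pulls records on demand: because `File.foreach` streams the file, a call such as `first(n)` stops reading once it has enough entries, and `.lazy` can be chained when lazy `select`/`map` pipelines are wanted. The self-contained sketch below uses a hypothetical in-memory `RecordSource` with a counter in place of the real file, so the early stop is observable.

```ruby
# Hypothetical stand-in for the parser: yields in-memory records and counts
# how many it has produced, so early termination is observable.
class RecordSource
  attr_reader :produced

  def initialize(records)
    @records = records
    @produced = 0
  end

  # Same shape as each_record: the block form yields, the blockless form
  # returns an Enumerator built with enum_for.
  def each_record
    return enum_for(:each_record) unless block_given?

    @records.each do |record|
      @produced += 1
      yield record
    end
    nil
  end
end

source = RecordSource.new(%w[a b c d e])
enum = source.each_record
first_two = enum.first(2) # iteration stops after the second record
```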

Yields:

  • (entry)


# File 'lib/ucode/parsers/names_list.rb', line 59

def each_record(path)
  return enum_for(:each_record, path) unless block_given?

  entry = nil
  lineno = 0
  path_str = path.to_s

  File.foreach(path_str) do |raw|
    lineno += 1
    line = raw.chomp

    begin
      if header_line?(line)
        yield entry if entry
        entry = build_header(line)
      elsif indented_line?(line) && entry
        parsed = parse_annotation(line)
        attach_annotation(entry, parsed) if parsed
      end
      # else: blank, comment, heading, or pre-header — skip
    rescue MalformedLineError => e
      e.context[:file] ||= path_str
      e.context[:line] ||= lineno
      raise
    end
  end

  yield entry if entry
  nil
end
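The `rescue` clause above enriches the error without replacing it: `||=` keeps any `:file` or `:line` that inner code already set, and a bare `raise` re-raises the same exception object with its original backtrace. A minimal sketch of that pattern, assuming a hypothetical `MalformedLineErrorSketch` whose `context` is a mutable Hash (the real `MalformedLineError` is not shown on this page):

```ruby
# Hypothetical error type with a mutable context Hash, standing in for
# MalformedLineError.
class MalformedLineErrorSketch < StandardError
  attr_reader :context

  def initialize(message = nil, context: {})
    super(message)
    @context = context
  end
end

# Mirrors the rescue clause in each_record: fill in location data only
# where the inner code left it unset, then re-raise the same object.
def check_lines_sketch(lines, path)
  lines.each_with_index do |line, index|
    begin
      if line.start_with?("?")
        raise MalformedLineErrorSketch.new("unrecognized line", context: { column: 0 })
      end
    rescue MalformedLineErrorSketch => e
      e.context[:file] ||= path
      e.context[:line] ||= index + 1 # 1-based, like lineno in each_record
      raise
    end
  end
  nil
end
```

Wrapping only the per-line work in `begin`/`rescue` keeps the location data accurate: the rescue runs while `lineno` (here `index`) still refers to the offending line.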