Class: LexerKit::DFA::CharClassCollector

Inherits:
Object
Includes:
RegexAST
Defined in:
lib/lexer_kit/dfa/char_class_collector.rb

Overview

CharClassCollector collects the items of a character class and builds the corresponding AST. It keeps byte and codepoint handling separate from the parser's control flow. Case folding is handled at the NFA layer, not here.

Constructor Details

#initialize ⇒ CharClassCollector

Returns a new instance of CharClassCollector.



# File 'lib/lexer_kit/dfa/char_class_collector.rb', line 11

def initialize
  @byte_ranges = []
  @codepoint_ranges = []
end

Instance Method Details

#add_item(item) ⇒ Object

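Adds a single item (byte or codepoint). The item is stored as a one-element range, [value, value], in the byte list when its :type is :byte and in the codepoint list otherwise.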



# File 'lib/lexer_kit/dfa/char_class_collector.rb', line 17

def add_item(item)
  if item[:type] == :byte
    @byte_ranges << [item[:value], item[:value]]
  else
    @codepoint_ranges << [item[:value], item[:value]]
  end
end
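
The listing implies that an item is a hash with :type and :value keys, and that a single item is recorded as a degenerate range. The following is a minimal sketch of that behavior, not the library's own code: SketchCollector is a hypothetical stand-in that copies the listed methods and adds readers purely for inspection. The :codepoint tag and the integer values are assumptions, since the listing only distinguishes :byte from everything else.

```ruby
# Hypothetical stand-in mirroring the listed #initialize and #add_item.
# The attr_readers are added for inspection only; this page does not show
# the real class exposing them.
class SketchCollector
  attr_reader :byte_ranges, :codepoint_ranges

  def initialize
    @byte_ranges = []
    @codepoint_ranges = []
  end

  def add_item(item)
    if item[:type] == :byte
      @byte_ranges << [item[:value], item[:value]]
    else
      # Any type other than :byte lands here, as in the listing.
      @codepoint_ranges << [item[:value], item[:value]]
    end
  end
end

collector = SketchCollector.new
collector.add_item({ type: :byte, value: 0x61 })        # "a" as a single byte
collector.add_item({ type: :codepoint, value: 0x00E9 }) # "é"; :codepoint tag is assumed

p collector.byte_ranges      # prints [[97, 97]]
p collector.codepoint_ranges # prints [[233, 233]]
```

Storing single items as [value, value] means later stages only ever deal with ranges, whichever method added them.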

#add_range(start_item, end_item) ⇒ Object

Adds an inclusive range of items. Both endpoints must have the same :type (both bytes or both codepoints).

Raises:

  • (ArgumentError): if the two endpoints have different types, or if a codepoint range starts after it ends


# File 'lib/lexer_kit/dfa/char_class_collector.rb', line 26

def add_range(start_item, end_item)
  raise ArgumentError, "mixed byte and multibyte range in char class" if start_item[:type] != end_item[:type]

  if start_item[:type] == :byte
    @byte_ranges << [start_item[:value], end_item[:value]]
  else
    start_cp = start_item[:value]
    end_cp = end_item[:value]
    raise ArgumentError, "invalid multibyte range" if start_cp > end_cp

    @codepoint_ranges << [start_cp, end_cp]
  end
end
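
The two error conditions above can be exercised with a small sketch. RangeSketch is a hypothetical stand-in that copies the listed #add_range and adds readers for inspection; the :codepoint tag and the sample values are assumptions. Note that the listed code checks ordering only for codepoint ranges, so a reversed byte range is stored as given.

```ruby
# Hypothetical stand-in mirroring the listed #add_range.
class RangeSketch
  attr_reader :byte_ranges, :codepoint_ranges

  def initialize
    @byte_ranges = []
    @codepoint_ranges = []
  end

  def add_range(start_item, end_item)
    raise ArgumentError, "mixed byte and multibyte range in char class" if start_item[:type] != end_item[:type]

    if start_item[:type] == :byte
      @byte_ranges << [start_item[:value], end_item[:value]]
    else
      raise ArgumentError, "invalid multibyte range" if start_item[:value] > end_item[:value]

      @codepoint_ranges << [start_item[:value], end_item[:value]]
    end
  end
end

sketch = RangeSketch.new
sketch.add_range({ type: :byte, value: 0x61 }, { type: :byte, value: 0x7A }) # a-z

errors = []
[
  [{ type: :byte, value: 0x61 }, { type: :codepoint, value: 0x3042 }],       # mixed types
  [{ type: :codepoint, value: 0x3093 }, { type: :codepoint, value: 0x3042 }] # start > end
].each do |start_item, end_item|
  begin
    sketch.add_range(start_item, end_item)
  rescue ArgumentError => e
    errors << e.message
  end
end

# A reversed byte range passes through unchecked in the listed code.
sketch.add_range({ type: :byte, value: 0x7A }, { type: :byte, value: 0x61 })

p errors             # prints ["mixed byte and multibyte range in char class", "invalid multibyte range"]
p sketch.byte_ranges # prints [[97, 122], [122, 97]]
```

Whether reversed byte ranges are rejected elsewhere, for example by the parser or by the AST builders, is not visible from this page.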

#to_ast(negated:, meta:) ⇒ Object

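Builds the final AST from the collected ranges. It first validates the combination of negation with multibyte ranges, then builds the ASCII and UTF-8 parts separately and combines them.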



# File 'lib/lexer_kit/dfa/char_class_collector.rb', line 41

def to_ast(negated:, meta:)
  validate_negated_multibyte!(negated)
  ascii_ast = build_ascii_ast(negated, meta)
  utf8_ast = build_utf8_ast(meta)
  combine_asts(ascii_ast, utf8_ast, negated, meta)
end