Class: LexerKit::DFA::NFA

Inherits:
Object
  • Object
show all
Includes:
CaseFolding
Defined in:
lib/lexer_kit/dfa/nfa.rb

Overview

NFA represents a Non-deterministic Finite Automaton. Built using Thompson’s construction from regex AST. Case folding is applied during NFA construction, not parsing.

Constant Summary collapse

EPSILON =

Epsilon transition marker

:epsilon

Instance Attribute Summary collapse

Class Method Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(case_insensitive: false) ⇒ NFA

Returns a new instance of NFA.



16
17
18
19
20
21
22
23
# File 'lib/lexer_kit/dfa/nfa.rb', line 16

def initialize(case_insensitive: false)
  @case_insensitive = case_insensitive
  @states = 0
  @transitions = Hash.new { |h, k| h[k] = [] }
  @start_state = nil
  @accept_state = nil
  @token_id = nil
end

Instance Attribute Details

#accept_stateObject (readonly)

Returns the value of attribute accept_state.



14
15
16
# File 'lib/lexer_kit/dfa/nfa.rb', line 14

def accept_state
  @accept_state
end

#start_stateObject (readonly)

Returns the value of attribute start_state.



14
15
16
# File 'lib/lexer_kit/dfa/nfa.rb', line 14

def start_state
  @start_state
end

#token_idObject (readonly)

Returns the value of attribute token_id.



14
15
16
# File 'lib/lexer_kit/dfa/nfa.rb', line 14

def token_id
  @token_id
end

#transitionsObject (readonly)

Returns the value of attribute transitions.



14
15
16
# File 'lib/lexer_kit/dfa/nfa.rb', line 14

def transitions
  @transitions
end

Class Method Details

.from_ast(ast, token_id = 0, case_insensitive: false) ⇒ NFA

Build NFA from regex AST

Parameters:

  • ast (RegexAST::*)

    AST node

  • token_id (Integer) (defaults to: 0)

    token ID for accept state

  • case_insensitive (Boolean) (defaults to: false)

    whether to apply case folding

Returns:



42
43
44
45
46
47
# File 'lib/lexer_kit/dfa/nfa.rb', line 42

def self.from_ast(ast, token_id = 0, case_insensitive: false)
  nfa = new(case_insensitive: case_insensitive)
  start, accept = nfa.build(ast)
  nfa.finalize!(start_state: start, accept_state: accept, token_id: token_id)
  nfa
end

.from_regex(regex, token_id = 0) ⇒ NFA

Build NFA from Regex (AST with flags)

Parameters:

  • regex (RegexAST::Regex)

    regex pattern with flags

  • token_id (Integer) (defaults to: 0)

    token ID for accept state

Returns:



53
54
55
# File 'lib/lexer_kit/dfa/nfa.rb', line 53

def self.from_regex(regex, token_id = 0)
  from_ast(regex.ast, token_id, case_insensitive: regex.case_insensitive)
end

Instance Method Details

#add_transition(from, to, input) ⇒ Object

Add a transition



33
34
35
# File 'lib/lexer_kit/dfa/nfa.rb', line 33

def add_transition(from, to, input)
  @transitions[from] << [input, to]
end

#build(node) ⇒ Object

Build NFA fragment from AST node Returns [start_state, accept_state]



65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
# File 'lib/lexer_kit/dfa/nfa.rb', line 65

def build(node)
  case node
  when RegexAST::Literal
    build_literal(node)
  when RegexAST::CharClass
    build_char_class(node)
  when RegexAST::Any
    build_any
  when RegexAST::Concat
    build_concat(node)
  when RegexAST::Alternation
    build_alternation(node)
  when RegexAST::Quantifier
    build_quantifier(node)
  when RegexAST::Group
    build(node.child)
  else
    raise "unknown AST node: #{node.class}"
  end
end

#epsilon_closure(states) ⇒ Set<Integer>

Compute epsilon closure of a set of states

Parameters:

  • states (Set<Integer>)

    initial states

Returns:

  • (Set<Integer>)

    epsilon closure



89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
# File 'lib/lexer_kit/dfa/nfa.rb', line 89

def epsilon_closure(states)
  closure = Set.new(states)
  worklist = states.to_a

  while (state = worklist.pop)
    @transitions[state].each do |input, target|
      if input == EPSILON && !closure.include?(target)
        closure << target
        worklist << target
      end
    end
  end

  closure
end

#finalize!(start_state:, accept_state:, token_id:) ⇒ Object



57
58
59
60
61
# File 'lib/lexer_kit/dfa/nfa.rb', line 57

def finalize!(start_state:, accept_state:, token_id:)
  @start_state = start_state
  @accept_state = accept_state
  @token_id = token_id
end

#first_byte_setSet<Integer>

Compute the set of possible first bytes this NFA can match

Returns:

  • (Set<Integer>)

    set of bytes (0-255)



121
122
123
124
125
126
127
128
129
130
131
132
# File 'lib/lexer_kit/dfa/nfa.rb', line 121

def first_byte_set
  closure = epsilon_closure(Set[@start_state])
  result = Set.new

  closure.each do |state|
    @transitions[state].each do |input, _target|
      result << input if input.is_a?(Integer)
    end
  end

  result
end

#move(states, input) ⇒ Set<Integer>

Get all states reachable from a set of states on a given input

Parameters:

  • states (Set<Integer>)

    current states

  • input (Integer)

    input byte

Returns:

  • (Set<Integer>)

    reachable states



109
110
111
112
113
114
115
116
117
# File 'lib/lexer_kit/dfa/nfa.rb', line 109

def move(states, input)
  result = Set.new
  states.each do |state|
    @transitions[state].each do |trans_input, target|
      result << target if trans_input == input
    end
  end
  result
end

#new_stateObject

Create a new state and return its ID



26
27
28
29
30
# File 'lib/lexer_kit/dfa/nfa.rb', line 26

def new_state
  state = @states
  @states += 1
  state
end