Class: LexerKit::DFA::NFA
- Inherits:
-
Object
- Object
- LexerKit::DFA::NFA
- Includes:
- CaseFolding
- Defined in:
- lib/lexer_kit/dfa/nfa.rb
Overview
NFA represents a Non-deterministic Finite Automaton. Built using Thompson’s construction from regex AST. Case folding is applied during NFA construction, not parsing.
Constant Summary collapse
- EPSILON =
Epsilon transition marker
:epsilon
Instance Attribute Summary collapse
-
#accept_state ⇒ Object
readonly
Returns the value of attribute accept_state.
-
#start_state ⇒ Object
readonly
Returns the value of attribute start_state.
-
#token_id ⇒ Object
readonly
Returns the value of attribute token_id.
-
#transitions ⇒ Object
readonly
Returns the value of attribute transitions.
Class Method Summary collapse
-
.from_ast(ast, token_id = 0, case_insensitive: false) ⇒ NFA
Build NFA from regex AST.
-
.from_regex(regex, token_id = 0) ⇒ NFA
Build NFA from Regex (AST with flags).
Instance Method Summary collapse
-
#add_transition(from, to, input) ⇒ Object
Add a transition.
-
#build(node) ⇒ Object
Build NFA fragment from AST node Returns [start_state, accept_state].
-
#epsilon_closure(states) ⇒ Set<Integer>
Compute epsilon closure of a set of states.
- #finalize!(start_state:, accept_state:, token_id:) ⇒ Object
-
#first_byte_set ⇒ Set<Integer>
Compute the set of possible first bytes this NFA can match.
-
#initialize(case_insensitive: false) ⇒ NFA
constructor
A new instance of NFA.
-
#move(states, input) ⇒ Set<Integer>
Get all states reachable from a set of states on a given input.
-
#new_state ⇒ Object
Create a new state and return its ID.
Constructor Details
#initialize(case_insensitive: false) ⇒ NFA
Returns a new instance of NFA.
16 17 18 19 20 21 22 23 |
# File 'lib/lexer_kit/dfa/nfa.rb', line 16 def initialize(case_insensitive: false) @case_insensitive = case_insensitive @states = 0 @transitions = Hash.new { |h, k| h[k] = [] } @start_state = nil @accept_state = nil @token_id = nil end |
Instance Attribute Details
#accept_state ⇒ Object (readonly)
Returns the value of attribute accept_state.
14 15 16 |
# File 'lib/lexer_kit/dfa/nfa.rb', line 14 def accept_state @accept_state end |
#start_state ⇒ Object (readonly)
Returns the value of attribute start_state.
14 15 16 |
# File 'lib/lexer_kit/dfa/nfa.rb', line 14 def start_state @start_state end |
#token_id ⇒ Object (readonly)
Returns the value of attribute token_id.
14 15 16 |
# File 'lib/lexer_kit/dfa/nfa.rb', line 14 def token_id @token_id end |
#transitions ⇒ Object (readonly)
Returns the value of attribute transitions.
14 15 16 |
# File 'lib/lexer_kit/dfa/nfa.rb', line 14 def transitions @transitions end |
Class Method Details
.from_ast(ast, token_id = 0, case_insensitive: false) ⇒ NFA
Build NFA from regex AST
42 43 44 45 46 47 |
# File 'lib/lexer_kit/dfa/nfa.rb', line 42 def self.from_ast(ast, token_id = 0, case_insensitive: false) nfa = new(case_insensitive: case_insensitive) start, accept = nfa.build(ast) nfa.finalize!(start_state: start, accept_state: accept, token_id: token_id) nfa end |
.from_regex(regex, token_id = 0) ⇒ NFA
Build NFA from Regex (AST with flags)
53 54 55 |
# File 'lib/lexer_kit/dfa/nfa.rb', line 53 def self.from_regex(regex, token_id = 0) from_ast(regex.ast, token_id, case_insensitive: regex.case_insensitive) end |
Instance Method Details
#add_transition(from, to, input) ⇒ Object
Add a transition
33 34 35 |
# File 'lib/lexer_kit/dfa/nfa.rb', line 33 def add_transition(from, to, input) @transitions[from] << [input, to] end |
#build(node) ⇒ Object
Build NFA fragment from AST node Returns [start_state, accept_state]
65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 |
# File 'lib/lexer_kit/dfa/nfa.rb', line 65 def build(node) case node when RegexAST::Literal build_literal(node) when RegexAST::CharClass build_char_class(node) when RegexAST::Any build_any when RegexAST::Concat build_concat(node) when RegexAST::Alternation build_alternation(node) when RegexAST::Quantifier build_quantifier(node) when RegexAST::Group build(node.child) else raise "unknown AST node: #{node.class}" end end |
#epsilon_closure(states) ⇒ Set<Integer>
Compute epsilon closure of a set of states
89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 |
# File 'lib/lexer_kit/dfa/nfa.rb', line 89 def epsilon_closure(states) closure = Set.new(states) worklist = states.to_a while (state = worklist.pop) @transitions[state].each do |input, target| if input == EPSILON && !closure.include?(target) closure << target worklist << target end end end closure end |
#finalize!(start_state:, accept_state:, token_id:) ⇒ Object
57 58 59 60 61 |
# File 'lib/lexer_kit/dfa/nfa.rb', line 57 def finalize!(start_state:, accept_state:, token_id:) @start_state = start_state @accept_state = accept_state @token_id = token_id end |
#first_byte_set ⇒ Set<Integer>
Compute the set of possible first bytes this NFA can match
121 122 123 124 125 126 127 128 129 130 131 132 |
# File 'lib/lexer_kit/dfa/nfa.rb', line 121 def first_byte_set closure = epsilon_closure(Set[@start_state]) result = Set.new closure.each do |state| @transitions[state].each do |input, _target| result << input if input.is_a?(Integer) end end result end |
#move(states, input) ⇒ Set<Integer>
Get all states reachable from a set of states on a given input
109 110 111 112 113 114 115 116 117 |
# File 'lib/lexer_kit/dfa/nfa.rb', line 109 def move(states, input) result = Set.new states.each do |state| @transitions[state].each do |trans_input, target| result << target if trans_input == input end end result end |
#new_state ⇒ Object
Create a new state and return its ID
26 27 28 29 30 |
# File 'lib/lexer_kit/dfa/nfa.rb', line 26 def new_state state = @states @states += 1 state end |