Class: LexerKit::IR::DFATable

Inherits:
Object
  • Object
show all
Defined in:
lib/lexer_kit/ir/dfa_table.rb

Overview

DFATable represents a compiled DFA for pattern matching. Uses byte class compression to reduce table size.

Constant Summary collapse

DEAD_STATE =

State 0 indicates no valid transition (must match C DFA_DEAD_STATE)

0
NO_ACCEPT =

Token ID indicating non-accepting state (must match C DFA_NO_ACCEPT)

0xFFFF

Instance Attribute Summary collapse

Class Method Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(state_count:, byte_class:, transitions:, accept_states:) ⇒ DFATable

Returns a new instance of DFATable.

Parameters:

  • state_count (Integer)

    number of states

  • byte_class (Array<Integer>)

    256-element array mapping bytes to classes

  • transitions (Array<Integer>)

    state × class → next_state

  • accept_states (Hash<Integer, Integer>)

    state → token_id for accepting states



19
20
21
22
23
24
25
# File 'lib/lexer_kit/ir/dfa_table.rb', line 19

def initialize(state_count:, byte_class:, transitions:, accept_states:)
  @state_count = state_count
  @byte_class = byte_class.freeze
  @class_count = byte_class.max + 1
  @transitions = transitions.freeze
  @accept_states = accept_states.freeze
end

Instance Attribute Details

#accept_statesObject (readonly)

Returns the value of attribute accept_states.



13
14
15
# File 'lib/lexer_kit/ir/dfa_table.rb', line 13

def accept_states
  @accept_states
end

#byte_classObject (readonly)

Returns the value of attribute byte_class.



13
14
15
# File 'lib/lexer_kit/ir/dfa_table.rb', line 13

def byte_class
  @byte_class
end

#class_countObject (readonly)

Returns the value of attribute class_count.



13
14
15
# File 'lib/lexer_kit/ir/dfa_table.rb', line 13

def class_count
  @class_count
end

#state_countObject (readonly)

Returns the value of attribute state_count.



13
14
15
# File 'lib/lexer_kit/ir/dfa_table.rb', line 13

def state_count
  @state_count
end

#transitionsObject (readonly)

Returns the value of attribute transitions.



13
14
15
# File 'lib/lexer_kit/ir/dfa_table.rb', line 13

def transitions
  @transitions
end

Class Method Details

.from_binary(bytes) ⇒ Array(DFATable, Integer)

Decode from binary

Parameters:

  • bytes (String)

Returns:

  • (Array(DFATable, Integer))
    table, bytes_consumed


71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
# File 'lib/lexer_kit/ir/dfa_table.rb', line 71

def self.from_binary(bytes)
  pos = 0

  state_count, class_count = bytes.byteslice(pos, 4).unpack("S>S>")
  pos += 4

  byte_class = bytes.byteslice(pos, 256).unpack("C256")
  pos += 256

  trans_size = state_count * class_count
  transitions = bytes.byteslice(pos, trans_size * 2).unpack("S>*")
  pos += trans_size * 2

  accept_count = bytes.byteslice(pos, 2).unpack1("S>")
  pos += 2

  accept_states = {}
  accept_count.times do
    state, token_id = bytes.byteslice(pos, 4).unpack("S>S>")
    accept_states[state] = token_id
    pos += 4
  end

  table = new(
    state_count: state_count,
    byte_class: byte_class,
    transitions: transitions,
    accept_states: accept_states
  )

  [table, pos]
end

Instance Method Details

#accept(state) ⇒ Integer?

Check if state is accepting

Parameters:

  • state (Integer)

Returns:

  • (Integer, nil)

    token ID if accepting, nil otherwise



39
40
41
# File 'lib/lexer_kit/ir/dfa_table.rb', line 39

def accept(state)
  @accept_states[state]
end

#inspectObject



104
105
106
# File 'lib/lexer_kit/ir/dfa_table.rb', line 104

def inspect
  "#<DFATable states=#{@state_count} classes=#{@class_count} accepts=#{@accept_states.size}>"
end

#to_binaryString

Encode to binary

Returns:

  • (String)


45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
# File 'lib/lexer_kit/ir/dfa_table.rb', line 45

def to_binary
  parts = []

  # Header: state_count (u16), class_count (u16)
  parts << [@state_count, @class_count].pack("S>S>")

  # Byte class table (256 bytes)
  parts << @byte_class.pack("C256")

  # Transitions (state_count × class_count × u16)
  parts << @transitions.pack("S>*")

  # Accept states count (u16)
  parts << [@accept_states.size].pack("S>")

  # Accept states: [state (u16), token_id (u16)] pairs
  @accept_states.each do |state, token_id|
    parts << [state, token_id].pack("S>S>")
  end

  parts.join
end

#to_native_formatHash

Convert to format suitable for C native loading Returns dense accept_tokens array for O(1) lookup

Returns:

  • (Hash)

    data for C extension



111
112
113
114
115
116
117
118
119
120
121
122
# File 'lib/lexer_kit/ir/dfa_table.rb', line 111

def to_native_format
  # Build dense array: accept_tokens[state] = token_id
  accept_tokens = Array.new(@state_count, NO_ACCEPT)
  @accept_states.each { |state, token_id| accept_tokens[state] = token_id }
  {
    state_count: @state_count,
    class_count: @class_count,
    byte_class: @byte_class.pack("C*"),
    transitions: @transitions.pack("S>*"),
    accept_tokens: accept_tokens.pack("S>*")
  }
end

#transition(state, byte) ⇒ Integer

Get next state for current state and input byte

Parameters:

  • state (Integer)

    current state

  • byte (Integer)

    input byte

Returns:

  • (Integer)

    next state (0 = dead state)



31
32
33
34
# File 'lib/lexer_kit/ir/dfa_table.rb', line 31

def transition(state, byte)
  cls = @byte_class[byte]
  @transitions[(state * @class_count) + cls]
end