Module: LexerKit

Defined in:
lib/lexer_kit.rb,
lib/lexer_kit/ir.rb,
lib/lexer_kit/cli.rb,
lib/lexer_kit/dfa.rb,
lib/lexer_kit/core.rb,
lib/lexer_kit/trie.rb,
lib/lexer_kit/debug.rb,
lib/lexer_kit/errors.rb,
lib/lexer_kit/format.rb,
lib/lexer_kit/runner.rb,
lib/lexer_kit/builder.rb,
lib/lexer_kit/dfa/nfa.rb,
lib/lexer_kit/version.rb,
lib/lexer_kit/core/span.rb,
lib/lexer_kit/ir/opcode.rb,
lib/lexer_kit/core/token.rb,
lib/lexer_kit/core/source.rb,
lib/lexer_kit/format/lkb1.rb,
lib/lexer_kit/format/lkt1.rb,
lib/lexer_kit/cli/commands.rb,
lib/lexer_kit/ir/dfa_table.rb,
lib/lexer_kit/dfa/regex_ast.rb,
lib/lexer_kit/ir/jump_table.rb,
lib/lexer_kit/ir/serializer.rb,
lib/lexer_kit/dfa/utf8_range.rb,
lib/lexer_kit/ir/instruction.rb,
lib/lexer_kit/core/diagnostic.rb,
lib/lexer_kit/dfa/dfa_builder.rb,
lib/lexer_kit/builder/compiler.rb,
lib/lexer_kit/builder/mode_def.rb,
lib/lexer_kit/debug/visualizer.rb,
lib/lexer_kit/dfa/case_folding.rb,
lib/lexer_kit/dfa/regex_parser.rb,
lib/lexer_kit/ir/constant_pool.rb,
lib/lexer_kit/ir/keyword_table.rb,
lib/lexer_kit/builder/token_def.rb,
lib/lexer_kit/builder/validator.rb,
lib/lexer_kit/dfa/dfa_minimizer.rb,
lib/lexer_kit/debug/disassembler.rb,
lib/lexer_kit/format/lkb1/decoder.rb,
lib/lexer_kit/ir/compiled_program.rb,
lib/lexer_kit/dfa/byte_class_builder.rb,
lib/lexer_kit/dfa/utf8_range_pattern.rb,
lib/lexer_kit/dfa/char_class_collector.rb,
lib/lexer_kit/builder/conflict_detector.rb

Defined Under Namespace

Modules: CLI, Core, DFA, Debug, Format, IR, RegexAstProvider

Classes: BuildError, Builder, CompileError, DiagnosticError, Error, IntegrityError, InvalidBinaryError, NativeExtensionError, ParseError, Runner, Trie, VMError

Constant Summary

RESERVED_TOKEN_ID = 0

Reserved token IDs:

  • 0: Internal sentinel (never emitted)

  • 1: INVALID (error token)

  • 2-7: Reserved for future use

  • 8+: User-defined tokens

Note: The VM only emits tokens with valid IDs:

  • INVALID_TOKEN_ID (1) for error tokens

  • User tokens (>= FIRST_USER_TOKEN_ID)

Tokens with sentinel/reserved IDs (0, 2-7) or zero length are filtered out.

INVALID_TOKEN_ID = 1

FIRST_USER_TOKEN_ID = 8
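
The ID ranges above can be sketched as a small classifier. This is an illustrative sketch only: the constants are redefined locally so the snippet runs without LexerKit loaded, and `token_id_kind` and `emitted?` are hypothetical helpers, not part of the LexerKit API.

```ruby
# Local mirrors of the documented constants (assumption: values as listed above).
RESERVED_TOKEN_ID   = 0
INVALID_TOKEN_ID    = 1
FIRST_USER_TOKEN_ID = 8

# Hypothetical helper: classify a token ID according to the documented ranges.
def token_id_kind(id)
  case id
  when RESERVED_TOKEN_ID then :sentinel
  when INVALID_TOKEN_ID then :invalid
  when 2...FIRST_USER_TOKEN_ID then :reserved
  else
    id >= FIRST_USER_TOKEN_ID ? :user : :out_of_range
  end
end

# Hypothetical helper: mirrors the documented filter, under which only INVALID
# and user tokens of non-zero length are emitted.
def emitted?(id, length)
  return false if length.zero?

  %i[invalid user].include?(token_id_kind(id))
end
```

Under these assumptions, `emitted?(1, 3)` and `emitted?(8, 1)` are true, while `emitted?(0, 3)`, `emitted?(5, 3)`, and `emitted?(8, 0)` are false.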
VERSION = "0.6.0"

Class Method Details

.build {|Builder| ... } ⇒ Builder

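Build a lexer definition from a DSL block. The block is evaluated in the context of a new Builder via instance_eval; without a block, an empty Builder is returned.

Yields:

  • (Builder)

Returns:

  • (Builder)
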

# File 'lib/lexer_kit.rb', line 100

def self.build(&block)
  Builder.new.tap { |b| b.instance_eval(&block) if block }
end
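
The `tap` plus `instance_eval` pattern used by `.build` can be tried in isolation. The sketch below uses `SketchBuilder`, a hypothetical stand-in for `LexerKit::Builder`; its `token` method is invented for illustration and says nothing about the real DSL.

```ruby
# SketchBuilder is a hypothetical stand-in for LexerKit::Builder.
class SketchBuilder
  attr_reader :tokens

  def initialize
    @tokens = []
  end

  # Invented DSL method, for illustration only.
  def token(name, pattern)
    @tokens << [name, pattern]
  end
end

# Same shape as LexerKit.build: create the object, evaluate the block with
# `self` set to that object, and return the object itself.
def sketch_build(&block)
  SketchBuilder.new.tap { |b| b.instance_eval(&block) if block }
end

lexer = sketch_build do
  token :NUMBER, /\d+/ # resolves to SketchBuilder#token without a receiver
end

empty = sketch_build # no block: a fresh, empty builder
```

Because the block runs under `instance_eval`, `self` inside it is the builder, so instance variables and private helpers of the calling object are not visible there.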

.load_builder(path) ⇒ Builder

Load a builder from a DSL source file. The file is evaluated with eval, and its last expression must be a Builder.

Examples:

Load from relative path

LexerKit.load_builder("examples/languages/json.rb")

Load from absolute path

LexerKit.load_builder("/path/to/json.rb")

Parameters:

  • path (String)

    path to DSL source file (relative or absolute)

Returns:

  • (Builder)

Raises:

  • (ArgumentError)

    if the file does not exist or does not evaluate to a Builder instance



# File 'lib/lexer_kit.rb', line 142

def self.load_builder(path)
  # Expand relative/absolute paths from current directory
  path = File.expand_path(path)

  raise ArgumentError, "Builder source not found: #{path}" unless File.exist?(path)

  content = File.read(path)
  result = eval(content, TOPLEVEL_BINDING, path) # rubocop:disable Security/Eval

  return result if result.is_a?(Builder)

  raise ArgumentError, "DSL file must return LexerKit::Builder instance"
end
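
The loader relies on a Ruby file "returning" its last expression when evaluated. A minimal sketch of that mechanism follows, assuming nothing about LexerKit itself: `load_definition` and `with_source` are hypothetical helpers, and an `Array` stands in for the `Builder` type check.

```ruby
require "tempfile"

# Hypothetical stand-in for the loader: evaluate a Ruby file and require that
# its last expression is an instance of `expected_class`.
def load_definition(path, expected_class)
  path = File.expand_path(path)
  raise ArgumentError, "Source not found: #{path}" unless File.exist?(path)

  # Passing `path` as the third argument makes backtraces point at the file.
  result = eval(File.read(path), TOPLEVEL_BINDING, path)
  return result if result.is_a?(expected_class)

  raise ArgumentError, "File must return #{expected_class} instance"
end

# Hypothetical helper: write `code` to a temporary .rb file and yield its path.
def with_source(code)
  Tempfile.create(["definition", ".rb"]) do |file|
    file.write(code)
    file.flush
    yield file.path
  end
end

names = with_source("[:number, :string]") { |path| load_definition(path, Array) }
```

A file whose last expression has the wrong type (for example one containing only `42`) raises `ArgumentError` in this sketch, matching the documented behavior of the real method.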

.load_lexer(path) ⇒ IR::CompiledProgram

Load a compiled lexer from a .lkt1 or .lkb1 file

Examples:

Load from relative path

LexerKit.load_lexer("lexers/json.lkt1")
LexerKit.load_lexer(File.expand_path("../data/json.lkt1", __dir__))

Load from absolute path

LexerKit.load_lexer("/path/to/json.lkt1")

Parameters:

  • path (String)

    path to lexer file (relative or absolute)

Returns:

  • (IR::CompiledProgram)

Raises:

  • (ArgumentError)

    if the file is not found or its extension is neither .lkt1 nor .lkb1



# File 'lib/lexer_kit.rb', line 116

def self.load_lexer(path)
  # Expand relative/absolute paths from current directory
  path = File.expand_path(path)

  raise ArgumentError, "Lexer not found: #{path}" unless File.exist?(path)

  if path.end_with?(".lkt1")
    Format::LKT1.load(path).program
  elsif path.end_with?(".lkb1")
    Format::LKB1.load(path).program
  else
    raise ArgumentError, "Expected .lkt1 or .lkb1 file: #{path}"
  end
end
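
The extension dispatch above is a case-sensitive suffix match, which the following sketch isolates. `lexer_format_for` is a hypothetical helper that only reports which format would be chosen; it reads no file and calls no LexerKit code.

```ruby
# Hypothetical helper mirroring the suffix check in load_lexer.
def lexer_format_for(path)
  if path.end_with?(".lkt1")
    :lkt1
  elsif path.end_with?(".lkb1")
    :lkb1
  else
    raise ArgumentError, "Expected .lkt1 or .lkb1 file: #{path}"
  end
end

lexer_format_for("lexers/json.lkt1") # selects the LKT1 branch
lexer_format_for("lexers/json.lkb1") # selects the LKB1 branch
```

Because `end_with?` compares bytes exactly, a path such as `json.LKT1` falls through to the `ArgumentError` branch in this sketch.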

.native? ⇒ Boolean

Check if native Rust extension is available

Returns:

  • (Boolean)


# File 'lib/lexer_kit.rb', line 93

def self.native?
  LEXER_KIT_NATIVE
end

.utf8_range(*ranges) ⇒ Object

Create a UTF-8 range pattern for the LexerKit regex engine.

Accepted inputs:

  • "あ" (single character)

  • "あ".."ん" (Range, inclusive)

  • Integer codepoint ranges (e.g., 0x3041..0x3096)

Notes:

  • Exclusive ranges (e.g., "a"..."z") are not supported.

  • Multi-character strings like "abc" are not supported.

  • Range endpoints must be single characters or integers.



# File 'lib/lexer_kit.rb', line 61

def self.utf8_range(*ranges)
  require_relative "lexer_kit/dfa/utf8_range_pattern"
  parsed = ranges.map { |range| parse_range_codepoints(range) }
  DFA::Utf8RangePattern.new(parsed)
end
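
The source of `parse_range_codepoints` is not shown on this page. As a rough illustration of the accepted inputs and restrictions listed above, one way such normalization could work is sketched below; `to_codepoint_pair` and `endpoint_codepoint` are hypothetical names, and the real implementation may differ.

```ruby
# Hypothetical helper: convert one range endpoint to an Integer codepoint.
def endpoint_codepoint(value)
  case value
  when Integer
    value
  when String
    raise ArgumentError, "expected a single character: #{value.inspect}" unless value.length == 1

    value.ord
  else
    raise ArgumentError, "unsupported endpoint: #{value.inspect}"
  end
end

# Hypothetical normalizer: turn each accepted input into [first, last] codepoints.
def to_codepoint_pair(input)
  case input
  when String
    code = endpoint_codepoint(input) # rejects multi-character strings
    [code, code]
  when Range
    raise ArgumentError, "exclusive ranges are not supported" if input.exclude_end?

    [endpoint_codepoint(input.begin), endpoint_codepoint(input.end)]
  else
    raise ArgumentError, "unsupported input: #{input.inspect}"
  end
end

hiragana = to_codepoint_pair("あ".."ん")
```

In this sketch a single character becomes a one-element range, so `"あ"` and `"あ".."あ"` normalize to the same pair.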