Module: LexerKit
Defined in:
lib/lexer_kit.rb,
lib/lexer_kit/ir.rb,
lib/lexer_kit/cli.rb,
lib/lexer_kit/dfa.rb,
lib/lexer_kit/core.rb,
lib/lexer_kit/trie.rb,
lib/lexer_kit/debug.rb,
lib/lexer_kit/errors.rb,
lib/lexer_kit/format.rb,
lib/lexer_kit/runner.rb,
lib/lexer_kit/builder.rb,
lib/lexer_kit/dfa/nfa.rb,
lib/lexer_kit/version.rb,
lib/lexer_kit/core/span.rb,
lib/lexer_kit/ir/opcode.rb,
lib/lexer_kit/core/token.rb,
lib/lexer_kit/core/source.rb,
lib/lexer_kit/format/lkb1.rb,
lib/lexer_kit/format/lkt1.rb,
lib/lexer_kit/cli/commands.rb,
lib/lexer_kit/ir/dfa_table.rb,
lib/lexer_kit/dfa/regex_ast.rb,
lib/lexer_kit/ir/jump_table.rb,
lib/lexer_kit/ir/serializer.rb,
lib/lexer_kit/dfa/utf8_range.rb,
lib/lexer_kit/ir/instruction.rb,
lib/lexer_kit/core/diagnostic.rb,
lib/lexer_kit/dfa/dfa_builder.rb,
lib/lexer_kit/builder/compiler.rb,
lib/lexer_kit/builder/mode_def.rb,
lib/lexer_kit/debug/visualizer.rb,
lib/lexer_kit/dfa/case_folding.rb,
lib/lexer_kit/dfa/regex_parser.rb,
lib/lexer_kit/ir/constant_pool.rb,
lib/lexer_kit/ir/keyword_table.rb,
lib/lexer_kit/builder/token_def.rb,
lib/lexer_kit/builder/validator.rb,
lib/lexer_kit/dfa/dfa_minimizer.rb,
lib/lexer_kit/debug/disassembler.rb,
lib/lexer_kit/format/lkb1/decoder.rb,
lib/lexer_kit/ir/compiled_program.rb,
lib/lexer_kit/dfa/byte_class_builder.rb,
lib/lexer_kit/dfa/utf8_range_pattern.rb,
lib/lexer_kit/dfa/char_class_collector.rb,
lib/lexer_kit/builder/conflict_detector.rb
Defined Under Namespace
Modules: CLI, Core, DFA, Debug, Format, IR, RegexAstProvider
Classes: BuildError, Builder, CompileError, DiagnosticError, Error, IntegrityError, InvalidBinaryError, NativeExtensionError, ParseError, Runner, Trie, VMError
Constant Summary
- RESERVED_TOKEN_ID = 0
  Reserved token IDs:
    0: Internal sentinel (never emitted)
    1: INVALID (error token)
    2-7: Reserved for future use
    8+: User-defined tokens
  Note: The VM only emits tokens with valid IDs:
  - INVALID_TOKEN_ID (1) for error tokens
  - User tokens (>= FIRST_USER_TOKEN_ID)
  Tokens with sentinel/reserved IDs (0, 2-7) or zero length are filtered out.
- INVALID_TOKEN_ID = 1
- FIRST_USER_TOKEN_ID = 8
- VERSION = "0.6.0"
Class Method Summary
- .build {|Builder| ... } ⇒ Builder
  Build a lexer from DSL.
- .load_builder(path) ⇒ Builder
  Load a builder from DSL source file.
- .load_lexer(path) ⇒ IR::CompiledProgram
  Load a compiled lexer from .lkt1 or .lkb1 file.
- .native? ⇒ Boolean
  Check if native Rust extension is available.
- .utf8_range(*ranges) ⇒ Object
  Create a UTF-8 range pattern for the LexerKit regex engine.
Class Method Details
.build {|Builder| ... } ⇒ Builder
Build a lexer from DSL
    # File 'lib/lexer_kit.rb', line 100
    def self.build(&block)
      Builder.new.tap { |b| b.instance_eval(&block) if block }
    end
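The listing above combines `tap` with `instance_eval`, so the block runs with the new builder as `self`. The following is a minimal, self-contained sketch of that pattern only. `SketchBuilder`, its `token` method, and `sketch_build` are hypothetical stand-ins, since the real `Builder` DSL methods are not shown on this page.

```ruby
# Stand-in for LexerKit::Builder. The real DSL methods are not listed on
# this page, so `token` is a hypothetical method used only for illustration.
class SketchBuilder
  attr_reader :tokens

  def initialize
    @tokens = []
  end

  def token(name, pattern)
    @tokens << [name, pattern]
  end
end

# Same shape as LexerKit.build: create the builder, evaluate the block
# with the builder as `self`, and return the builder.
def sketch_build(&block)
  SketchBuilder.new.tap { |b| b.instance_eval(&block) if block }
end

# Bare DSL calls resolve against the builder because of instance_eval.
implicit = sketch_build do
  token :IDENT, /[a-z]+/
  token :INT, /[0-9]+/
end

# instance_eval also passes the receiver as the block argument,
# so an explicit parameter works too.
explicit = sketch_build { |b| b.token :WS, /\s+/ }

# Without a block, an empty builder is returned.
empty = sketch_build
```

Because the block is evaluated against the builder, instance variables and private helpers of the calling object are not visible inside it. Local variables are still captured as usual.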
.load_builder(path) ⇒ Builder
Load a builder from DSL source file
    # File 'lib/lexer_kit.rb', line 142
    def self.load_builder(path)
      # Expand relative/absolute paths from current directory
      path = File.expand_path(path)
      raise ArgumentError, "Builder source not found: #{path}" unless File.exist?(path)

      content = File.read(path)
      result = eval(content, TOPLEVEL_BINDING, path) # rubocop:disable Security/Eval

      return result if result.is_a?(Builder)

      raise ArgumentError, "DSL file must return LexerKit::Builder instance"
    end
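One consequence of the listing above is that the value of the DSL file's last expression must be a builder; anything else raises `ArgumentError`. The sketch below reproduces that contract without the gem. `DemoBuilder` and `demo_load_builder` are hypothetical stand-ins for `LexerKit::Builder` and `LexerKit.load_builder`.

```ruby
require "tempfile"

# Hypothetical stand-in for LexerKit::Builder.
class DemoBuilder; end

# Mirrors the checks in the listing: expand the path, verify the file
# exists, evaluate it at top level, and type-check the result.
def demo_load_builder(path)
  path = File.expand_path(path)
  raise ArgumentError, "Builder source not found: #{path}" unless File.exist?(path)

  result = eval(File.read(path), TOPLEVEL_BINDING, path)
  return result if result.is_a?(DemoBuilder)

  raise ArgumentError, "DSL file must return a DemoBuilder instance"
end

# A file whose last expression is a builder is accepted.
good = Tempfile.new(["good", ".rb"])
good.write("DemoBuilder.new\n")
good.flush
loaded = demo_load_builder(good.path)

# A file whose last expression is anything else is rejected.
bad = Tempfile.new(["bad", ".rb"])
bad.write("42\n")
bad.flush
error_message =
  begin
    demo_load_builder(bad.path)
    nil
  rescue ArgumentError => e
    e.message
  end
```

Since the file is passed to `eval`, it runs with the full privileges of the calling process, so only trusted DSL files should be loaded this way.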
.load_lexer(path) ⇒ IR::CompiledProgram
Load a compiled lexer from .lkt1 or .lkb1 file
    # File 'lib/lexer_kit.rb', line 116
    def self.load_lexer(path)
      # Expand relative/absolute paths from current directory
      path = File.expand_path(path)
      raise ArgumentError, "Lexer not found: #{path}" unless File.exist?(path)

      if path.end_with?(".lkt1")
        Format::LKT1.load(path).program
      elsif path.end_with?(".lkb1")
        Format::LKB1.load(path).program
      else
        raise ArgumentError, "Expected .lkt1 or .lkb1 file: #{path}"
      end
    end
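The format is chosen purely from the file extension, using a case-sensitive `end_with?` check. A minimal sketch of that dispatch follows. `sketch_format_for` is a hypothetical helper that returns a symbol instead of calling the real `Format::LKT1` or `Format::LKB1` loaders, and it omits the existence check.

```ruby
# Returns the format that the extension dispatch in load_lexer would pick.
# Hypothetical helper: the symbols stand in for the real loader classes.
def sketch_format_for(path)
  path = File.expand_path(path)

  if path.end_with?(".lkt1")
    :lkt1
  elsif path.end_with?(".lkb1")
    :lkb1
  else
    raise ArgumentError, "Expected .lkt1 or .lkb1 file: #{path}"
  end
end

text_format = sketch_format_for("build/grammar.lkt1")
binary_format = sketch_format_for("build/grammar.lkb1")

# The suffix match is case-sensitive, so an upper-case extension is rejected.
rejected =
  begin
    sketch_format_for("build/grammar.LKT1")
    false
  rescue ArgumentError
    true
  end
```

In the real method the existence check runs first, so a missing file reports "Lexer not found" even when its extension is also wrong.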
.native? ⇒ Boolean
Check if native Rust extension is available
    # File 'lib/lexer_kit.rb', line 93
    def self.native?
      LEXER_KIT_NATIVE
    end
.utf8_range(*ranges) ⇒ Object
Create a UTF-8 range pattern for the LexerKit regex engine.
Accepted inputs:
- "あ" (single character)
- "あ".."ん" (Range, inclusive)
- Integer codepoint ranges (e.g., 0x3041..0x3096)
Notes:
- Exclusive ranges (e.g., "a"..."z") are not supported.
- Multi-character strings like "abc" are not supported.
- Range endpoints must be single characters or integers.
    # File 'lib/lexer_kit.rb', line 61
    def self.utf8_range(*ranges)
      require_relative "lexer_kit/dfa/utf8_range_pattern"
      parsed = ranges.map { |range| parse_range_codepoints(range) }
      DFA::Utf8RangePattern.new(parsed)
    end
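The source of `parse_range_codepoints` is not shown on this page. The sketch below is one plausible way to apply the accepted-input rules listed above, not the library's implementation. The helper names and the `[low, high]` return shape are assumptions made for illustration.

```ruby
# Hypothetical helper: converts one range endpoint to an Integer codepoint.
def sketch_endpoint_codepoint(endpoint)
  case endpoint
  when Integer
    endpoint
  when String
    unless endpoint.length == 1
      raise ArgumentError, "endpoint must be a single character: #{endpoint.inspect}"
    end
    endpoint.ord
  else
    raise ArgumentError, "endpoint must be a character or integer: #{endpoint.inspect}"
  end
end

# Hypothetical equivalent of parse_range_codepoints. Returns [low, high]
# codepoints (an assumed shape) and enforces the documented restrictions.
def sketch_parse_range(input)
  case input
  when String
    codepoint = sketch_endpoint_codepoint(input)
    [codepoint, codepoint]
  when Range
    raise ArgumentError, "exclusive ranges are not supported" if input.exclude_end?
    [sketch_endpoint_codepoint(input.begin), sketch_endpoint_codepoint(input.end)]
  else
    raise ArgumentError, "unsupported input: #{input.inspect}"
  end
end

# "\u3042" is "あ" and "\u3093" is "ん", written as escapes to keep the
# example independent of the file's source encoding.
single = sketch_parse_range("\u3042")
hiragana = sketch_parse_range("\u3042".."\u3093")
numeric = sketch_parse_range(0x3041..0x3096)
```

With the real API, the corresponding call would look like `LexerKit.utf8_range("あ".."ん", 0x30A1..0x30FA)`, passing one argument per range.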