Class: LexerKit::Core::Source

Inherits:
Object
Defined in:
lib/lexer_kit/core/source.rb

Overview

Source holds the input byte sequence and optional filename. It provides line/column conversion for diagnostics.

Constructor Details

#initialize(input, filename: nil) ⇒ Source

Returns a new instance of Source.

Parameters:

  • input (String)

    input string; kept as-is for character-index conversion, and copied with BINARY encoding for byte operations. The string passed in is frozen.

  • filename (String, nil) (defaults to: nil)

    optional filename for diagnostics



12
13
14
15
16
17
# File 'lib/lexer_kit/core/source.rb', line 12

def initialize(input, filename: nil)
  @original_string = input.freeze
  @bytes = input.dup.force_encoding(Encoding::BINARY).freeze
  @filename = filename&.freeze
  @line_starts = nil
end
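
The two stored views can be illustrated with plain Ruby strings. The following is a minimal sketch that does not load LexerKit; `dual_views` is a hypothetical helper that only mirrors the constructor's two assignments, to show how character counts and byte counts diverge for non-ASCII input.

```ruby
# Standalone illustration of the constructor's two views; LexerKit is not required.
# `dual_views` is a hypothetical helper, not part of the library.
def dual_views(input)
  original = input                                          # characters, in the caller's encoding
  bytes = input.dup.force_encoding(Encoding::BINARY).freeze # same data, viewed as raw bytes
  [original, bytes]
end

original, bytes = dual_views("na\u00EFve\n") # U+00EF occupies two bytes in UTF-8
puts original.length   # character count
puts bytes.length      # byte count (one "character" per byte in BINARY)
puts bytes.encoding
```

For this input the character view reports 6 and the byte view reports 7, which is why the class keeps both: byte offsets drive lexing, while character indices are only needed when translating positions reported in characters.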

Instance Attribute Details

#bytes ⇒ Object (readonly)

Returns the value of attribute bytes.



# File 'lib/lexer_kit/core/source.rb', line 8

def bytes
  @bytes
end

#filename ⇒ Object (readonly)

Returns the value of attribute filename.



# File 'lib/lexer_kit/core/source.rb', line 8

def filename
  @filename
end

Instance Method Details

#byte_offset_for_char_index(char_index) ⇒ Integer

Converts a character index to a byte offset. For BINARY input the index is returned unchanged (O(1)). For other encodings (e.g. UTF-8) the offset is computed by slicing the prefix and measuring it, which is O(n); this path is intended for error reporting only.

Parameters:

  • char_index (Integer)

    character position (0-based)

Returns:

  • (Integer)

    byte offset



# File 'lib/lexer_kit/core/source.rb', line 129

def byte_offset_for_char_index(char_index)
  if @original_string.encoding == Encoding::BINARY
    char_index
  else
    @original_string[0...char_index].bytesize
  end
end
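
To experiment with the conversion outside the class, the same logic can be written as a free function. This is a sketch under the assumption that the string is passed in explicitly; the method above reads it from the instance instead.

```ruby
# Standalone copy of the conversion logic shown above.
# The string is an explicit argument here; the real method uses instance state.
def byte_offset_for_char_index(string, char_index)
  return char_index if string.encoding == Encoding::BINARY

  string[0...char_index].bytesize
end

text = "h\u00E9llo" # "héllo"; U+00E9 is two bytes in UTF-8

byte_offset_for_char_index(text, 0)   # => 0
byte_offset_for_char_index(text, 2)   # => 3 ("h" is 1 byte, "é" is 2)
byte_offset_for_char_index(text.b, 2) # => 2 (BINARY: index is already a byte offset)
```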

#inspect ⇒ Object

Returns a short description of the source: the filename (if set) and the length in bytes.



# File 'lib/lexer_kit/core/source.rb', line 148

def inspect
  filename_str = @filename ? " #{@filename.inspect}" : ""
  "#<LexerKit::Core::Source#{filename_str} #{length} bytes>"
end

#length ⇒ Integer Also known as: size

Returns the length of the input in bytes.

Returns:

  • (Integer)


# File 'lib/lexer_kit/core/source.rb', line 21

def length
  @bytes.bytesize
end

#line_col(byte_offset) ⇒ Array(Integer, Integer)

Converts a byte offset to a 1-based line and column. The column is counted in bytes from the start of the line. Builds the line index if it has not been built yet.

Parameters:

  • byte_offset (Integer)

Returns:

  • (Array(Integer, Integer))
    line, column


# File 'lib/lexer_kit/core/source.rb', line 50

def line_col(byte_offset)
  line_index! unless @line_starts

  # Binary search for line
  line = @line_starts.bsearch_index { |start| start > byte_offset }
  line ||= @line_starts.length
  line_start = @line_starts[line - 1]
  col = byte_offset - line_start + 1

  [line, col]
end
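
The lookup can be tried in isolation with a hand-written table of line starts. The sketch below assumes the table is passed in as an argument (the method above reads it from `@line_starts`) and otherwise follows the same binary search.

```ruby
# Standalone sketch of the lookup; `line_starts` is supplied by the caller here.
def line_col(line_starts, byte_offset)
  # Index of the first line that starts after the offset equals the
  # 1-based number of the line containing the offset.
  line = line_starts.bsearch_index { |start| start > byte_offset }
  line ||= line_starts.length # offset lies on the last line
  [line, byte_offset - line_starts[line - 1] + 1]
end

# Line starts for "ab\ncd\n\nxyz": lines begin at bytes 0, 3, 6 and 7.
starts = [0, 3, 6, 7]

line_col(starts, 0) # => [1, 1]
line_col(starts, 4) # => [2, 2]
line_col(starts, 6) # => [3, 1]
line_col(starts, 9) # => [4, 3]
```

Because the table is sorted, each lookup costs O(log n) in the number of lines once the index exists.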

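#line_count ⇒ Integer

Returns the number of lines, building the line index if needed. Empty input counts as one line, and a trailing newline starts a final empty line.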

Returns:

  • (Integer)


# File 'lib/lexer_kit/core/source.rb', line 84

def line_count
  line_index! unless @line_starts
  @line_starts.length
end

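#line_index! ⇒ self

Builds the line index. The index is not built at construction time; line_col, line_slice, line_count and span_for_line build it on first use. Call this explicitly on large inputs to pay the O(n) scan at a predictable point. Calling it again is a no-op.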

Returns:

  • (self)


# File 'lib/lexer_kit/core/source.rb', line 30

def line_index!
  return self if @line_starts

  @line_starts = [0]
  pos = 0
  while pos < @bytes.bytesize
    byte = @bytes.getbyte(pos)
    if byte == 0x0A # LF
      @line_starts << (pos + 1)
    end
    pos += 1
  end
  @line_starts.freeze
  self
end
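
To see what the index contains, the scan can be reproduced as a small function. This is a sketch only: `line_starts_for` is a hypothetical name, and it uses `each_byte` rather than the explicit `while` loop above, which yields the same table.

```ruby
# Standalone sketch: collect the byte offset at which each line begins.
# `line_starts_for` is a hypothetical helper, not part of the library.
def line_starts_for(bytes)
  starts = [0] # line 1 always starts at offset 0
  bytes.each_byte.with_index do |byte, pos|
    starts << (pos + 1) if byte == 0x0A # LF: the next line starts right after it
  end
  starts.freeze
end

line_starts_for("ab\ncd\n\nxyz") # => [0, 3, 6, 7]
line_starts_for("a\n")           # => [0, 2] (trailing newline opens an empty line 2)
line_starts_for("")              # => [0]
```

Only LF is treated as a line terminator, so in CRLF input the CR byte remains part of the preceding line.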

#line_slice(line) ⇒ String?

Get the content of a specific line (1-based)

Parameters:

  • line (Integer)

    line number (1-based)

Returns:

  • (String, nil)

    the line's content without its trailing newline, or nil if the line number is out of range


# File 'lib/lexer_kit/core/source.rb', line 65

def line_slice(line)
  line_index! unless @line_starts
  return nil if line < 1 || line > @line_starts.length

  start = @line_starts[line - 1]
  if line < @line_starts.length
    # Not the last line
    end_pos = @line_starts[line]
    content = @bytes.byteslice(start, end_pos - start)
    # Remove trailing newline
    content.chomp
  else
    # Last line
    @bytes.byteslice(start, @bytes.bytesize - start)
  end
end
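
The boundary cases are easier to see with concrete input. The following sketch mirrors the method as a free function, assuming the byte string and the line-start table are passed in explicitly.

```ruby
# Standalone mirror of line_slice; `bytes` and `line_starts` are explicit arguments.
def line_slice(bytes, line_starts, line)
  return nil if line < 1 || line > line_starts.length

  start = line_starts[line - 1]
  if line < line_starts.length
    # Not the last line: slice up to the next line start, then drop the newline.
    bytes.byteslice(start, line_starts[line] - start).chomp
  else
    # Last line: everything up to the end of input.
    bytes.byteslice(start, bytes.bytesize - start)
  end
end

bytes = "ab\ncd\n".b
starts = [0, 3, 6] # line starts for the input above

line_slice(bytes, starts, 1) # => "ab"
line_slice(bytes, starts, 3) # => "" (empty line after the final newline)
line_slice(bytes, starts, 4) # => nil
```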

#span(start, len) ⇒ Span

Create a span for the given range

Parameters:

  • start (Integer)

    start byte offset

  • len (Integer)

    length in bytes

Returns:

  • (Span)



# File 'lib/lexer_kit/core/source.rb', line 93

def span(start, len)
  Span.new(start, len)
end

#span_for_char_index(char_index, len: 1) ⇒ Span

Returns the byte span covering len characters starting at a character index. For BINARY input this is O(1). For other encodings it is O(n), so it is intended for error paths only.

Parameters:

  • char_index (Integer)

    character position (0-based)

  • len (Integer) (defaults to: 1)

    length in characters

Returns:

  • (Span)



# File 'lib/lexer_kit/core/source.rb', line 142

def span_for_char_index(char_index, len: 1)
  byte_start = byte_offset_for_char_index(char_index)
  byte_end = byte_offset_for_char_index(char_index + len)
  Span.new(byte_start, byte_end - byte_start)
end
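
A short sketch shows how a character range maps to a wider byte range for multi-byte text. `CharSpan` is a hypothetical stand-in for the library's Span class, which is not documented on this page, and the string is passed explicitly rather than read from the instance.

```ruby
# Hypothetical stand-in for LexerKit's Span (start byte offset, byte length).
CharSpan = Struct.new(:start, :len)

def byte_offset_for_char_index(string, char_index)
  return char_index if string.encoding == Encoding::BINARY

  string[0...char_index].bytesize
end

def span_for_char_index(string, char_index, len: 1)
  byte_start = byte_offset_for_char_index(string, char_index)
  byte_end = byte_offset_for_char_index(string, char_index + len)
  CharSpan.new(byte_start, byte_end - byte_start)
end

text = "h\u00E9llo" # "héllo"

span_for_char_index(text, 1)         # start 1, len 2: one character, two bytes
span_for_char_index(text, 1, len: 3) # start 1, len 4: "éll"
```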

#span_for_line(line) ⇒ Span

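Returns the span covering an entire line (1-based), excluding the terminating newline. Line numbers below 1 are clamped to 1; a line number past the last line yields an empty span at offset 0.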

Parameters:

  • line (Integer)

    line number (1-based)

Returns:

  • (Span)



# File 'lib/lexer_kit/core/source.rb', line 107

def span_for_line(line)
  line_index! unless @line_starts

  line = [line, 1].max
  return Span.new(0, 0) if line > @line_starts.length

  start = @line_starts[line - 1]
  line_end = if line < @line_starts.length
               # Not the last line - span up to (but not including) newline
               @line_starts[line] - 1
             else
               # Last line
               @bytes.bytesize
             end
  Span.new(start, line_end - start)
end
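
The clamping and the newline handling can be checked with a small standalone version. In this sketch `LineSpan` is a hypothetical stand-in for Span, and the byte string and line-start table are passed in explicitly. With CRLF input, only the LF is excluded from the span, so the CR stays inside it, whereas line_slice removes both via `chomp`.

```ruby
# Hypothetical stand-in for LexerKit's Span (start byte offset, byte length).
LineSpan = Struct.new(:start, :len)

def span_for_line(bytes, line_starts, line)
  line = [line, 1].max # clamp line numbers below 1
  return LineSpan.new(0, 0) if line > line_starts.length

  start = line_starts[line - 1]
  # Stop before the LF on inner lines; the last line runs to end of input.
  line_end = line < line_starts.length ? line_starts[line] - 1 : bytes.bytesize
  LineSpan.new(start, line_end - start)
end

bytes = "ab\r\ncd".b
starts = [0, 4] # line starts for the input above

span = span_for_line(bytes, starts, 1)
bytes.byteslice(span.start, span.len) # => "ab\r" (CR is kept, LF is not)
span_for_line(bytes, starts, 0)       # same span as line 1
span_for_line(bytes, starts, 9)       # empty span at offset 0
```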

#text(span) ⇒ String

Extract text for a span

Parameters:

  • span (Span)

    byte range to extract

Returns:

  • (String)


# File 'lib/lexer_kit/core/source.rb', line 100

def text(span)
  span.slice(@bytes)
end
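
Span#slice is not documented on this page, so the following is only a sketch under the assumption that it takes a byte slice of (start, len). `TextSpan` is a hypothetical stand-in used to show the shape of the call.

```ruby
# Hypothetical stand-in: assumes Span#slice is a byteslice over (start, len).
TextSpan = Struct.new(:start, :len) do
  def slice(bytes)
    bytes.byteslice(start, len)
  end
end

bytes = "let x = 1".b

TextSpan.new(4, 1).slice(bytes) # => "x"
TextSpan.new(0, 3).slice(bytes) # => "let"
```

Under that assumption the result carries the BINARY encoding of the stored bytes, so callers that display non-ASCII text may need to re-tag it with the original encoding.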