Class: Crass::Scanner

Inherits:
Object
Defined in:
lib/crass/scanner.rb

Overview

Similar to a StringScanner, but with extra functionality needed to tokenize CSS while preserving the original text.

Constructor Details

#initialize(input) ⇒ Scanner

Creates a Scanner instance for the given input string or IO instance.

# File 'lib/crass/scanner.rb', line 29

def initialize(input)
  @string  = input.is_a?(IO) ? input.read : input.to_s
  @scanner = StringScanner.new(@string)

  reset
end
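
The constructor branches on `input.is_a?(IO)`. `StringIO` is not a subclass of `IO`, so under that check a `StringIO` would be converted with `#to_s` rather than read. The sketch below uses a hypothetical helper, `read_input`, that is not part of Crass. It shows the difference and one duck-typed alternative, `respond_to?(:read)`, under the assumption that accepting any readable object is desirable.

```ruby
require 'stringio'

# Hypothetical helper (not part of Crass): mirrors the constructor's input
# handling, but duck-types on #read so that StringIO is read as well.
def read_input(input)
  input.respond_to?(:read) ? input.read : input.to_s
end

io = StringIO.new("a { color: red }")

io.is_a?(IO)        # => false: StringIO does not inherit from IO
read_input(io)      # => "a { color: red }"
read_input("b {}")  # => "b {}" (a String has no #read, so #to_s is used)
```

The trade-off of duck typing is that any object with a `#read` method, such as a `Pathname`, would also be read rather than stringified.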

Instance Attribute Details

#current ⇒ Object (readonly)

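Current character: the character most recently consumed, or nil if the scanner hasn't consumed a character yet. Once the end of the string has been reached, #consume returns an empty string and leaves this value unchanged.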

# File 'lib/crass/scanner.rb', line 11

def current
  @current
end

#marker ⇒ Object

Current marker position. Use #marked to get the substring between #marker and #pos.

# File 'lib/crass/scanner.rb', line 15

def marker
  @marker
end

#pos ⇒ Object

Position of the next character that will be consumed. This is a character position, not a byte position, so it accounts for multi-byte characters.

Byte offsets (used internally for fast substring extraction) are tracked separately by the underlying StringScanner, whose pos is expected to hold the byte offset corresponding to this character position.

# File 'lib/crass/scanner.rb', line 23

def pos
  @pos
end
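
To see why a character position and a byte offset differ, the standalone snippet below uses Ruby's standard `strscan` library directly rather than the Crass scanner. After two characters of `"héllo"` have been consumed, `StringScanner#pos` reports a byte offset while `StringScanner#charpos` counts characters.

```ruby
require 'strscan'

ss = StringScanner.new("héllo")  # "é" occupies two bytes in UTF-8
ss.getch                          # => "h"
ss.getch                          # => "é"

ss.pos            # => 3 (byte offset, what StringScanner tracks)
ss.charpos        # => 2 (character offset, the kind of position #pos holds)
"héllo".size      # => 5 characters
"héllo".bytesize  # => 6 bytes
```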

#string ⇒ Object (readonly)

String being scanned.

# File 'lib/crass/scanner.rb', line 26

def string
  @string
end

Instance Method Details

#consume ⇒ Object

Consumes the next character, advances the pointer past it, and returns it. Returns an empty string if the end of the string has already been reached.

# File 'lib/crass/scanner.rb', line 38

def consume
  if @pos < @len
    @pos    += 1
    @current = @scanner.getch
  else
    ''
  end
end
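
`#consume` delegates to `StringScanner#getch`, which advances past one whole character (however many bytes it occupies). `getch` itself returns `nil` at the end of the string; the `@pos < @len` check is what lets `#consume` return an empty string instead. A minimal illustration with the standard library alone:

```ruby
require 'strscan'

ss = StringScanner.new("aé")
consumed = []

# getch returns one character per call, advancing past all of its bytes.
while (ch = ss.getch)
  consumed << ch
end

consumed  # => ["a", "é"]
ss.pos    # => 3 (bytes)
ss.getch  # => nil once the string is exhausted
```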

#consume_rest ⇒ Object

Consumes the rest of the string and returns it, advancing the pointer to the end of the string. Returns an empty string if the end of the string has already been reached.

# File 'lib/crass/scanner.rb', line 50

def consume_rest
  result = @scanner.rest

  # `StringScanner#rest` does not advance the scan pointer, so move it to
  # the end of the input to keep the byte offset in sync with {#pos}. This
  # ensures a subsequent {#marked} extracts the correct substring.
  @scanner.terminate

  @current = result[-1]
  @pos     = @len

  result
end
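
The comment in `#consume_rest` relies on two standard-library behaviours that are easy to check in isolation: `StringScanner#rest` returns the unscanned remainder without moving the scan pointer, and `StringScanner#terminate` moves the pointer to the end of the string.

```ruby
require 'strscan'

ss = StringScanner.new("a{b}")
ss.getch        # consume "a"

rest = ss.rest  # => "{b}"
ss.pos          # => 1: #rest did not advance the pointer

ss.terminate
ss.pos          # => 4: now at the end of the string (in bytes)
ss.eos?         # => true
```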

#eos? ⇒ Boolean

Returns true if the end of the string has been reached, false otherwise.

Returns:

  • (Boolean)

# File 'lib/crass/scanner.rb', line 66

def eos?
  @pos == @len
end

#mark ⇒ Object

Sets the marker to the position of the next character that will be consumed.

# File 'lib/crass/scanner.rb', line 72

def mark
  @byte_marker = @scanner.pos
  @marker      = @pos
end

#marked ⇒ Object

Returns the substring between #marker and #pos, without altering the pointer.

# File 'lib/crass/scanner.rb', line 79

def marked
  # Extract the marked text using byte offsets rather than character
  # offsets. Slicing the original string by character offset is O(n) on
  # multi-byte input (Ruby must translate the character index into a byte
  # index), which makes tokenizing non-ASCII input superlinear. Byte slicing
  # is O(length) regardless of how far into the string we are.
  @string.byteslice(@byte_marker, @scanner.pos - @byte_marker) || ''
end
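
The byte-offset technique used by `#mark` and `#marked` can be reproduced with a plain `StringScanner`: remember the byte offset when marking, then use `String#byteslice` from that offset to the current one. The variable names below are illustrative only and do not come from Crass.

```ruby
require 'strscan'

css = "héllo wörld"
ss  = StringScanner.new(css)

byte_marker = ss.pos  # like #mark: remember the current byte offset
ss.scan(/\S+/)        # consume "héllo" (5 characters, 6 bytes)

# Like #marked: slice by bytes, so no character-to-byte translation is needed.
marked = css.byteslice(byte_marker, ss.pos - byte_marker) || ''
marked  # => "héllo"
```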

#peek(length = 1) ⇒ Object

Returns up to length characters starting at the current position, but doesn't consume them. The number of characters returned may be less than length if the end of the string is reached.

# File 'lib/crass/scanner.rb', line 91

def peek(length = 1)
  # Grab the bytes for up to _length_ characters and then take the first
  # _length_ characters. A UTF-8 character is at most four bytes, so `length
  # * 4` bytes always contains at least _length_ whole characters when that
  # many remain. This avoids the O(n) character-offset slice that
  # `@string[pos, length]` would otherwise perform on multi-byte input.
  @string.byteslice(@scanner.pos, length * 4).slice(0, length) || ''
end
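
The sketch below isolates the slicing trick from `#peek` in a hypothetical helper, `peek_chars`, which takes an explicit byte offset. Reading `length * 4` bytes can cut the last multi-byte character in half, but the following `slice(0, length)` keeps only whole characters from the front, so the partial bytes are discarded. As in the original comment, this assumes UTF-8 input.

```ruby
# Hypothetical helper (not part of Crass): returns up to `length`
# characters starting at byte offset `byte_pos`, assuming UTF-8 input.
def peek_chars(str, byte_pos, length = 1)
  # Over-read up to 4 bytes per character, then trim to whole characters.
  chunk = str.byteslice(byte_pos, length * 4)
  chunk ? chunk.slice(0, length) : ''
end

s = "€€x"            # each "€" is 3 bytes in UTF-8, so s is 7 bytes long
peek_chars(s, 0)     # => "€" (the 4-byte chunk ends mid-character)
peek_chars(s, 3, 2)  # => "€x"
peek_chars(s, 7, 3)  # => "" at the end of the string
```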

#reconsume ⇒ Object

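Moves the pointer back one character without changing the value of #current. The next call to #consume will re-consume the current character.

This is intended to be called once, directly after #consume. It relies on StringScanner#unscan, which undoes the whole previous match rather than a single character: after #scan or #scan_until it would move the byte offset back further than #pos, and it raises StringScanner::Error when there is no match to undo (for example, before anything has been consumed, right after #consume_rest, or on a second consecutive call).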

# File 'lib/crass/scanner.rb', line 103

def reconsume
  @scanner.unscan
  @pos -= 1 if @pos > 0
end
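
`StringScanner#unscan` restores the pointer to where it was before the most recent successful match. The standalone snippet below shows the three behaviours that matter for `#reconsume`: with nothing to undo it raises `StringScanner::Error`, after `#getch` it steps back exactly one character, and after a multi-character `#scan` it steps back over the whole match.

```ruby
require 'strscan'

ss = StringScanner.new("abc")

# Nothing has been matched yet, so there is no previous position to restore.
raised = false
begin
  ss.unscan
rescue StringScanner::Error
  raised = true
end
raised  # => true

ss.getch  # => "a"
ss.getch  # => "b"
ss.unscan
ss.pos    # => 1: back exactly one character

ss.scan(/bc/)
ss.unscan
ss.pos    # => 1: back over the whole two-character match
```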

#reset ⇒ Object

Resets the pointer to the beginning of the string.

# File 'lib/crass/scanner.rb', line 109

def reset
  @scanner.reset

  @byte_marker = 0
  @current     = nil
  @len         = @string.size
  @marker      = 0
  @pos         = 0
end

#scan(pattern) ⇒ Object

Tries to match pattern at the current position. If it matches, the matched substring will be returned and the pointer will be advanced. Otherwise, nil will be returned.

# File 'lib/crass/scanner.rb', line 122

def scan(pattern)
  if match = @scanner.scan(pattern)
    @pos     += match.size
    @current  = match[-1]
  end

  match
end
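
`#scan` advances its character position by `match.size` (characters), while the underlying `StringScanner` advances by bytes. One edge case is worth knowing: a pattern that can match zero characters returns an empty string rather than `nil`, and `""[-1]` is `nil`, so such a match would reset #current to nil. Both points, shown with the standard library alone:

```ruby
require 'strscan'

ss = StringScanner.new("héllo world")

match = ss.scan(/\S+/)
match       # => "héllo"
match.size  # => 5 characters, the amount #scan adds to its character position
ss.pos      # => 6 bytes

empty = ss.scan(/\d*/)  # zero-width match: succeeds but consumes nothing
empty       # => ""
empty[-1]   # => nil
```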

#scan_until(pattern) ⇒ Object

Scans the string until the pattern is matched. Returns the substring up to and including the end of the match, and advances the pointer. If there is no match, nil is returned and the pointer is not advanced.

# File 'lib/crass/scanner.rb', line 134

def scan_until(pattern)
  if match = @scanner.scan_until(pattern)
    @pos     += match.size
    @current  = match[-1]
  end

  match
end
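
Unlike `#scan`, `#scan_until` does not require the match to start at the current position: it searches forward and returns everything from the pointer through the end of the first match. A failed search returns `nil` and leaves the pointer where it was. Shown with the standard library:

```ruby
require 'strscan'

ss = StringScanner.new("color: red; margin: 0")

ss.scan_until(/;/)  # => "color: red;" (text before the match, plus the match)
ss.pos              # => 11

ss.scan_until(/}/)  # => nil: there is no "}" in the rest of the string
ss.pos              # => 11, unchanged
```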