Class: Crass::Scanner
- Inherits: Object
  - Object
    - Crass::Scanner
- Defined in: lib/crass/scanner.rb
Overview
Similar to a StringScanner, but with extra functionality needed to tokenize CSS while preserving the original text.
Instance Attribute Summary
- #current ⇒ Object (readonly)
  Current character, or nil if the scanner hasn't yet consumed a character, or is at the end of the string.
- #marker ⇒ Object
  Current marker position.
- #pos ⇒ Object
  Position of the next character that will be consumed.
- #string ⇒ Object (readonly)
  String being scanned.
Instance Method Summary
- #consume ⇒ Object
  Consumes the next character and returns it, advancing the pointer, or an empty string if the end of the string has been reached.
- #consume_rest ⇒ Object
  Consumes the rest of the string and returns it, advancing the pointer to the end of the string.
- #eos? ⇒ Boolean
  Returns true if the end of the string has been reached, false otherwise.
- #initialize(input) ⇒ Scanner (constructor)
  Creates a Scanner instance for the given input string or IO instance.
- #mark ⇒ Object
  Sets the marker to the position of the next character that will be consumed.
- #marked ⇒ Object
- #peek(length = 1) ⇒ Object
  Returns up to length characters starting at the current position, but doesn't consume them.
- #reconsume ⇒ Object
  Moves the pointer back one character without changing the value of #current.
- #reset ⇒ Object
  Resets the pointer to the beginning of the string.
- #scan(pattern) ⇒ Object
  Tries to match pattern at the current position.
- #scan_until(pattern) ⇒ Object
  Scans the string until the pattern is matched.
Constructor Details
#initialize(input) ⇒ Scanner
Creates a Scanner instance for the given input string or IO instance.
    # File 'lib/crass/scanner.rb', line 29
    def initialize(input)
      @string  = input.is_a?(IO) ? input.read : input.to_s
      @scanner = StringScanner.new(@string)

      reset
    end
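To illustrate the input handling in the constructor, the sketch below copies its normalization expression into a standalone helper. The name `normalize_input` is hypothetical and not part of Crass. One consequence worth noting: `StringIO` is not a subclass of `IO`, so under this check it falls through to `to_s` rather than being read.

```ruby
require 'stringio'

# Hypothetical helper that mirrors the constructor's normalization line.
def normalize_input(input)
  input.is_a?(IO) ? input.read : input.to_s
end

# A real IO object (here, the read end of a pipe) is read in full.
reader, writer = IO.pipe
writer.write('a { color: red }')
writer.close
from_io = normalize_input(reader)
reader.close

# Strings pass through unchanged; other objects are converted with to_s.
from_string = normalize_input('a { color: red }')
from_symbol = normalize_input(:div)

# StringIO does not inherit from IO, so this yields the default
# Object#to_s representation, not the buffer contents.
from_stringio = normalize_input(StringIO.new('a {}'))

puts from_io
puts from_string
puts from_symbol
puts from_stringio
```

Under these assumptions, a caller holding a `StringIO` would likely want to pass `io.read` (or `io.string`) instead of the object itself.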
Instance Attribute Details
#current ⇒ Object (readonly)
Current character, or nil if the scanner hasn't yet consumed a
character, or is at the end of the string.
    # File 'lib/crass/scanner.rb', line 11
    def current
      @current
    end
#marker ⇒ Object
Current marker position.
    # File 'lib/crass/scanner.rb', line 15
    def marker
      @marker
    end
#pos ⇒ Object
Position of the next character that will be consumed. This is a character position, not a byte position, so it accounts for multi-byte characters.
Byte offsets (used internally for fast substring extraction) are tracked
separately by the underlying StringScanner, whose pos always reflects
the byte offset corresponding to this character position.
    # File 'lib/crass/scanner.rb', line 23
    def pos
      @pos
    end
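The difference between the character position and the byte offset described for #pos can be seen with the standard library's StringScanner alone. This is a minimal sketch, not Crass code: `char_pos` is a hypothetical stand-in for the scanner's character counter.

```ruby
require 'strscan'

input = "\u00E9{}"                  # "é{}": "é" occupies two bytes in UTF-8
scanner = StringScanner.new(input)
char_pos = 0                        # hypothetical character counter

first = scanner.getch               # getch returns one whole (multi-byte) character
char_pos += 1

puts first            # é
puts char_pos         # 1 (characters consumed)
puts scanner.pos      # 2 (bytes consumed)
puts input.size       # 3 characters in total
puts input.bytesize   # 4 bytes in total
```

Keeping both counters lets callers reason in characters while substring extraction works in bytes.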
#string ⇒ Object (readonly)
String being scanned.
    # File 'lib/crass/scanner.rb', line 26
    def string
      @string
    end
Instance Method Details
#consume ⇒ Object
Consumes the next character and returns it, advancing the pointer, or an empty string if the end of the string has been reached.
    # File 'lib/crass/scanner.rb', line 38
    def consume
      if @pos < @len
        @pos += 1
        @current = @scanner.getch
      else
        ''
      end
    end
#consume_rest ⇒ Object
Consumes the rest of the string and returns it, advancing the pointer to the end of the string. Returns an empty string if the end of the string has already been reached.
    # File 'lib/crass/scanner.rb', line 50
    def consume_rest
      result = @scanner.rest

      # `StringScanner#rest` does not advance the scan pointer, so move it to
      # the end of the input to keep the byte offset in sync with {#pos}. This
      # ensures a subsequent {#marked} extracts the correct substring.
      @scanner.terminate

      @current = result[-1]
      @pos = @len

      result
    end
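The comment in #consume_rest relies on two StringScanner behaviors: `rest` leaves the scan pointer where it is, and `terminate` moves it to the end. A short sketch using only the standard library (the variable names are illustrative):

```ruby
require 'strscan'

scanner = StringScanner.new('a{}')
scanner.getch                        # consume "a"

rest = scanner.rest                  # remaining text; the pointer does not move
pos_after_rest = scanner.pos

scanner.terminate                    # now move the pointer to the end
pos_after_terminate = scanner.pos
at_end = scanner.eos?

puts rest                  # {}
puts pos_after_rest        # 1
puts pos_after_terminate   # 3
puts at_end                # true
```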
#eos? ⇒ Boolean
Returns true if the end of the string has been reached, false
otherwise.
    # File 'lib/crass/scanner.rb', line 66
    def eos?
      @pos == @len
    end
#mark ⇒ Object
Sets the marker to the position of the next character that will be consumed.
    # File 'lib/crass/scanner.rb', line 72
    def mark
      @byte_marker = @scanner.pos
      @marker = @pos
    end
#marked ⇒ Object
Returns the substring between #marker and #pos, without altering the pointer.
    # File 'lib/crass/scanner.rb', line 79
    def marked
      # Extract the marked text using byte offsets rather than character
      # offsets. Slicing the original string by character offset is O(n) on
      # multi-byte input (Ruby must translate the character index into a byte
      # index), which makes tokenizing non-ASCII input superlinear. Byte slicing
      # is O(length) regardless of how far into the string we are.
      @string.byteslice(@byte_marker, @scanner.pos - @byte_marker) || ''
    end
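The byte-offset extraction used by #marked can be sketched outside Crass with a plain StringScanner. Here `byte_marker` and `char_marker` are illustrative local variables, and the input string is an assumption chosen to contain multi-byte characters before the marked span.

```ruby
require 'strscan'

input = "\u00F1and\u00FA: value"     # "ñandú: value"
scanner = StringScanner.new(input)

skipped = scanner.scan(/\S+\s/)      # consume "ñandú: "
byte_marker = scanner.pos            # byte offset of the mark
char_marker = skipped.size           # character offset of the same point

scanner.scan(/\w+/)                  # consume "value"

# Byte-based extraction, as in #marked.
by_bytes = input.byteslice(byte_marker, scanner.pos - byte_marker)

# Character-based extraction gives the same text, but Ruby has to walk the
# multi-byte prefix to locate the starting character.
by_chars = input[char_marker, 5]

puts byte_marker   # 9
puts char_marker   # 7
puts by_bytes      # value
puts by_chars      # value
```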
#peek(length = 1) ⇒ Object
Returns up to length characters starting at the current position, but doesn't consume them. The number of characters returned may be less than length if the end of the string is reached.
    # File 'lib/crass/scanner.rb', line 91
    def peek(length = 1)
      # Grab the bytes for up to _length_ characters and then take the first
      # _length_ characters. A UTF-8 character is at most four bytes, so `length
      # * 4` bytes always contains at least _length_ whole characters when that
      # many remain. This avoids the O(n) character-offset slice that
      # `@string[pos, length]` would otherwise perform on multi-byte input.
      @string.byteslice(@scanner.pos, length * 4).slice(0, length) || ''
    end
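The byte-window technique described in the #peek comment can be tried in isolation. The helper below, `peek_chars`, is hypothetical and takes the byte offset as an explicit argument instead of reading it from a scanner; the inputs are assumptions chosen to mix one-byte and three-byte characters.

```ruby
# Hypothetical standalone version of the byte-window lookup; not part of Crass.
def peek_chars(string, byte_pos, length = 1)
  # Take up to four bytes per requested character, then keep only the
  # first _length_ characters of that window.
  string.byteslice(byte_pos, length * 4).slice(0, length) || ''
end

input = "a\u2192b\u2713"             # "a→b✓": 1 + 3 + 1 + 3 bytes

two_chars  = peek_chars(input, 1, 2) # starts at "→"
short_read = peek_chars(input, 4, 5) # only two characters remain
at_end     = peek_chars(input, 8, 1) # byte offset equals bytesize

# The window may end in the middle of a character; the trailing partial
# bytes are dropped by the character slice.
cjk = "\u65E5\u672C"                 # "日本", three bytes each
first_cjk = peek_chars(cjk, 0, 1)

puts two_chars       # →b
puts short_read      # b✓
puts at_end.empty?   # true
puts first_cjk       # 日
```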
#reconsume ⇒ Object
Moves the pointer back one character without changing the value of #current.
    # File 'lib/crass/scanner.rb', line 103
    def reconsume
      @scanner.unscan
      @pos -= 1 if @pos > 0
    end
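The step-back in #reconsume depends on `StringScanner#unscan`, which restores the scan pointer to where it was before the most recent successful match. A minimal sketch with illustrative local variables (`char_pos` and `current` stand in for the scanner's own state):

```ruby
require 'strscan'

scanner = StringScanner.new("\u00E9a")   # "éa"
char_pos = 0

current = scanner.getch                  # consume "é"
char_pos += 1
byte_pos_after_consume = scanner.pos

# Step back: undo the last getch and decrement the character counter,
# leaving `current` untouched.
scanner.unscan
char_pos -= 1 if char_pos > 0

puts byte_pos_after_consume   # 2
puts scanner.pos              # 0
puts char_pos                 # 0
puts current                  # é (unchanged)

again = scanner.getch         # the same character is consumed again
puts again == current         # true
```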
#reset ⇒ Object
Resets the pointer to the beginning of the string.
    # File 'lib/crass/scanner.rb', line 109
    def reset
      @scanner.reset
      @byte_marker = 0
      @current = nil
      @len = @string.size
      @marker = 0
      @pos = 0
    end
#scan(pattern) ⇒ Object
Tries to match pattern at the current position. If it matches, the
matched substring will be returned and the pointer will be advanced.
Otherwise, nil will be returned.
    # File 'lib/crass/scanner.rb', line 122
    def scan(pattern)
      if match = @scanner.scan(pattern)
        @pos += match.size
        @current = match[-1]
      end

      match
    end
#scan_until(pattern) ⇒ Object
Scans the string until the pattern is matched. Returns the substring up
to and including the end of the match, and advances the pointer. If there
is no match, nil is returned and the pointer is not advanced.
    # File 'lib/crass/scanner.rb', line 134
    def scan_until(pattern)
      if match = @scanner.scan_until(pattern)
        @pos += match.size
        @current = match[-1]
      end

      match
    end
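The bookkeeping in #scan and #scan_until (advance the character position by `match.size`, remember the last matched character) can be reproduced with a bare StringScanner. In this sketch `char_pos` and `current` are illustrative stand-ins for the scanner's state, and the CSS fragment is an assumed input.

```ruby
require 'strscan'

input = "/* \u00E9 */ a{}"           # a comment containing "é", then a rule
scanner = StringScanner.new(input)
char_pos = 0
current = nil

# A successful scan_until returns everything up to and including the match.
if (match = scanner.scan_until(%r{\*/}))
  char_pos += match.size
  current = match[-1]
end
byte_pos = scanner.pos

# A failed scan_until returns nil and leaves the pointer where it was.
missing = scanner.scan_until(/@/)

puts match             # /* é */
puts char_pos          # 7 characters
puts byte_pos          # 8 bytes
puts current           # /
puts missing.inspect   # nil
puts scanner.pos       # 8
```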