Class: CSS::Tokenizer

Inherits:
Object
Includes:
CodePoints
Defined in:
lib/css/tokenizer.rb

Overview

Tokenizer based on CSS Syntax Module Level 3/4 §4. www.w3.org/TR/css-syntax-3/#tokenization

Not thread-safe: an instance carries mutable cursors (`@pos`, `@newline_cursor`) that advance over the input. Allocate one tokenizer per thread.

Constant Summary

PUNCTUATION =
{
  '(' => :lparen,
  ')' => :rparen,
  ',' => :comma,
  ':' => :colon,
  ';' => :semicolon,
  '[' => :lbracket,
  ']' => :rbracket,
  '{' => :lbrace,
  '}' => :rbrace
}.freeze
PREPROCESS_RE =

CR, FF, and CR LF each collapse to a single LF; NUL is replaced with U+FFFD. Done in one pass.

/\r\n?|\f|\0/.freeze
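As a rough illustration of the one-pass substitution described above, the whole preprocessing step can be written as a single `gsub` with a block. This is only a sketch and not the library's `preprocess` method: the name `preprocess_sketch` and both constants are hypothetical stand-ins, the value of `CodePoints::REPLACEMENT` is assumed to be U+FFFD, and the real method may return something other than a String (the constructor stores its result in `@chars`).

```ruby
# Hypothetical stand-ins for PREPROCESS_RE and CodePoints::REPLACEMENT.
SKETCH_PREPROCESS_RE = /\r\n?|\f|\0/
SKETCH_REPLACEMENT   = "\uFFFD" # assumed value of the replacement character

# One pass over the input: NUL becomes U+FFFD; every other match
# (CR LF, lone CR, or FF) becomes a single LF.
def preprocess_sketch(input)
  input.gsub(SKETCH_PREPROCESS_RE) do |match|
    match == "\0" ? SKETCH_REPLACEMENT : "\n"
  end
end
```

Because `\r\n?` matches greedily, a CR LF pair is consumed as one match and yields one LF rather than two.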

Constants included from CodePoints

CodePoints::REPLACEMENT

Instance Method Summary

Methods included from CodePoints

digit?, hex_digit?, ident_code_point?, ident_start_code_point?

Constructor Details

#initialize(input, preserve_comments: false) ⇒ Tokenizer

Returns a new instance of Tokenizer.



# File 'lib/css/tokenizer.rb', line 27

def initialize(input, preserve_comments: false)
  @chars             = preprocess(input)
  @pos               = 0
  @newlines          = collect_newline_offsets(@chars)
  @newline_cursor    = 0
  @preserve_comments = preserve_comments
end

Instance Method Details

#next_token ⇒ Object



# File 'lib/css/tokenizer.rb', line 48

def next_token
  consume_comments unless @preserve_comments

  return Token.new(:eof) if @pos >= @chars.length

  start_offset = @pos
  tok          = consume_one_token
  line, column = line_column_at(start_offset)

  tok.assign_position!(Position.new(line:, column:, offset: start_offset, end_offset: @pos))
end
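The listing above calls `line_column_at`, and the constructor calls `collect_newline_offsets`, but neither helper's source appears on this page. The following is a minimal sketch of how a precomputed newline table plus a forward-only cursor could answer such lookups. Everything in it is an assumption: the class name `LineIndexSketch`, the 1-based line and column numbering, and the requirement that offsets are queried in non-decreasing order.

```ruby
# Hypothetical illustration; not the library's implementation.
class LineIndexSketch
  def initialize(text)
    @newlines = collect_newline_offsets(text)
    @cursor   = 0 # count of newlines known to precede the last queried offset
  end

  # Returns [line, column], both 1-based (an assumption).
  # Offsets must be passed in non-decreasing order, because the cursor
  # only ever moves forward.
  def line_column_at(offset)
    @cursor += 1 while @cursor < @newlines.length && @newlines[@cursor] < offset

    line_start = @cursor.zero? ? 0 : @newlines[@cursor - 1] + 1
    [@cursor + 1, offset - line_start + 1]
  end

  private

  # Offsets of every LF in the (already preprocessed) text.
  def collect_newline_offsets(text)
    offsets = []
    text.each_char.with_index { |ch, i| offsets << i if ch == "\n" }
    offsets
  end
end

index = LineIndexSketch.new("ab\ncd\n\ne")
index.line_column_at(0) # position of "a"
index.line_column_at(4) # position of "d"
index.line_column_at(7) # position of "e"
```

A forward-only cursor keeps each lookup cheap when tokens are produced left to right, at the cost of carrying mutable state on the instance, which is consistent with the thread-safety note in the overview.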

#tokenize ⇒ Object



# File 'lib/css/tokenizer.rb', line 35

def tokenize
  tokens = []

  loop do
    token = next_token
    break if token.type == :eof

    tokens << token
  end

  tokens
end