Class: CSS::Tokenizer

Inherits:
Object
Includes:
CodePoints
Defined in:
lib/css/tokenizer.rb

Overview

Tokenizer based on CSS Syntax Module Level 3/4, §4 (Tokenization): www.w3.org/TR/css-syntax-3/#tokenization

Not thread-safe: an instance carries a mutable cursor (`@pos`) that advances over the input. Allocate one tokenizer per thread.

Constant Summary

PUNCTUATION =
{
  '(' => :lparen,
  ')' => :rparen,
  ',' => :comma,
  ':' => :colon,
  ';' => :semicolon,
  '[' => :lbracket,
  ']' => :rbracket,
  '{' => :lbrace,
  '}' => :rbrace
}.freeze
PREPROCESS_RE =

CR / FF (and CR LF) collapse to LF; NUL collapses to U+FFFD. Done in one pass.

/\r\n?|\f|\0/.freeze
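The private `preprocess` method is not shown on this page, so the following is only a sketch of how a single `gsub` pass over `PREPROCESS_RE` could apply the rule described above. The helper name `preprocess_sketch` and the local `REPLACEMENT` value (assumed to mirror `CodePoints::REPLACEMENT`) are illustrative assumptions, not part of the library.

```ruby
# Sketch only: the real `preprocess` in lib/css/tokenizer.rb is not shown here.
PREPROCESS_RE = /\r\n?|\f|\0/.freeze
REPLACEMENT   = "\uFFFD" # assumed to match CodePoints::REPLACEMENT

# Hypothetical helper: one gsub pass that maps CR, CR LF and FF to LF,
# and NUL to U+FFFD. All other characters pass through unchanged.
def preprocess_sketch(input)
  input.gsub(PREPROCESS_RE) { |match| match == "\0" ? REPLACEMENT : "\n" }
end

preprocess_sketch("a\r\nb\rc\fd\0e") # => "a\nb\nc\nd\uFFFDe"
```

Because `\r\n?` is tried before the other alternatives and the optional `\n` is matched greedily, a CR LF pair becomes a single LF rather than two.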

Constants included from CodePoints

CodePoints::DIGIT_TABLE, CodePoints::HEX_DIGIT_TABLE, CodePoints::IDENT_CP_TABLE, CodePoints::IDENT_START_TABLE, CodePoints::REPLACEMENT

Instance Method Summary

Methods included from CodePoints

build_table, digit?, hex_digit?, ident_code_point?, ident_start_code_point?

Constructor Details

#initialize(input, preserve_comments: false) ⇒ Tokenizer

Returns a new instance of Tokenizer.



# File 'lib/css/tokenizer.rb', line 26

def initialize(input, preserve_comments: false)
  @chars             = preprocess(input)
  @length            = @chars.length
  @pos               = 0
  @newlines          = collect_newline_offsets(@chars)
  @preserve_comments = preserve_comments
end
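The constructor records newline offsets once, and `#next_token` later hands them to `assign_source!` together with the start and end offsets. Neither `collect_newline_offsets` nor `assign_source!` is shown on this page, so the sketch below only illustrates one plausible way such a table can turn a character offset into a line and column. Both method names are hypothetical, and the input is assumed to be a String.

```ruby
# Hypothetical: ascending offsets of every "\n" in the preprocessed input.
def collect_newline_offsets_sketch(chars)
  offsets = []
  chars.each_char.with_index { |ch, i| offsets << i if ch == "\n" }
  offsets
end

# Hypothetical: 1-based [line, column] for a 0-based character offset.
def line_and_column(offset, newlines)
  # The index of the first newline at or after `offset` equals the number
  # of newlines that precede it, found by binary search.
  line_index = newlines.bsearch_index { |nl| nl >= offset } || newlines.length
  line_start = line_index.zero? ? 0 : newlines[line_index - 1] + 1
  [line_index + 1, offset - line_start + 1]
end

newlines = collect_newline_offsets_sketch("a{\nb:c\n}")
line_and_column(3, newlines) # => [2, 1]
```

Building the table once keeps each lookup logarithmic in the number of lines, which may be why the offsets are collected up front rather than counted per token.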

Instance Method Details

#next_token ⇒ Object



# File 'lib/css/tokenizer.rb', line 47

def next_token
  consume_comments unless @preserve_comments

  return Token.new(:eof) if @pos >= @length

  start_offset = @pos
  tok          = consume_one_token

  tok.assign_source!(start_offset, @pos, @newlines)
end

#tokenize ⇒ Object



# File 'lib/css/tokenizer.rb', line 34

def tokenize
  tokens = []

  loop do
    token = next_token
    break if token.type == :eof

    tokens << token
  end

  tokens
end
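`#tokenize` drains `#next_token` into an array and leaves the `:eof` token out. As a possible alternative for large inputs, the same loop can be wrapped in an `Enumerator` so tokens are produced on demand. The sketch below uses stand-in classes (`StubToken`, `StubTokenizer`) and a hypothetical helper `token_enumerator`, none of which exist in the library. They only mimic the `next_token` and `type` shape shown above.

```ruby
# Stand-in for the library's Token; only `type` is modelled (assumption).
StubToken = Struct.new(:type)

# Stand-in exposing the same `next_token` contract as the tokenizer above:
# it advances a cursor and returns an :eof token once the input is used up.
class StubTokenizer
  def initialize(types)
    @types = types
    @pos   = 0
  end

  def next_token
    return StubToken.new(:eof) if @pos >= @types.length

    token = StubToken.new(@types[@pos])
    @pos += 1
    token
  end
end

# Hypothetical helper: yields tokens lazily and stops at :eof without
# yielding it, matching what #tokenize collects.
def token_enumerator(tokenizer)
  Enumerator.new do |yielder|
    loop do
      token = tokenizer.next_token
      break if token.type == :eof

      yielder << token
    end
  end
end

tokenizer = StubTokenizer.new(%i[lbrace colon rbrace])
types     = token_enumerator(tokenizer).map(&:type)
# types == [:lbrace, :colon, :rbrace]
```

Since the cursor is never reset, a second pass over the same instance yields nothing. The same holds for calling `#tokenize` twice on one tokenizer, given the `@pos >= @length` check in `#next_token`.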