Class: CSS::Tokenizer
- Inherits:
- Object
  - Object
    - CSS::Tokenizer
- Includes:
- CodePoints
- Defined in:
- lib/css/tokenizer.rb
Overview
Tokenizer based on CSS Syntax Module Level 3/4 §4. www.w3.org/TR/css-syntax-3/#tokenization
Not thread-safe: an instance carries a mutable cursor (`@pos`) that advances over the input. Allocate one tokenizer per thread.
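The per-thread allocation advice can be illustrated with a small sketch. `CursorSketch` is a hypothetical stand-in that only mimics the mutable `@pos` cursor so the example runs on its own; with the real class, each thread would build its own `CSS::Tokenizer` instead.

```ruby
# Hypothetical stand-in with a mutable cursor, mirroring the @pos state
# described above. It is not part of the library.
class CursorSketch
  def initialize(input)
    @chars = input.chars
    @pos = 0
  end

  # Returns the next character and advances the cursor, or nil at the end.
  def next_char
    return nil if @pos >= @chars.length

    ch = @chars[@pos]
    @pos += 1
    ch
  end
end

inputs = ["a{b:c}", "d{e:f}"]

# Each thread allocates its own instance, so no cursor is ever shared.
results = inputs.map do |css|
  Thread.new do
    cursor = CursorSketch.new(css)
    out = []
    while (ch = cursor.next_char)
      out << ch
    end
    out.join
  end
end.map(&:value)
```

Sharing one instance across threads would let both advance the same cursor, so each thread would see only part of the input.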
Constant Summary
- PUNCTUATION =
{ '(' => :lparen, ')' => :rparen, ',' => :comma, ':' => :colon, ';' => :semicolon, '[' => :lbracket, ']' => :rbracket, '{' => :lbrace, '}' => :rbrace }.freeze
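How the tokenizer consults this table is not shown on this page. A minimal sketch of a single-character lookup, assuming the table maps a character directly to a token type, could look like this; `PUNCTUATION_SKETCH` is a copy of the table and `punctuation_type_sketch` is a hypothetical helper.

```ruby
# Copy of the PUNCTUATION table; the lookup helper below is hypothetical.
PUNCTUATION_SKETCH = {
  '(' => :lparen, ')' => :rparen, ',' => :comma, ':' => :colon, ';' => :semicolon,
  '[' => :lbracket, ']' => :rbracket, '{' => :lbrace, '}' => :rbrace
}.freeze

# Returns the token type for a punctuation character, or nil for anything else.
def punctuation_type_sketch(char)
  PUNCTUATION_SKETCH[char]
end

brace_type = punctuation_type_sketch('{')
other_type = punctuation_type_sketch('a')
```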
- PREPROCESS_RE =
CR / FF (and CR LF) collapse to LF; NUL collapses to U+FFFD. Done in one pass.
/\r\n?|\f|\0/.freeze
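The one-pass substitution described above might look like the following. This is a minimal sketch, not the library's actual `preprocess` method, whose body is not shown on this page; `preprocess_sketch` and `PREPROCESS_RE_SKETCH` are hypothetical names, and the pattern is copied from `PREPROCESS_RE`.

```ruby
# Same pattern as PREPROCESS_RE, under a hypothetical name.
PREPROCESS_RE_SKETCH = /\r\n?|\f|\0/

# Hypothetical stand-in for the tokenizer's preprocessing step.
def preprocess_sketch(input)
  # NUL becomes U+FFFD; CR, CR LF, and FF each become a single LF.
  input.gsub(PREPROCESS_RE_SKETCH) { |m| m == "\0" ? "\uFFFD" : "\n" }
end

cleaned = preprocess_sketch("a\r\nb\rc\fd\0e")
```

Because `\r\n?` matches a CR together with an optional following LF, a CR LF pair yields one LF rather than two.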
Constants included from CodePoints
CodePoints::DIGIT_TABLE, CodePoints::HEX_DIGIT_TABLE, CodePoints::IDENT_CP_TABLE, CodePoints::IDENT_START_TABLE, CodePoints::REPLACEMENT
Instance Method Summary
- #initialize(input, preserve_comments: false) ⇒ Tokenizer (constructor)
  A new instance of Tokenizer.
- #next_token ⇒ Object
- #tokenize ⇒ Object
Methods included from CodePoints
build_table, digit?, hex_digit?, ident_code_point?, ident_start_code_point?
Constructor Details
#initialize(input, preserve_comments: false) ⇒ Tokenizer
Returns a new instance of Tokenizer.
# File 'lib/css/tokenizer.rb', line 26

def initialize(input, preserve_comments: false)
  @chars = preprocess(input)
  @length = @chars.length
  @pos = 0
  @newlines = collect_newline_offsets(@chars)
  @preserve_comments = preserve_comments
end
Instance Method Details
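#next_token ⇒ Object
Skips comments unless preserve_comments is set, then consumes one token and assigns its source span. Returns a Token of type :eof once the cursor reaches the end of the input.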
# File 'lib/css/tokenizer.rb', line 47

def next_token
  consume_comments unless @preserve_comments
  return Token.new(:eof) if @pos >= @length

  start_offset = @pos
  tok = consume_one_token
  tok.assign_source!(start_offset, @pos, @newlines)
end
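`assign_source!` receives the start offset, the end offset, and `@newlines`, but how it uses them is not shown on this page. One plausible approach, sketched here purely as an assumption, is a binary search over the sorted newline offsets to turn an offset into a line and column. Both helpers below are hypothetical and take a String.

```ruby
# Hypothetical helper: collect the offset of every LF in the preprocessed input.
def newline_offsets_sketch(text)
  offsets = []
  text.each_char.with_index { |ch, i| offsets << i if ch == "\n" }
  offsets
end

# Hypothetical helper: map an offset to a 1-based [line, column] pair.
def line_and_column_sketch(offset, newlines)
  # The index of the first newline at or after the offset equals the number
  # of newlines that come strictly before it.
  line_index = newlines.bsearch_index { |nl| nl >= offset } || newlines.length
  line_start = line_index.zero? ? 0 : newlines[line_index - 1] + 1
  [line_index + 1, offset - line_start + 1]
end

newlines = newline_offsets_sketch("a{\nb:c\n}")
position = line_and_column_sketch(3, newlines)
```

Collecting the offsets once in the constructor keeps each lookup logarithmic in the number of lines instead of rescanning the input for every token.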
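#tokenize ⇒ Object
Calls next_token repeatedly and returns the collected tokens as an Array. The terminating :eof token is not included.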
# File 'lib/css/tokenizer.rb', line 34

def tokenize
  tokens = []
  loop do
    token = next_token
    break if token.type == :eof

    tokens << token
  end
  tokens
end
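The same drain-until-:eof loop can also be wrapped in an Enumerator when a caller wants tokens lazily instead of as one Array. The sketch below is not part of the library: `TokenSketch`, `TokenizerSketch`, and `each_token_sketch` are hypothetical stand-ins so the example runs on its own, and it assumes only that `next_token` returns an object responding to `type`.

```ruby
# Hypothetical stand-ins so the sketch runs on its own; the real classes are
# CSS::Tokenizer and its Token type.
TokenSketch = Struct.new(:type)

class TokenizerSketch
  def initialize(types)
    @types = types
    @pos = 0
  end

  # Returns the next token, or an :eof token once the input is exhausted.
  def next_token
    return TokenSketch.new(:eof) if @pos >= @types.length

    tok = TokenSketch.new(@types[@pos])
    @pos += 1
    tok
  end
end

# Lazily yields tokens until :eof, mirroring the loop in #tokenize.
def each_token_sketch(tokenizer)
  Enumerator.new do |yielder|
    loop do
      token = tokenizer.next_token
      break if token.type == :eof

      yielder << token
    end
  end
end

types = each_token_sketch(TokenizerSketch.new(%i[ident lbrace rbrace])).map(&:type)
```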