Class: Dommy::Internal::SelectorParser::Parser

Inherits:
Object
  • Object
Defined in:
lib/dommy/internal/selector_parser.rb

Overview

Recursive-descent parser over a character buffer. Methods raise InvalidSelector on the first grammar violation.

Constant Summary

WS =
" \t\r\n\f"
LEGACY_PSEUDO_ELEMENTS =

The four pseudo-elements that also accept the legacy single-colon syntax. Written with a single `:` they are still pseudo-elements and match no element.

%w[before after first-line first-letter].to_set.freeze

Constructor Details

#initialize(string) ⇒ Parser

Returns a new instance of Parser.



# File 'lib/dommy/internal/selector_parser.rb', line 94

def initialize(string)
  @s = string
  @i = 0
  @n = string.length
  @clauses = []
end
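
The constructor only sets up cursor state: the buffer, a read index, and the buffer length. As a rough illustration of how that state is walked, here is a self-contained sketch. `CursorSketch` is a hypothetical stand-in, not the library class; it copies only the three cursor helpers (`peek`, `advance`, `eof?`) documented further down.

```ruby
# Hypothetical stand-in for the parser's cursor state (not the library class).
class CursorSketch
  def initialize(string)
    @s = string        # character buffer
    @i = 0             # read index
    @n = string.length # buffer length, cached once
  end

  # Character at the cursor (plus offset), or nil past the end.
  def peek(offset = 0)
    @s[@i + offset]
  end

  def advance
    @i += 1
  end

  def eof?(offset = 0)
    (@i + offset) >= @n
  end
end

cursor = CursorSketch.new("a>b")
seen = []
until cursor.eof?
  seen << cursor.peek
  cursor.advance
end
p seen # ["a", ">", "b"]
```

Because `peek` returns nil past the end of the buffer, callers can compare against a character without checking `eof?` first.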

Instance Attribute Details

#clauses ⇒ Object (readonly)

One entry per top-level clause: `{ text:, pseudo_subject: }`, where `pseudo_subject` is true when the clause's subject (its rightmost compound) is a pseudo-element such as `::before` or `:first-line`. Such a clause matches no element.



# File 'lib/dommy/internal/selector_parser.rb', line 92

def clauses
  @clauses
end

Instance Method Details

#advance ⇒ Object



# File 'lib/dommy/internal/selector_parser.rb', line 645

def advance = @i += 1

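#attribute_namespace_prefix_ahead? ⇒ Boolean

Inside `[…]`, a namespace prefix may precede the attribute name: `*|` is any-namespace, a bare `|` is no-namespace, and a named prefix is undeclared. An identifier followed by `|=` is not a prefix; there the `|=` is the attribute matcher.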

Returns:

  • (Boolean)


# File 'lib/dommy/internal/selector_parser.rb', line 306

def attribute_namespace_prefix_ahead?
  if peek == "*"
    return peek(1) == "|"
  end
  if peek == "|"
    return true
  end
  if ident_start?
    j = scan_ident_end(@i)
    return @s[j] == "|" && @s[j + 1] != "="
  end
  false
end
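
The lookahead above has to tell `[svg|href]` (a prefix) from `[lang|=en]` (a dash-match matcher). A minimal standalone sketch of that decision follows. `attr_ns_prefix?` is a hypothetical helper that takes the text after `[` rather than using parser state, and its identifier scan is simplified to ASCII without escapes, which is an assumption of the sketch rather than the library's behavior.

```ruby
# Hypothetical standalone version of the attribute-prefix lookahead.
# `text` is whatever follows the opening `[`.
def attr_ns_prefix?(text)
  return text[1] == "|" if text[0] == "*"
  return true if text[0] == "|"

  # Simplified identifier scan: ASCII only, no escapes (sketch assumption).
  ident = text[/\A-?[A-Za-z_][A-Za-z0-9_-]*/]
  return false if ident.nil?

  # `ident|` is a prefix unless the bar begins the `|=` matcher.
  text[ident.length] == "|" && text[ident.length + 1] != "="
end

p attr_ns_prefix?("svg|href]")  # true
p attr_ns_prefix?("lang|=en]")  # false
```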

#combinator_char?(c) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/dommy/internal/selector_parser.rb', line 170

def combinator_char?(c)
  c == ">" || c == "+" || c == "~" || (c == "|" && peek(1) == "|")
end

#consume_attr_flag! ⇒ Object

Consume the trailing case-sensitivity flag: a single `i`, `I`, `s`, or `S`, which must be followed by whitespace, `]`, or end of input.



# File 'lib/dommy/internal/selector_parser.rb', line 344

def consume_attr_flag!
  flag = peek
  fail!("invalid attribute flag") unless %w[i I s S].include?(flag)
  advance
  fail!("invalid attribute flag") unless eof? || WS.include?(peek) || peek == "]"
end

#consume_attr_matcher! ⇒ Object



# File 'lib/dommy/internal/selector_parser.rb', line 320

def consume_attr_matcher!
  c = peek
  if "~|^$*".include?(c)
    advance
    fail!("invalid attribute matcher") unless peek == "="
    advance
  elsif c == "="
    advance
  else
    fail!("invalid attribute selector")
  end
end

#consume_attr_value! ⇒ Object



# File 'lib/dommy/internal/selector_parser.rb', line 333

def consume_attr_value!
  if peek == '"' || peek == "'"
    consume_string!
  elsif ident_start?
    consume_ident!
  else
    fail!("invalid attribute value")
  end
end

#consume_balanced_until_close! ⇒ Object

Consume a balanced run up to the matching `)` (for pseudo functions we don't model). Nested `()` and `[]` are balanced, quoted strings are skipped whole, and the run must be non-empty.



# File 'lib/dommy/internal/selector_parser.rb', line 488

def consume_balanced_until_close!
  depth = 0
  started = false
  until eof?
    c = peek
    break if c == ")" && depth.zero?

    started = true
    if c == "(" || c == "["
      depth += 1
    elsif c == ")" || c == "]"
      depth -= 1
    elsif c == '"' || c == "'"
      consume_string!
      next
    end
    advance
  end
  fail!("empty function arguments") unless started
end
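
To make the depth counting concrete, here is a self-contained sketch of the same scan over a plain string. `balanced_end` is a hypothetical helper, not part of the library: it returns the index of the closing `)` instead of moving a cursor, and its string skipping is simplified (it does not reject raw newlines the way `consume_string!` does).

```ruby
# Hypothetical helper: index of the `)` that closes the argument run starting
# at `from`, or the text length if input ends first.
def balanced_end(text, from = 0)
  i = from
  depth = 0
  while i < text.length
    c = text[i]
    break if c == ")" && depth.zero?

    if c == "(" || c == "["
      depth += 1
    elsif c == ")" || c == "]"
      depth -= 1
    elsif c == '"' || c == "'"
      # Skip the quoted string so brackets inside it are not counted.
      i += 1
      i += (text[i] == "\\" ? 2 : 1) while i < text.length && text[i] != c
      i += 1 # step over the closing quote
      next
    end
    i += 1
  end
  raise ArgumentError, "empty function arguments" if i == from

  [i, text.length].min
end

p balanced_end("a(b[c]) ')' d) tail") # 13
```

The quoted `')'` in the example does not end the run, because the string is skipped before the depth check sees its contents.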

#consume_combinator! ⇒ Object

One explicit combinator token: `>`, `+`, `~`, `>>`, or `||`.



# File 'lib/dommy/internal/selector_parser.rb', line 152

def consume_combinator!
  c = peek
  case c
  when ">"
    advance
    advance if peek == ">" # legacy descendant `>>`
  when "+", "~"
    advance
    fail!("invalid combinator") if peek == c # `++`, `~~`
  when "|"
    fail!("invalid combinator") unless peek(1) == "|"
    advance
    advance
  else
    fail!("invalid combinator #{c.inspect}")
  end
end
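
As an illustration of the token rules above, the following sketch classifies the combinator at a given index of a plain string. `combinator_at` is a hypothetical helper that returns the token text rather than advancing a cursor, and it raises `ArgumentError` where the parser would raise `InvalidSelector`.

```ruby
# Hypothetical helper mirroring the combinator token rules.
def combinator_at(text, i = 0)
  c = text[i]
  case c
  when ">"
    text[i + 1] == ">" ? ">>" : ">" # `>>` is the legacy descendant form
  when "+", "~"
    raise ArgumentError, "invalid combinator" if text[i + 1] == c # `++`, `~~`
    c
  when "|"
    raise ArgumentError, "invalid combinator" unless text[i + 1] == "|"
    "||"
  else
    raise ArgumentError, "invalid combinator #{c.inspect}"
  end
end

p combinator_at("> p")   # ">"
p combinator_at(">> p")  # ">>"
p combinator_at("|| td") # "||"
```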

#consume_escape! ⇒ Object



# File 'lib/dommy/internal/selector_parser.rb', line 574

def consume_escape!
  advance # backslash
  fail!("trailing backslash") if eof?
  advance # at least one char follows
end

#consume_function_args!(name, pseudo_element:) ⇒ Object

Validate `name(…)` against the function's argument grammar.



# File 'lib/dommy/internal/selector_parser.rb', line 393

def consume_function_args!(name, pseudo_element:)
  advance # consume '('
  skip_ws
  if pseudo_element
    # ::slotted(<compound>), ::part(<ident>+), ::cue(<selector>), …
    case name
    when "slotted" then parse_complex_selector!
    when "part" then consume_ident_sequence!
    else parse_complex_selector!
    end
  elsif SELECTOR_LIST_FUNCTIONS.include?(name)
    parse_inner_selector_list!
  elsif NTH_FUNCTIONS.include?(name)
    consume_nth!
  elsif IDENT_FUNCTIONS.include?(name)
    consume_ident_sequence!
  elsif NESTED_SELECTOR_FUNCTIONS.include?(name)
    parse_inner_selector_list!
  elsif KNOWN_PSEUDOS.include?(name)
    # A known pseudo used functionally we don't model the args of — accept
    # a balanced, non-empty argument run.
    consume_balanced_until_close!
  else
    fail!("unknown functional pseudo-class '#{name}'")
  end
  skip_ws
  # EOF implicitly closes an open `(` (`::slotted(foo`), like the `[`
  # case above.
  return if eof?

  fail!("unclosed pseudo-class function") unless peek == ")"
  advance
end

#consume_ident! ⇒ Object

Consume an identifier (assumes ident_start?). Returns the text.



# File 'lib/dommy/internal/selector_parser.rb', line 534

def consume_ident!
  start = @i
  # leading hyphen(s)
  advance if peek == "-"
  if peek == "\\"
    consume_escape!
  elsif ident_letter?(peek)
    advance
  else
    fail!("invalid identifier")
  end
  consume_name_rest!
  @s[start...@i]
end

#consume_ident_sequence! ⇒ Object



# File 'lib/dommy/internal/selector_parser.rb', line 473

def consume_ident_sequence!
  fail!("expected identifier") unless ident_start?
  consume_ident!
  loop do
    skip_ws
    break unless ident_start? || peek == ","

    advance if peek == ","
    skip_ws
    consume_ident! if ident_start?
  end
end

#consume_name! ⇒ Object

Consume a name (id token body): like an ident but may start with a digit / hyphen sequence.



# File 'lib/dommy/internal/selector_parser.rb', line 551

def consume_name!
  start = @i
  consume_name_rest!(require_one: true)
  @s[start...@i]
end

#consume_name_rest!(require_one: false) ⇒ Object



# File 'lib/dommy/internal/selector_parser.rb', line 557

def consume_name_rest!(require_one: false)
  count = 0
  loop do
    c = peek
    if c == "\\"
      consume_escape!
      count += 1
    elsif name_char?(c)
      advance
      count += 1
    else
      break
    end
  end
  fail!("empty name") if require_one && count.zero?
end

#consume_nth! ⇒ Object

An+B microsyntax (`2n`, `-3n+1`, `odd`, `even`, `5`, `n`).



# File 'lib/dommy/internal/selector_parser.rb', line 444

def consume_nth!
  word = peek_word.downcase
  if word == "odd" || word == "even"
    consume_ident!
    return
  end
  consumed = false
  if peek == "+" || peek == "-"
    advance
    consumed = true
  end
  while digit?(peek)
    advance
    consumed = true
  end
  if peek == "n" || peek == "N"
    advance
    consumed = true
    skip_ws
    if peek == "+" || peek == "-"
      advance
      skip_ws
      fail!("invalid An+B") unless digit?(peek)
      advance while digit?(peek)
    end
  end
  fail!("invalid An+B expression") unless consumed
end
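
The An+B scan can be tried in isolation with the sketch below. `nth_length` is a hypothetical helper that returns how many characters of a plain string the expression occupies. It follows the same order of steps (keyword, sign, digits, `n`, optional signed offset), but the `odd`/`even` branch is simplified to the keyword alone, and `ArgumentError` stands in for `InvalidSelector`.

```ruby
def digit_char?(c)
  !c.nil? && c >= "0" && c <= "9"
end

def ws_char?(c)
  !c.nil? && " \t\r\n\f".include?(c)
end

# Hypothetical helper: number of characters consumed by an An+B expression.
def nth_length(text)
  word = text[/\A[A-Za-z]*/].downcase
  return word.length if word == "odd" || word == "even"

  i = 0
  i += 1 if text[i] == "+" || text[i] == "-" # optional sign of A
  i += 1 while digit_char?(text[i])           # digits of A (or a bare B)
  if text[i] == "n" || text[i] == "N"
    i += 1
    i += 1 while ws_char?(text[i])
    if text[i] == "+" || text[i] == "-"
      i += 1
      i += 1 while ws_char?(text[i])
      raise ArgumentError, "invalid An+B" unless digit_char?(text[i])
      i += 1 while digit_char?(text[i])
    end
  end
  raise ArgumentError, "invalid An+B expression" if i.zero?

  i
end

p nth_length("2n+1")    # 4
p nth_length("-3n + 1") # 7
p nth_length("odd")     # 3
```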

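#consume_string! ⇒ Object

Token helper. Consumes a quoted string starting at its opening quote. End of input implicitly closes an open string; a raw newline inside the string is a parse error.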



# File 'lib/dommy/internal/selector_parser.rb', line 511

def consume_string!
  quote = peek
  advance
  until eof?
    c = peek
    if c == "\\"
      advance
      advance unless eof?
      next
    elsif c == quote
      advance
      return
    elsif c == "\n"
      fail!("newline in string")
    end
    advance
  end
  # EOF implicitly closes an open string (CSS tokenizing); only a raw
  # newline inside a string is a parse error.
  nil
end
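
The three outcomes (closed string, string closed by end of input, raw newline) can be seen in a standalone sketch. `string_end` is a hypothetical helper that returns the index just past the string instead of moving a cursor, and it raises `ArgumentError` where the parser raises `InvalidSelector`.

```ruby
# Hypothetical helper: index just past the quoted string that opens at `from`.
def string_end(text, from = 0)
  quote = text[from]
  i = from + 1
  while i < text.length
    c = text[i]
    if c == "\\"
      i += 2 # skip the backslash and the escaped character
      next
    elsif c == quote
      return i + 1
    elsif c == "\n"
      raise ArgumentError, "newline in string"
    end
    i += 1
  end
  text.length # end of input implicitly closes the string
end

p string_end('"a\"b" rest') # 6
p string_end("'open")       # 5
```

An escaped quote does not terminate the string, and an unterminated string is accepted up to the end of the input.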

#digit?(c) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/dommy/internal/selector_parser.rb', line 617

def digit?(c) = !c.nil? && c >= "0" && c <= "9"

#eof?(offset = 0) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/dommy/internal/selector_parser.rb', line 656

def eof?(offset = 0) = (@i + offset) >= @n

#fail!(message) ⇒ Object

Raises:

  • (InvalidSelector)



# File 'lib/dommy/internal/selector_parser.rb', line 658

def fail!(message)
  raise InvalidSelector, message
end

#fail_undeclared_namespace! ⇒ Object

Raises:

  • (InvalidSelector)



# File 'lib/dommy/internal/selector_parser.rb', line 261

def fail_undeclared_namespace!
  raise InvalidSelector, "undeclared namespace"
end

#ident_letter?(c) ⇒ Boolean

A letter, underscore, or non-ASCII (>= U+0080) start char.

Returns:

  • (Boolean)


# File 'lib/dommy/internal/selector_parser.rb', line 596

def ident_letter?(c)
  return false if c.nil?

  c.match?(/[A-Za-z_]/) || c.ord >= 0x80
end

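#ident_start? ⇒ Boolean

Character classification. True when an identifier can start at the cursor: an ident letter, a backslash with at least one character after it, or a `-` followed by an ident letter, another `-`, or a backslash.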

Returns:

  • (Boolean)


# File 'lib/dommy/internal/selector_parser.rb', line 582

def ident_start?
  c = peek
  return false if c.nil?
  return true if ident_letter?(c)
  return true if c == "\\" && !eof?(1)
  # leading '-' is an ident start if followed by ident-letter / '-' / esc
  if c == "-"
    nxt = peek(1)
    return !nxt.nil? && (ident_letter?(nxt) || nxt == "-" || nxt == "\\")
  end
  false
end
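
A few inputs make the classification easier to check. The sketch below is a standalone port that takes the text and an index instead of reading parser state; `ident_start_at?` is a hypothetical name chosen for the illustration.

```ruby
def ident_letter?(c)
  !c.nil? && (c.match?(/[A-Za-z_]/) || c.ord >= 0x80)
end

# Hypothetical standalone port of the ident-start check.
def ident_start_at?(text, i = 0)
  c = text[i]
  return false if c.nil?
  return true if ident_letter?(c)
  return true if c == "\\" && i + 1 < text.length

  if c == "-"
    nxt = text[i + 1]
    return !nxt.nil? && (ident_letter?(nxt) || nxt == "-" || nxt == "\\")
  end
  false
end

p ident_start_at?("div")  # true
p ident_start_at?("-moz") # true
p ident_start_at?("-1")   # false
p ident_start_at?("1a")   # false
```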

#name_char?(c) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/dommy/internal/selector_parser.rb', line 602

def name_char?(c)
  return false if c.nil?

  c.match?(/[A-Za-z0-9_\-]/) || c.ord >= 0x80
end

#name_char_start?(allow_leading_digit: false) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/dommy/internal/selector_parser.rb', line 608

def name_char_start?(allow_leading_digit: false)
  c = peek
  return false if c.nil?
  return true if c == "\\" && !eof?(1)
  return true if name_char?(c) && (allow_leading_digit || !digit?(c))

  false
end

#namespace_prefix_ahead? ⇒ Boolean

Is there a namespace prefix (`*|`, `|`, `ident|`) at the cursor, as distinct from a `||` column combinator?

Returns:

  • (Boolean)


# File 'lib/dommy/internal/selector_parser.rb', line 230

def namespace_prefix_ahead?
  if peek == "*"
    return peek(1) == "|" && peek(2) != "|"
  end
  if peek == "|"
    return peek(1) != "|"
  end
  if ident_start?
    # Scan the ident, then check for a single '|' (not '||').
    j = scan_ident_end(@i)
    return @s[j] == "|" && @s[j + 1] != "|"
  end
  false
end
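
The single-bar versus double-bar distinction is easiest to see on examples. This sketch is a standalone approximation: `type_ns_prefix?` is a hypothetical helper working on a plain string, and its identifier scan is simplified to ASCII without escapes (an assumption of the sketch).

```ruby
# Hypothetical standalone version of the type-selector prefix lookahead.
def type_ns_prefix?(text)
  return text[1] == "|" && text[2] != "|" if text[0] == "*"
  return text[1] != "|" if text[0] == "|"

  # Simplified identifier scan: ASCII only, no escapes (sketch assumption).
  ident = text[/\A-?[A-Za-z_][A-Za-z0-9_-]*/]
  return false if ident.nil?

  # A single `|` after the ident is a prefix; `||` is the column combinator.
  text[ident.length] == "|" && text[ident.length + 1] != "|"
end

p type_ns_prefix?("svg|rect") # true
p type_ns_prefix?("col||td")  # false
```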

#parse_attribute! ⇒ Object

attribute := '[' WS? [<ns-prefix>]? <ident> WS?
             ( <matcher> WS? (<ident> | <string>) WS? <flag>? WS? )? ']'


# File 'lib/dommy/internal/selector_parser.rb', line 281

def parse_attribute!
  advance # consume '['
  skip_ws
  parse_namespace_prefix! if attribute_namespace_prefix_ahead?
  fail!("invalid attribute name") unless ident_start?
  consume_ident!
  skip_ws
  unless peek == "]"
    consume_attr_matcher!
    skip_ws
    consume_attr_value!
    skip_ws
    consume_attr_flag! if ident_start?
    skip_ws
  end
  # Per CSS tokenizing, EOF implicitly closes an open `[` — so a trailing
  # unclosed attribute selector (`[align="center"`) is still valid.
  return if eof?

  fail!("unclosed attribute selector") unless peek == "]"
  advance
end

#parse_class! ⇒ Object

class := '.' <ident>



# File 'lib/dommy/internal/selector_parser.rb', line 273

def parse_class!
  advance # consume '.'
  fail!("invalid class") unless ident_start?
  consume_ident!
end

#parse_complex_selector! ⇒ Object

complex := <compound> ( <combinator> <compound> )*

The combinator is one of `>`, `+`, `~`, `>>`, `||`, or descendant (whitespace). Returns whether the subject (last) compound is a pseudo-element.



# File 'lib/dommy/internal/selector_parser.rb', line 128

def parse_complex_selector!
  pseudo_subject = parse_compound_selector!
  loop do
    had_ws = skip_ws
    # `)` ends a complex selector nested in a functional pseudo
    # (`:not(div)`); `,` / EOF end one at the top level.
    break if eof? || peek == "," || peek == ")"

    if combinator_char?(peek)
      consume_combinator!
      skip_ws
      fail!("dangling combinator") if eof? || peek == "," || combinator_char?(peek)
      pseudo_subject = parse_compound_selector!
    elsif had_ws
      # Descendant combinator (whitespace) — next must be a compound.
      pseudo_subject = parse_compound_selector!
    else
      fail!("unexpected #{peek.inspect}")
    end
  end
  pseudo_subject
end

#parse_compound_selector! ⇒ Object

compound := [ <type> | <universal> ]? <subclass>*

At least one simple selector is required, and a type or universal selector, if present, comes first. Returns whether the compound includes a pseudo-element (always the last token).



# File 'lib/dommy/internal/selector_parser.rb', line 177

def parse_compound_selector!
  saw_any = false
  pseudo_element = false
  # Optional leading type/universal (may carry a namespace prefix).
  if type_start?
    parse_type_or_universal!
    saw_any = true
  end
  loop do
    case peek
    when "#"
      parse_id!
    when "."
      parse_class!
    when "["
      parse_attribute!
    when ":"
      pseudo_element = parse_pseudo!
    else
      break
    end
    saw_any = true
  end
  fail!("empty compound selector") unless saw_any

  pseudo_element
end

#parse_id! ⇒ Object

id := '#' <name> (a hash token; `#` alone or `#` followed by a non-name character is invalid)



# File 'lib/dommy/internal/selector_parser.rb', line 266

def parse_id!
  advance # consume '#'
  fail!("invalid id") unless name_char_start?(allow_leading_digit: true)
  consume_name!
end

#parse_inner_selector_list! ⇒ Object

A selector list inside `:not()`, `:is()`, `:where()`, or `:has()`. Only `:has` allows a leading combinator (a relative selector), but this method tolerates one before every list item regardless of the function.



# File 'lib/dommy/internal/selector_parser.rb', line 429

def parse_inner_selector_list!
  skip_ws
  consume_combinator! if combinator_char?(peek) # tolerate relative selectors
  skip_ws
  parse_complex_selector!
  while peek == ","
    advance
    skip_ws
    consume_combinator! if combinator_char?(peek)
    skip_ws
    parse_complex_selector!
  end
end

#parse_namespace_prefix! ⇒ Object

ns-prefix := (<ident> | '*')? '|'

Any named prefix is undeclared and raises InvalidSelector.



# File 'lib/dommy/internal/selector_parser.rb', line 246

def parse_namespace_prefix!
  if peek == "*"
    advance
  elsif peek == "|"
    # empty (no-namespace) prefix
  elsif ident_start?
    consume_ident!
    fail_undeclared_namespace!
  else
    fail!("invalid namespace prefix")
  end
  fail!("expected '|' in namespace prefix") unless peek == "|"
  advance
end

#parse_pseudo! ⇒ Object

pseudo := '::' <pseudo-element> | ':' (<pseudo-class> | <function>)

Returns true when this is a pseudo-element (so a compound ending here matches no element).



# File 'lib/dommy/internal/selector_parser.rb', line 358

def parse_pseudo!
  advance # first ':'
  if peek == ":"
    advance # pseudo-element '::'
    parse_pseudo_element!
    true
  else
    parse_pseudo_class!
  end
end

#parse_pseudo_class! ⇒ Object

Returns true when the `:name` is actually a legacy pseudo-element.



# File 'lib/dommy/internal/selector_parser.rb', line 380

def parse_pseudo_class!
  fail!("invalid pseudo-class") unless ident_start?
  name = consume_ident!.downcase
  if peek == "("
    consume_function_args!(name, pseudo_element: false)
    false
  else
    fail!("unknown pseudo-class '#{name}'") unless KNOWN_PSEUDOS.include?(name)
    LEGACY_PSEUDO_ELEMENTS.include?(name)
  end
end

#parse_pseudo_element! ⇒ Object



# File 'lib/dommy/internal/selector_parser.rb', line 369

def parse_pseudo_element!
  fail!("invalid pseudo-element") unless ident_start?
  name = consume_ident!.downcase
  if peek == "("
    consume_function_args!(name, pseudo_element: true)
  else
    fail!("unknown pseudo-element '#{name}'") unless KNOWN_PSEUDO_ELEMENTS.include?(name)
  end
end

#parse_selector_list! ⇒ Object

selector-list := <complex-selector> (',' <complex-selector>)*

Surrounding whitespace is optional. An empty list or an empty element (leading, trailing, or doubled comma) is invalid.



# File 'lib/dommy/internal/selector_parser.rb', line 104

def parse_selector_list!
  skip_ws
  fail!("empty selector") if eof?
  record_clause { parse_complex_selector! }
  while peek == ","
    advance
    skip_ws
    fail!("empty selector in list") if eof? || peek == ","
    record_clause { parse_complex_selector! }
  end
  skip_ws
  fail!("unexpected #{peek.inspect}") unless eof?
end
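
The comma handling can be exercised without the rest of the grammar. The sketch below keeps the same control flow but swaps `parse_complex_selector!` for a stand-in that accepts only a run of ASCII letters. `MiniListParser` and its nested `InvalidSelector` are hypothetical names for this illustration, not the library's classes.

```ruby
# Hypothetical miniature of the list-level control flow.
class MiniListParser
  InvalidSelector = Class.new(StandardError)
  WS = " \t\r\n\f"

  def initialize(string)
    @s = string
    @i = 0
    @items = []
  end

  def parse_selector_list!
    skip_ws
    fail!("empty selector") if eof?
    parse_item!
    while peek == ","
      advance
      skip_ws
      fail!("empty selector in list") if eof? || peek == ","
      parse_item!
    end
    skip_ws
    fail!("unexpected #{peek.inspect}") unless eof?
    @items
  end

  private

  # Stand-in for parse_complex_selector!: one run of ASCII letters.
  def parse_item!
    start = @i
    advance while !eof? && peek.match?(/[A-Za-z]/)
    fail!("empty selector in list") if @i == start
    @items << @s[start...@i]
    skip_ws
  end

  def peek
    @s[@i]
  end

  def advance
    @i += 1
  end

  def eof?
    @i >= @s.length
  end

  def skip_ws
    advance while !eof? && WS.include?(peek)
  end

  def fail!(message)
    raise InvalidSelector, message
  end
end

p MiniListParser.new(" a , b ").parse_selector_list! # ["a", "b"]
```

Leading, trailing, and doubled commas all end in `InvalidSelector`, each through a different check in the loop.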

#parse_type_or_universal! ⇒ Object

type := [<ns-prefix>]? (<ident> | '*')



# File 'lib/dommy/internal/selector_parser.rb', line 217

def parse_type_or_universal!
  parse_namespace_prefix! if namespace_prefix_ahead?
  if peek == "*"
    advance
  elsif ident_start?
    consume_ident!
  else
    fail!("expected type selector")
  end
end

#peek(offset = 0) ⇒ Object

Cursor helper. Returns the character at the cursor plus `offset`, or nil past the end of the buffer.



# File 'lib/dommy/internal/selector_parser.rb', line 637

def peek(offset = 0) = @s[@i + offset]

#peek_word ⇒ Object



# File 'lib/dommy/internal/selector_parser.rb', line 639

def peek_word
  j = @i
  j += 1 while j < @n && @s[j].match?(/[A-Za-z]/)
  @s[@i...j]
end

#record_clause ⇒ Object

Capture a clause's source text and whether its subject is a pseudo-element.



# File 'lib/dommy/internal/selector_parser.rb', line 119

def record_clause
  start = @i
  pseudo_subject = yield
  @clauses << {text: @s[start...@i].strip, pseudo_subject: pseudo_subject}
end
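
The block-based capture is a small pattern worth seeing on its own: remember the cursor, let the block parse, then slice the buffer between the two positions. In the sketch below, `ClauseRecorderSketch`, `scan_clause`, and `skip_comma` are hypothetical; `scan_clause` is a crude stand-in for `parse_complex_selector!` that runs to the next comma and reports whether the text contains `::`.

```ruby
# Hypothetical illustration of the record_clause pattern.
class ClauseRecorderSketch
  attr_reader :clauses

  def initialize(string)
    @s = string
    @i = 0
    @clauses = []
  end

  def record_clause
    start = @i
    pseudo_subject = yield
    @clauses << { text: @s[start...@i].strip, pseudo_subject: pseudo_subject }
  end

  # Stand-in for parse_complex_selector!: advance to the next comma and
  # report whether the consumed text contains `::`.
  def scan_clause
    start = @i
    @i += 1 while @i < @s.length && @s[@i] != ","
    @s[start...@i].include?("::")
  end

  def skip_comma
    @i += 1 if @s[@i] == ","
  end
end

r = ClauseRecorderSketch.new("p::before , a.link")
r.record_clause { r.scan_clause }
r.skip_comma
r.record_clause { r.scan_clause }
p r.clauses
```

The `strip` matters because the clause parser consumes trailing whitespace before it sees the comma, so the raw slice can end in spaces.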

#scan_ident_end(from) ⇒ Object

Index just past the identifier starting at `from` (no validation).



# File 'lib/dommy/internal/selector_parser.rb', line 620

def scan_ident_end(from)
  j = from
  j += 1 if @s[j] == "-"
  while (ch = @s[j])
    if ch == "\\"
      j += 2
    elsif ch.match?(/[A-Za-z0-9_\-]/) || ch.ord >= 0x80
      j += 1
    else
      break
    end
  end
  j
end
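
For reference, here is the same scan as a free function over a plain string, which makes it easy to try on sample input. The only change from the listing above is the extra `text` parameter, added so the sketch does not need parser state.

```ruby
# Standalone port: index just past the identifier starting at `from`.
def scan_ident_end(text, from)
  j = from
  j += 1 if text[j] == "-"
  while (ch = text[j])
    if ch == "\\"
      j += 2 # skip the backslash and the escaped character
    elsif ch.match?(/[A-Za-z0-9_\-]/) || ch.ord >= 0x80
      j += 1
    else
      break
    end
  end
  j
end

p scan_ident_end("svg|rect", 0) # 3
p scan_ident_end("-moz-x y", 0) # 6
```

Because nothing is validated, the result is only meaningful when the caller has already checked `ident_start?`, as the two lookahead methods do.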

#skip_ws ⇒ Object



# File 'lib/dommy/internal/selector_parser.rb', line 647

def skip_ws
  moved = false
  while !eof? && WS.include?(peek)
    advance
    moved = true
  end
  moved
end
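
The boolean return value is what lets `parse_complex_selector!` recognise a descendant combinator: whitespace was skipped and no explicit combinator follows. A standalone sketch of the same contract, with a hypothetical signature that takes the text and an index and returns both the new index and the flag:

```ruby
WS_CHARS = " \t\r\n\f"

# Hypothetical standalone version: returns [new_index, moved?].
def skip_ws(text, i)
  j = i
  j += 1 while j < text.length && WS_CHARS.include?(text[j])
  [j, j > i]
end

p skip_ws("a  b", 1) # [3, true]
p skip_ws("a>b", 1)  # [1, false]
```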

#type_start? ⇒ Boolean

A compound may start with a type/universal selector when the next token is an ident, `*`, or a namespace prefix (`*|`, `|`, `ident|`).

Returns:

  • (Boolean)


# File 'lib/dommy/internal/selector_parser.rb', line 207

def type_start?
  c = peek
  return true if c == "*"
  return true if c == "|"
  return true if ident_start?

  false
end