Module: Scrapetor::Dom

Defined in:
lib/scrapetor/dom.rb,
lib/scrapetor/dom/parser.rb,
lib/scrapetor/dom/selectors.rb

Overview

Pure-Ruby DOM, built from the SAX tokenizer. It is the backing tree for Scrapetor::Document when the native streaming extract path isn't applicable (i.e. for `doc.css(…)`, `doc.at(…)`, mutation, and serialization).

This is intentionally minimal: node types are Element / Text / Comment / Doctype, plus a Document root. The CSS selector engine lives in `dom/selectors.rb`.

Defined Under Namespace

Modules: NodeMethods, Parser, Selectors
Classes: AttrNode, Comment, Doctype, Document, Element, Text

Constant Summary

VOID =
%w[
  area base br col embed hr img input link meta source track wbr
].freeze
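
A list of void elements like this is typically consulted during serialization, since void elements take no closing tag. The following is a minimal standalone sketch of that idea, not the library's serializer: `VoidSketch` and `render_tag` are hypothetical names introduced for illustration, and only the `VOID` list itself is copied from the listing above.

```ruby
# Standalone sketch; it does not load Scrapetor.
# VoidSketch and render_tag are hypothetical, for illustration only.
module VoidSketch
  # Copied from the constant listing above.
  VOID = %w[
    area base br col embed hr img input link meta source track wbr
  ].freeze

  # Render a bare tag, omitting the closing tag for void elements.
  def self.render_tag(name, inner = "")
    return "<#{name}>" if VOID.include?(name)

    "<#{name}>#{inner}</#{name}>"
  end
end

puts VoidSketch.render_tag("br")          # => <br>
puts VoidSketch.render_tag("p", "hello")  # => <p>hello</p>
```

Because the array is frozen, callers cannot add or remove entries in place; the strings are lowercase, so a lookup like the one above assumes tag names are already lowercased.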

Class Method Summary

Class Method Details

.escape_attr(s) ⇒ Object



# File 'lib/scrapetor/dom.rb', line 546

def self.escape_attr(s)
  s.to_s.gsub(/[&<>"]/,
              "&" => "&amp;",
              "<" => "&lt;",
              ">" => "&gt;",
              '"' => "&quot;")
end
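
To make the behavior concrete, here is a small standalone sketch that copies the method body above into a hypothetical `EscapeAttrSketch` module (so it runs without loading Scrapetor) and exercises a few inputs.

```ruby
# Standalone sketch; EscapeAttrSketch is a hypothetical wrapper so the
# example runs without Scrapetor. The method body matches the listing above.
module EscapeAttrSketch
  def self.escape_attr(s)
    # gsub with a Hash replaces each match with the value stored under it.
    s.to_s.gsub(/[&<>"]/,
                "&" => "&amp;",
                "<" => "&lt;",
                ">" => "&gt;",
                '"' => "&quot;")
  end
end

puts EscapeAttrSketch.escape_attr(%q{Tom & "Jerry" <3})
# => Tom &amp; &quot;Jerry&quot; &lt;3

# Non-strings go through to_s first, so nil becomes an empty string.
puts EscapeAttrSketch.escape_attr(nil).inspect  # => ""

# Already-escaped input is escaped again.
puts EscapeAttrSketch.escape_attr("&amp;")      # => &amp;amp;
```

Two consequences follow from the character class: single quotes are left untouched, so the result is only safe inside a double-quoted attribute value, and existing entities are escaped a second time.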

.escape_text(s) ⇒ Object

----- helpers -----



# File 'lib/scrapetor/dom.rb', line 542

def self.escape_text(s)
  s.to_s.gsub(/[&<>]/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;")
end
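
The text variant differs from the attribute variant only in leaving double quotes alone. A standalone sketch, again using a hypothetical wrapper module (`EscapeTextSketch`) around the method body shown above:

```ruby
# Standalone sketch; EscapeTextSketch is a hypothetical wrapper so the
# example runs without Scrapetor. The method body matches the listing above.
module EscapeTextSketch
  def self.escape_text(s)
    s.to_s.gsub(/[&<>]/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;")
  end
end

# Quotes pass through unchanged; only &, < and > are rewritten.
puts EscapeTextSketch.escape_text(%q{a < b && "c"})
# => a &lt; b &amp;&amp; "c"

# Any object is accepted, because the argument is converted with to_s.
puts EscapeTextSketch.escape_text(42)  # => 42
```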

.normalize_replacement(input, parent:) ⇒ Object



# File 'lib/scrapetor/dom.rb', line 554

def self.normalize_replacement(input, parent:)
  case input
  when Element, Text, Comment, Doctype then [input]
  when Array                           then input
  when String                          then Dom::Parser.fragment(input)
  else                                      [Text.new(input.to_s, parent: parent)]
  end
end
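
The branching above can be exercised in isolation. The sketch below is not the library's code path: the node classes are hypothetical stand-ins, and `Parser.fragment` is stubbed to wrap the string in a single text node, whereas the real `Dom::Parser.fragment` presumably parses markup. Only the `case` expression mirrors the listing.

```ruby
# Standalone sketch; every class here is a hypothetical stand-in so the
# example runs without Scrapetor.
module NormalizeSketch
  Element = Struct.new(:name)
  Comment = Struct.new(:content)
  Doctype = Struct.new(:name)

  class Text
    attr_reader :content, :parent

    def initialize(content, parent: nil)
      @content = content
      @parent = parent
    end
  end

  # Stub: the real parser is assumed to return an array of nodes.
  module Parser
    def self.fragment(html)
      [Text.new(html)]
    end
  end

  # Same branching as the listing above: always return an array of nodes.
  def self.normalize_replacement(input, parent:)
    case input
    when Element, Text, Comment, Doctype then [input]
    when Array                           then input
    when String                          then Parser.fragment(input)
    else                                      [Text.new(input.to_s, parent: parent)]
    end
  end
end

parent = NormalizeSketch::Element.new("div")
span   = NormalizeSketch::Element.new("span")

# A single node is wrapped in a one-element array.
p NormalizeSketch.normalize_replacement(span, parent: parent).length  # => 1

# Anything else is stringified into a Text node attached to the parent.
node = NormalizeSketch.normalize_replacement(42, parent: parent).first
p node.content  # => "42"
```

As written in the listing, only the fallback branch uses `parent:`; an Array is returned as the same object, without copying or checking its elements.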