Module: Scrapetor::Dom
- Defined in:
- lib/scrapetor/dom.rb,
lib/scrapetor/dom/parser.rb,
lib/scrapetor/dom/selectors.rb
Overview
Pure-Ruby DOM, built from the SAX tokenizer. It is the backing tree for Scrapetor::Document when the native streaming extract path isn’t applicable (i.e. for `doc.css(…)`, `doc.at(…)`, mutation, and serialization).
This is intentionally minimal: node types are Element / Text / Comment / Doctype, plus a Document root. The CSS selector engine lives in `dom/selectors.rb`.
Defined Under Namespace
Modules: NodeMethods, Parser, Selectors
Classes: AttrNode, Comment, Doctype, Document, Element, Text
Constant Summary
- VOID =
%w[ area base br col embed hr img input link meta source track wbr ].freeze
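The page does not show where VOID is consulted. A plausible use, given that this DOM handles serialization, is deciding which elements are written without a closing tag. The sketch below illustrates that idea only; `VoidSketch` and `render` are hypothetical names and are not part of Scrapetor.

```ruby
module VoidSketch
  # Same list as Scrapetor::Dom::VOID, copied so the snippet runs standalone.
  VOID = %w[area base br col embed hr img input link meta source track wbr].freeze

  # Hypothetical serializer step: void elements get no children and no end tag.
  def self.render(name, inner = "")
    return "<#{name}>" if VOID.include?(name)

    "<#{name}>#{inner}</#{name}>"
  end
end

puts VoidSketch.render("br")       # prints <br>
puts VoidSketch.render("p", "hi")  # prints <p>hi</p>
```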
Class Method Summary
- .escape_attr(s) ⇒ Object
- .escape_text(s) ⇒ Object
- .normalize_replacement(input, parent:) ⇒ Object
Class Method Details
.escape_attr(s) ⇒ Object
# File 'lib/scrapetor/dom.rb', line 546
def self.escape_attr(s)
  s.to_s.gsub(/[&<>"]/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;")
end
.escape_text(s) ⇒ Object
# File 'lib/scrapetor/dom.rb', line 542
def self.escape_text(s)
  s.to_s.gsub(/[&<>]/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;")
end
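The two escape helpers differ only in whether the double quote is escaped. The snippet below is a minimal sketch of that difference, using standalone copies of the helpers so it runs without Scrapetor loaded; the `EscapeSketch` module name is hypothetical, and the entity strings assume the standard HTML replacements (`&amp;`, `&lt;`, `&gt;`, `&quot;`).

```ruby
module EscapeSketch
  TEXT_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze
  ATTR_ESCAPES = TEXT_ESCAPES.merge('"' => "&quot;").freeze

  # Text content: only &, < and > need escaping.
  def self.escape_text(s)
    s.to_s.gsub(/[&<>]/, TEXT_ESCAPES)
  end

  # Attribute values: the double quote is escaped as well, since it
  # would otherwise end a double-quoted attribute.
  def self.escape_attr(s)
    s.to_s.gsub(/[&<>"]/, ATTR_ESCAPES)
  end
end

sample = %(Tom & "Jerry" <3)
puts EscapeSketch.escape_text(sample)  # prints Tom &amp; "Jerry" &lt;3
puts EscapeSketch.escape_attr(sample)  # prints Tom &amp; &quot;Jerry&quot; &lt;3
```

Because `gsub` with a hash replaces each match in a single pass, the `&` in an emitted entity is not escaped a second time. The `to_s` call means non-string input such as `nil` or an Integer is accepted.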
.normalize_replacement(input, parent:) ⇒ Object
# File 'lib/scrapetor/dom.rb', line 554
def self.normalize_replacement(input, parent:)
  case input
  when Element, Text, Comment, Doctype then [input]
  when Array then input
  when String then Dom::Parser.fragment(input)
  else [Text.new(input.to_s, parent: parent)]
  end
end
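The dispatch in `normalize_replacement` can be tried in isolation with stand-in classes. This is a sketch under stated assumptions, not the library's code: `ReplacementSketch`, its `Text` and `Element` structs, and `parse_fragment` are hypothetical, and `parse_fragment` is only a stub for `Dom::Parser.fragment`, whose real output is not shown on this page.

```ruby
module ReplacementSketch
  # Hypothetical stand-ins for the Dom node classes.
  Text    = Struct.new(:content, :parent)
  Element = Struct.new(:name, :parent)

  # Stub for Dom::Parser.fragment: wraps the markup in one Text node
  # instead of parsing it, so the sketch stays self-contained.
  def self.parse_fragment(html)
    [Text.new(html, nil)]
  end

  # Same branching as the method above: always returns an Array of nodes.
  def self.normalize_replacement(input, parent:)
    case input
    when Element, Text then [input]                 # single node: wrap it
    when Array         then input                   # assumed to hold nodes already
    when String        then parse_fragment(input)   # markup: hand to the parser
    else                    [Text.new(input.to_s, parent)]  # anything else: stringify
    end
  end
end

div   = ReplacementSketch::Element.new("div", nil)
nodes = ReplacementSketch.normalize_replacement(42, parent: div)
puts nodes.first.content  # prints 42
```

Two details follow from the source listing: an Array is returned as-is without checking its elements, and `parent:` is used only in the fallback branch that builds a new Text node.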