Module: Canon::Comparison::NodeInspector
Defined in: lib/canon/comparison/node_inspector.rb
Overview
Single source of truth for cross-backend node type operations.
The comparison pipeline handles nodes from two backends:
- Canon::Xml::Node (+ RootNode, ElementNode, TextNode, etc.): custom DOM built by SAX builder and DataModel.
- Nokogiri::XML::Node (+ subclasses): native Nokogiri nodes used by the HTML comparator and some legacy paths.
Every method here dispatches on type via case/when (is_a?). respond_to? is not used, because the types are known at every call site.
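As a rough illustration of this dispatch style, the sketch below uses two stand-in classes, FakeCanonNode and FakeNokogiriNode. Both are hypothetical and exist only so the example runs without Canon or Nokogiri loaded. It shows case/when choosing a branch by class membership instead of probing the object with respond_to?.

```ruby
# Stand-in node classes (hypothetical; not part of Canon or Nokogiri).
FakeCanonNode    = Struct.new(:node_type, :value)
FakeNokogiriNode = Struct.new(:type_code, :content)

# Same shape as the NodeInspector methods: one branch per known backend
# and a conservative fallback for everything else.
def element_like?(node)
  case node                    # when-clauses test class membership via ===
  when FakeCanonNode
    node.node_type == :element
  when FakeNokogiriNode
    node.type_code == 1        # 1 is the DOM ELEMENT_NODE type code
  else
    false
  end
end

element_like?(FakeCanonNode.new(:element, nil))  # Canon-style branch
element_like?(FakeNokogiriNode.new(3, "x"))      # Nokogiri-style branch, text node
element_like?("not a node")                      # unknown type, falls back to false
```

Because each branch names a concrete class, an object that merely happens to respond to node_type is never mistaken for a node.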
Constant Summary
- CANON_TEXT_TYPE = :text
- NOKOGIRI_TEXT_TYPE = defined?(Nokogiri::XML::Node::TEXT_NODE) ? Nokogiri::XML::Node::TEXT_NODE : 3
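The NOKOGIRI_TEXT_TYPE definition relies on defined? returning nil, rather than raising, when any part of a constant path is missing. A minimal sketch of that guard follows; MissingLib is a deliberately undefined, hypothetical name, and Float::INFINITY stands in for a constant that does exist.

```ruby
# MissingLib is intentionally not defined anywhere (hypothetical name),
# so defined? yields nil and the literal fallback is used.
fallback = defined?(MissingLib::TEXT_NODE) ? MissingLib::TEXT_NODE : 3

# Float::INFINITY is a real core constant, so the first branch is taken.
present = defined?(Float::INFINITY) ? Float::INFINITY : 3
```

The fallback value 3 matches the DOM type code for text nodes, so the constant stays usable even when Nokogiri has not been loaded.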
Class Method Summary
- .comment_node?(node) ⇒ Boolean
  True when node is a comment node.
- .element_node?(node) ⇒ Boolean
  True when node is an element node.
- .noise_dimension_for(node) ⇒ Symbol?
  Classify node as a noise node and return the diff dimension it should be reported under, or nil if it is structural content.
- .noise_node?(node) ⇒ Boolean
  True when node is a noise node (whitespace-only text or comment).
- .parent_of(node) ⇒ Object
  Return the parent node of node, or nil when node is not a recognised DOM backend type or has no parent.
- .parse_errors(node) ⇒ Object
  Extract parse-time errors carried on a node or its owning document.
- .text_content(node) ⇒ Object
  Extract the text content of node as a String.
- .text_node?(node) ⇒ Boolean
  True when node is a text node (whitespace, content, etc.).
- .whitespace_only_text?(node) ⇒ Boolean
  True when node is a text node whose content is whitespace-only.
Class Method Details
.comment_node?(node) ⇒ Boolean
True when node is a comment node. For HTML, also detects comments that Nokogiri parses as TEXT nodes (content like "<!-- comment -->" or escaped "<\!-- comment -->").
    # File 'lib/canon/comparison/node_inspector.rb', line 56
    def self.comment_node?(node)
      case node
      when Canon::Xml::Node
        node.node_type == :comment
      when Nokogiri::XML::Node
        return true if node.comment?

        # HTML comments are parsed as TEXT nodes by Nokogiri
        if node.text?
          text_stripped = text_content(node).to_s.strip.gsub("\\", "")
          return true if text_stripped.start_with?("<!--") && text_stripped.end_with?("-->")
        end
        false
      else
        false
      end
    end
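The string test applied to Nokogiri text nodes can be tried in isolation. The helper below, comment_like_text?, is a hypothetical name introduced only for this sketch; it repeats the strip, backslash-removal, and prefix/suffix checks from the listing above on a plain String.

```ruby
# Hypothetical helper mirroring the string test inside comment_node?.
def comment_like_text?(text)
  # Drop surrounding whitespace, then remove every backslash.
  stripped = text.to_s.strip.gsub("\\", "")
  stripped.start_with?("<!--") && stripped.end_with?("-->")
end

comment_like_text?("  <!-- note -->\n")  # plain comment markup with padding
comment_like_text?("<\\!-- note -->")    # escaped form: backslash before "!"
comment_like_text?("see <!-- note -->")  # leading prose, so not comment-like
```

Note that gsub removes all backslashes in the text, not only the one before the exclamation mark, so the check is deliberately permissive about escaping.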
.element_node?(node) ⇒ Boolean
True when node is an element node.
    # File 'lib/canon/comparison/node_inspector.rb', line 75
    def self.element_node?(node)
      case node
      when Canon::Xml::Node
        node.node_type == :element
      when Nokogiri::XML::Node
        node.element?
      else
        false
      end
    end
.noise_dimension_for(node) ⇒ Symbol?
Classify node as a noise node and return the diff dimension it should be reported under, or nil if it is structural content.
Noise nodes (whitespace-only text, comments) are skipped over when children are realigned during comparison, so that content nodes line up correctly on both sides.
    # File 'lib/canon/comparison/node_inspector.rb', line 95
    def self.noise_dimension_for(node)
      if whitespace_only_text?(node)
        :whitespace_adjacency
      elsif comment_node?(node)
        :comments
      end
    end
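To make the realignment idea concrete, here is a small sketch under stated assumptions. SketchNode, sketch_noise_dimension, and sketch_align are hypothetical names; the real child-comparison code is not shown on this page, so the pairing step is only one plausible reading of how noise nodes might be stepped over.

```ruby
# Stand-in node (hypothetical): kind is :text, :comment, or :element.
SketchNode = Struct.new(:kind, :text)

# Same decision order as noise_dimension_for: whitespace first, then comments.
def sketch_noise_dimension(node)
  if node.kind == :text && !node.text.empty? && node.text.strip.empty?
    :whitespace_adjacency
  elsif node.kind == :comment
    :comments
  end                          # implicit nil for structural content
end

# Pair the structural children of both sides, skipping noise on each side.
def sketch_align(left, right)
  keep = ->(nodes) { nodes.reject { |n| sketch_noise_dimension(n) } }
  keep.call(left).zip(keep.call(right))
end

left  = [SketchNode.new(:text, "\n  "), SketchNode.new(:element, "a"), SketchNode.new(:element, "b")]
right = [SketchNode.new(:element, "a"), SketchNode.new(:comment, "todo"), SketchNode.new(:element, "b")]
pairs = sketch_align(left, right)  # "a" pairs with "a", "b" pairs with "b"
```

Without the filtering step, the indentation text on the left would be compared against element "a" on the right, and every later child would be reported as shifted.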
.noise_node?(node) ⇒ Boolean
True when node is a noise node (whitespace-only text or comment). Convenience wrapper around noise_dimension_for.
    # File 'lib/canon/comparison/node_inspector.rb', line 108
    def self.noise_node?(node)
      !noise_dimension_for(node).nil?
    end
.parent_of(node) ⇒ Object
Return the parent node of node, or nil when node is not a recognised DOM backend type or has no parent.
    # File 'lib/canon/comparison/node_inspector.rb', line 130
    def self.parent_of(node)
      case node
      when Canon::Xml::Node, Nokogiri::XML::Node
        node.parent
      end
    end
.parse_errors(node) ⇒ Object
Extract parse-time errors carried on a node or its owning document. Returns an Array of Strings.
    # File 'lib/canon/comparison/node_inspector.rb', line 114
    def self.parse_errors(node)
      case node
      when nil
        []
      when Canon::Xml::Node
        errors = node.parse_errors
        Array(errors).map(&:to_s)
      when Nokogiri::XML::Document, Nokogiri::HTML5::Document
        Array(node.errors).map(&:to_s)
      else
        []
      end
    end
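Both backend branches normalise their result with Array(...).map(&:to_s). The sketch below isolates that idiom; normalize_errors is a hypothetical helper name used only here.

```ruby
# Hypothetical helper isolating the normalisation used by parse_errors.
# Array() turns nil into [], wraps a single object, and leaves arrays alone;
# map(&:to_s) then converts each entry to a String.
def normalize_errors(errors)
  Array(errors).map(&:to_s)
end

normalize_errors(nil)                         # => []
normalize_errors("unclosed tag")              # => ["unclosed tag"]
normalize_errors([StandardError.new("bad")])  # => ["bad"]
```

This is why callers can iterate over the result without first checking for nil or for a single error object.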
.text_content(node) ⇒ Object
Extract the text content of node as a String.
    # File 'lib/canon/comparison/node_inspector.rb', line 32
    def self.text_content(node)
      case node
      when Canon::Xml::Node
        node.value.to_s
      when Nokogiri::XML::Node
        node.content.to_s
      else
        node.to_s
      end
    end
.text_node?(node) ⇒ Boolean
True when node is a text node (whitespace, content, etc.).
    # File 'lib/canon/comparison/node_inspector.rb', line 20
    def self.text_node?(node)
      case node
      when Canon::Xml::Node
        node.node_type == CANON_TEXT_TYPE
      when Nokogiri::XML::Node
        node.node_type == NOKOGIRI_TEXT_TYPE
      else
        false
      end
    end
.whitespace_only_text?(node) ⇒ Boolean
True when node is a text node whose content is whitespace-only. Empty-string text nodes return false — those represent genuine empty-vs-content asymmetry, not pretty-print indentation.
    # File 'lib/canon/comparison/node_inspector.rb', line 46
    def self.whitespace_only_text?(node)
      return false unless text_node?(node)

      text = text_content(node)
      !text.empty? && text.strip.empty?
    end
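The empty-versus-whitespace distinction is easiest to see at the String level. The helper below, whitespace_only?, is a hypothetical name for this sketch; it applies the same final expression as the listing above without the text_node? guard.

```ruby
# Hypothetical string-level version of the check in whitespace_only_text?.
def whitespace_only?(text)
  !text.empty? && text.strip.empty?
end

whitespace_only?("\n    ")  # => true  (indentation between elements)
whitespace_only?("")        # => false (empty string is not treated as whitespace)
whitespace_only?("  a  ")   # => false (has content)
whitespace_only?("\u00A0")  # => false (String#strip leaves a no-break space in place)
```

The last case is a consequence of Ruby's String#strip, which removes ASCII whitespace and null characters only, so a no-break space counts as content under this check.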