Module: Canon::Comparison::NodeInspector

Defined in:
lib/canon/comparison/node_inspector.rb

Overview

Single source of truth for cross-backend node type operations.

The comparison pipeline handles nodes from two backends:

  • Canon::Xml::Node (+ RootNode, ElementNode, TextNode, etc.): custom DOM built by the SAX builder and DataModel.

  • Nokogiri::XML::Node (+ subclasses): native Nokogiri nodes used by the HTML comparator and some legacy paths.

Every method here dispatches on type via case/when (is_a?). respond_to? is never used, because the types are known at every call site.
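
As a rough illustration of this dispatch style, the sketch below uses two hypothetical stand-in classes (SketchDispatch::CanonNode and SketchDispatch::NativeNode) in place of Canon::Xml::Node and Nokogiri::XML::Node. It is not the module's code, only the same case/when shape applied to invented types.

```ruby
# Hypothetical stand-ins for the two backends. The real classes are
# Canon::Xml::Node and Nokogiri::XML::Node.
module SketchDispatch
  CanonNode  = Struct.new(:node_type, :value)
  NativeNode = Struct.new(:type_code, :content)

  # case/when compares with Class#===, which is an is_a? check.
  def self.text_of(node)
    case node
    when CanonNode
      node.value.to_s
    when NativeNode
      node.content.to_s
    else
      node.to_s
    end
  end
end

SketchDispatch.text_of(SketchDispatch::CanonNode.new(:text, "a"))  # => "a"
SketchDispatch.text_of(SketchDispatch::NativeNode.new(3, "b"))     # => "b"
SketchDispatch.text_of(42)                                         # => "42"
```

Dispatching on class keeps each branch tied to one backend's API, so a branch never calls a method the other backend lacks.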

Constant Summary

CANON_TEXT_TYPE = :text
NOKOGIRI_TEXT_TYPE = defined?(Nokogiri::XML::Node::TEXT_NODE) ? Nokogiri::XML::Node::TEXT_NODE : 3
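
The defined? guard on NOKOGIRI_TEXT_TYPE falls back to the literal 3 (the value Nokogiri assigns to TEXT_NODE) when the constant cannot be resolved. The sketch below shows the same guard against a deliberately undefined, hypothetical SketchMissingLib module.

```ruby
# SketchMissingLib is intentionally undefined. defined? returns nil for an
# unresolvable constant path instead of raising NameError, so the literal
# fallback is chosen.
SKETCH_TEXT_TYPE =
  defined?(SketchMissingLib::TEXT_NODE) ? SketchMissingLib::TEXT_NODE : 3

SKETCH_TEXT_TYPE  # => 3
```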

Class Method Summary

Class Method Details

.comment_node?(node) ⇒ Boolean

True when node is a comment node. For HTML, also detects comments that Nokogiri parses as TEXT nodes (content like "<!-- comment -->" or escaped "<\!-- comment -->").

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 56

def self.comment_node?(node)
  case node
  when Canon::Xml::Node
    node.node_type == :comment
  when Nokogiri::XML::Node
    return true if node.comment?

    # HTML comments are parsed as TEXT nodes by Nokogiri
    if node.text?
      text_stripped = text_content(node).to_s.strip.gsub("\\", "")
      return true if text_stripped.start_with?("<!--") && text_stripped.end_with?("-->")
    end
    false
  else
    false
  end
end
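
The text-node branch above reduces to a string check. The sketch below isolates it on plain strings. The helper name comment_like_text? is hypothetical and not part of NodeInspector.

```ruby
# Mirrors the string test inside comment_node?: trim the text, drop
# backslashes, then look for comment delimiters at both ends.
def comment_like_text?(text)
  stripped = text.to_s.strip.gsub("\\", "")
  stripped.start_with?("<!--") && stripped.end_with?("-->")
end

comment_like_text?("  <!-- note -->\n")  # => true
comment_like_text?("<\\!-- note -->")    # => true, backslash removed first
comment_like_text?("plain text")         # => false
```

Because backslashes are removed before the prefix test, the escaped form "<\!-- ... -->" is treated the same as an unescaped comment.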

.element_node?(node) ⇒ Boolean

True when node is an element node.

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 75

def self.element_node?(node)
  case node
  when Canon::Xml::Node
    node.node_type == :element
  when Nokogiri::XML::Node
    node.element?
  else
    false
  end
end

.noise_dimension_for(node) ⇒ Symbol?

Classify node as a noise node and return the diff dimension it should be reported under, or nil if it is structural content.

Noise nodes (whitespace-only text, comments) are realigned past during child comparison so that content nodes line up correctly across sides.

Parameters:

  • node (Object)

    DOM node to classify

Returns:

  • (Symbol, nil)

    :whitespace_adjacency, :comments, or nil



# File 'lib/canon/comparison/node_inspector.rb', line 95

def self.noise_dimension_for(node)
  if whitespace_only_text?(node)
    :whitespace_adjacency
  elsif comment_node?(node)
    :comments
  end
end
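
One way a caller might use this classification is to separate noise from content before aligning children. The sketch below assumes a hypothetical SketchNode struct and a simplified sketch_noise_dimension helper. It illustrates the idea only and is not the comparison pipeline's actual realignment code.

```ruby
# Hypothetical stand-in node: kind is :text, :comment or :element.
SketchNode = Struct.new(:kind, :text)

# Simplified classifier with the same precedence as noise_dimension_for:
# whitespace-only text first, then comments, otherwise nil.
def sketch_noise_dimension(node)
  if node.kind == :text && !node.text.empty? && node.text.strip.empty?
    :whitespace_adjacency
  elsif node.kind == :comment
    :comments
  end
end

# Split children into [noise, content] using the classifier.
def sketch_split_noise(children)
  children.partition { |n| sketch_noise_dimension(n) }
end

children = [
  SketchNode.new(:text, "\n  "),
  SketchNode.new(:element, "p"),
  SketchNode.new(:comment, " todo "),
  SketchNode.new(:element, "div"),
]

noise, content = sketch_split_noise(children)
content.map(&:text)                           # => ["p", "div"]
noise.map { |n| sketch_noise_dimension(n) }   # => [:whitespace_adjacency, :comments]
```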

.noise_node?(node) ⇒ Boolean

True when node is a noise node (whitespace-only text or comment). Convenience wrapper around noise_dimension_for.

Parameters:

  • node (Object)

    DOM node to check

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 108

def self.noise_node?(node)
  !noise_dimension_for(node).nil?
end

.parent_of(node) ⇒ Object

Return the parent node of node, or nil when node is not a recognised DOM backend type or has no parent.



# File 'lib/canon/comparison/node_inspector.rb', line 130

def self.parent_of(node)
  case node
  when Canon::Xml::Node, Nokogiri::XML::Node
    node.parent
  end
end
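
The nil result for unrecognised types comes from the case expression having no else branch. A minimal sketch, using a hypothetical SketchParented struct in place of the real node classes:

```ruby
# Hypothetical stand-in for a node that knows its parent.
SketchParented = Struct.new(:parent)

# A case expression with no matching branch and no else evaluates to nil.
def sketch_parent_of(node)
  case node
  when SketchParented
    node.parent
  end
end

sketch_parent_of(SketchParented.new(:root))  # => :root
sketch_parent_of("not a node")               # => nil
```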

.parse_errors(node) ⇒ Object

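Extract parse-time errors carried on a node or its owning document. Returns an Array of Strings. On the Nokogiri side only document objects are inspected; nil and any other input yield an empty Array.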



# File 'lib/canon/comparison/node_inspector.rb', line 114

def self.parse_errors(node)
  case node
  when nil
    []
  when Canon::Xml::Node
    errors = node.parse_errors
    Array(errors).map(&:to_s)
  when Nokogiri::XML::Document, Nokogiri::HTML5::Document
    Array(node.errors).map(&:to_s)
  else
    []
  end
end
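
Both non-empty branches normalise with Array(...).map(&:to_s). The sketch below shows what that idiom does for a few inputs. The helper name sketch_error_strings is hypothetical, and StandardError stands in for whatever error objects the backends return.

```ruby
# Array() turns nil into [], wraps a single object, and leaves arrays
# untouched; map(&:to_s) then converts each entry to its message string.
def sketch_error_strings(errors)
  Array(errors).map(&:to_s)
end

sketch_error_strings(nil)                                        # => []
sketch_error_strings([StandardError.new("Unexpected end tag")])  # => ["Unexpected end tag"]
sketch_error_strings("single")                                   # => ["single"]
```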

.text_content(node) ⇒ Object

Extract the text content of node as a String.



# File 'lib/canon/comparison/node_inspector.rb', line 32

def self.text_content(node)
  case node
  when Canon::Xml::Node
    node.value.to_s
  when Nokogiri::XML::Node
    node.content.to_s
  else
    node.to_s
  end
end

.text_node?(node) ⇒ Boolean

True when node is a text node (whitespace, content, etc.).

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 20

def self.text_node?(node)
  case node
  when Canon::Xml::Node
    node.node_type == CANON_TEXT_TYPE
  when Nokogiri::XML::Node
    node.node_type == NOKOGIRI_TEXT_TYPE
  else
    false
  end
end

.whitespace_only_text?(node) ⇒ Boolean

True when node is a text node whose content is whitespace-only. Empty-string text nodes return false — those represent genuine empty-vs-content asymmetry, not pretty-print indentation.

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 46

def self.whitespace_only_text?(node)
  return false unless text_node?(node)

  text = text_content(node)
  !text.empty? && text.strip.empty?
end
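
The distinction between empty and whitespace-only text is easiest to see on plain strings. The sketch below applies the same predicate as the final line of the method. The helper name sketch_whitespace_only? is hypothetical.

```ruby
# Same test as the last line of whitespace_only_text?, without the
# text-node check: non-empty, but nothing left after stripping.
def sketch_whitespace_only?(text)
  !text.empty? && text.strip.empty?
end

sketch_whitespace_only?("\n    ")  # => true
sketch_whitespace_only?("")        # => false
sketch_whitespace_only?(" a ")     # => false
```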