Module: Canon::Comparison::NodeInspector

Defined in:
lib/canon/comparison/node_inspector.rb

Overview

Single source of truth for cross-backend node type operations.

The comparison pipeline handles nodes from two backends:

  • Canon::Xml::Node (+ RootNode, ElementNode, TextNode, etc.): custom DOM built by the SAX builder and DataModel.

  • Nokogiri::XML::Node (+ subclasses): native Nokogiri nodes used by the HTML comparator and some legacy paths.

Every method here dispatches on type via case/when (an is_a? check). respond_to? is not used, because the types are known at every call site.
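The dispatch style described above can be sketched without either backend loaded. This is a minimal illustration, not the module's code: StubCanonNode, StubNokogiriNode, and stub_element_node? are hypothetical stand-ins for Canon::Xml::Node, Nokogiri::XML::Node, and the real predicate.

```ruby
# Hypothetical stand-ins for the two backends (assumptions for illustration).
class StubCanonNode
  attr_reader :node_type

  def initialize(node_type)
    @node_type = node_type
  end
end

class StubNokogiriNode
  def initialize(element)
    @element = element
  end

  def element?
    @element
  end
end

# case/when calls Class#=== on each branch, which is an is_a? test on the
# argument, so every backend gets its own explicit branch.
def stub_element_node?(node)
  case node
  when StubCanonNode
    node.node_type == :element
  when StubNokogiriNode
    node.element?
  else
    false # unknown types are never treated as elements
  end
end

stub_element_node?(StubCanonNode.new(:element))  # => true
stub_element_node?(StubCanonNode.new(:text))     # => false
stub_element_node?(StubNokogiriNode.new(true))   # => true
stub_element_node?("not a node")                 # => false
```

Because each branch names a class, an object that merely happens to respond to element? or node_type falls through to the else branch instead of being misread as a node.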

Constant Summary

CANON_TEXT_TYPE = :text
NOKOGIRI_TEXT_TYPE = defined?(Nokogiri::XML::Node::TEXT_NODE) ? Nokogiri::XML::Node::TEXT_NODE : 3
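The defined? guard on NOKOGIRI_TEXT_TYPE lets the constant resolve even when the referenced constant is unavailable. A minimal sketch of the same pattern follows; OptionalBackend and MissingBackend are hypothetical names used only for illustration.

```ruby
# Hypothetical namespace standing in for an optional dependency.
module OptionalBackend
  TEXT_NODE = 3
end

# defined? returns a truthy description when the constant path resolves,
# and nil (without raising NameError) when it does not.
present = defined?(OptionalBackend::TEXT_NODE) ? OptionalBackend::TEXT_NODE : 3

# MissingBackend is never defined, so the fallback literal is used and the
# constant lookup in the first ternary branch is never evaluated.
absent = defined?(MissingBackend::TEXT_NODE) ? MissingBackend::TEXT_NODE : 3

present # => 3
absent  # => 3
```

The fallback 3 matches the numeric value libxml2 assigns to text nodes, which is why both paths agree.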

Class Method Details

.comment_node?(node) ⇒ Boolean

True when node is a comment node. For HTML, also detects comments that Nokogiri parses as TEXT nodes (content like “<!-- comment -->” or escaped “<\!-- comment -->”).

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 56

def self.comment_node?(node)
  case node
  when Canon::Xml::Node
    node.node_type == :comment
  when Nokogiri::XML::Node
    return true if node.comment?

    # HTML comments are parsed as TEXT nodes by Nokogiri
    if node.text?
      text_stripped = text_content(node).to_s.strip.gsub("\\", "")
      return true if text_stripped.start_with?("<!--") && text_stripped.end_with?("-->")
    end
    false
  else
    false
  end
end
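
The text-node branch above is a pure string check, so it can be tried in isolation. The helper below is a sketch under that assumption; comment_like_text? is a hypothetical name, not part of the module.

```ruby
# Mirrors the string test applied to Nokogiri text nodes: strip surrounding
# whitespace, drop backslashes, then look for comment delimiters.
def comment_like_text?(text)
  stripped = text.to_s.strip.gsub("\\", "")
  stripped.start_with?("<!--") && stripped.end_with?("-->")
end

comment_like_text?("  <!-- note -->\n")  # => true
comment_like_text?("<\\!-- note -->")    # => true  (escaped form)
comment_like_text?("plain text")         # => false
comment_like_text?("<!-- unterminated")  # => false
```

Note that gsub removes every backslash in the text, not only the one in the opening delimiter, before the delimiters are checked.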

.element_node?(node) ⇒ Boolean

True when node is an element node.

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 75

def self.element_node?(node)
  case node
  when Canon::Xml::Node
    node.node_type == :element
  when Nokogiri::XML::Node
    node.element?
  else
    false
  end
end

.parse_errors(node) ⇒ Object

Extract parse-time errors carried on a node or its owning document. Returns an Array of Strings; nil and unrecognized inputs yield an empty Array.



# File 'lib/canon/comparison/node_inspector.rb', line 88

def self.parse_errors(node)
  case node
  when nil
    []
  when Canon::Xml::Node
    errors = node.parse_errors
    Array(errors).map(&:to_s)
  when Nokogiri::XML::Document, Nokogiri::HTML5::Document
    Array(node.errors).map(&:to_s)
  else
    []
  end
end
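
Both non-empty branches rely on the same normalization, Array(...).map(&:to_s). A small sketch of how that idiom behaves on different inputs; normalize_errors is a hypothetical helper name.

```ruby
# Kernel#Array turns nil into [], wraps a single value, and leaves arrays
# alone; map(&:to_s) then guarantees every entry is a String.
def normalize_errors(errors)
  Array(errors).map(&:to_s)
end

normalize_errors(nil)                                 # => []
normalize_errors("bad tag")                           # => ["bad tag"]
normalize_errors([StandardError.new("x"), :warning])  # => ["x", "warning"]
```

This is why callers can treat the result uniformly, whether the backend reports no errors, a single error, or a list of error objects.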

.text_content(node) ⇒ Object

Extract the text content of node as a String.



# File 'lib/canon/comparison/node_inspector.rb', line 32

def self.text_content(node)
  case node
  when Canon::Xml::Node
    node.value.to_s
  when Nokogiri::XML::Node
    node.content.to_s
  else
    node.to_s
  end
end

.text_node?(node) ⇒ Boolean

True when node is a text node (whitespace, content, etc.).

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 20

def self.text_node?(node)
  case node
  when Canon::Xml::Node
    node.node_type == CANON_TEXT_TYPE
  when Nokogiri::XML::Node
    node.node_type == NOKOGIRI_TEXT_TYPE
  else
    false
  end
end

.whitespace_only_text?(node) ⇒ Boolean

True when node is a text node whose content is whitespace-only. Empty-string text nodes return false — those represent genuine empty-vs-content asymmetry, not pretty-print indentation.

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 46

def self.whitespace_only_text?(node)
  return false unless text_node?(node)

  text = text_content(node)
  !text.empty? && text.strip.empty?
end