Module: Canon::Comparison::NodeInspector

Defined in:
lib/canon/comparison/node_inspector.rb

Overview

Single source of truth for cross-backend node type operations.

The comparison pipeline handles nodes from multiple sources:

  • Canon::Xml::Node (plus RootNode, ElementNode, TextNode, etc.): custom DOM built by the SAX builder and DataModel.

  • Canon::TreeDiff::Core::TreeNode: semantic tree diff nodes.

  • Backend-specific nodes (Nokogiri or Moxml): live parsed nodes.

All type dispatch uses backend branching (`if XmlBackend.nokogiri?`) rather than `case/when` with constant references. This prevents a NameError when Nokogiri constants are undefined under Opal.

Every node query in the codebase should go through this module. Do not create private dispatch methods in consumers.
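
The dispatch rule above can be illustrated with a minimal, self-contained sketch. Every name in it (DemoBackend, DemoText, MissingLib) is a hypothetical stand-in, not part of Canon; MissingLib is deliberately left undefined to play the role of Nokogiri under Opal.

```ruby
# Hypothetical stand-in for XmlBackend; the real module is not shown on this page.
module DemoBackend
  def self.available?
    # True only when the optional library constant has been loaded.
    defined?(::MissingLib) ? true : false
  end
end

# Stand-in for a node class that is always loaded.
DemoText = Struct.new(:content)

# Constant-based dispatch: each `when` clause is evaluated in order, so the
# undefined constant raises NameError before the second clause is reached.
def text_via_case?(node)
  case node
  when MissingLib::Text then true
  when DemoText then true
  else false
  end
end

# Backend branching: the optional constant is referenced only inside the
# branch that runs when the backend is present.
def text_via_branch?(node)
  if DemoBackend.available?
    node.is_a?(MissingLib::Text) || node.is_a?(DemoText)
  else
    node.is_a?(DemoText)
  end
end
```

With MissingLib undefined, text_via_case?(DemoText.new("x")) raises NameError, while text_via_branch? returns true for the same node.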

Constant Summary

NOKOGIRI_TEXT_TYPE =
defined?(Nokogiri::XML::Node::TEXT_NODE) ? Nokogiri::XML::Node::TEXT_NODE : 3
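
The guard used for this constant relies on `defined?` returning nil for a missing constant path instead of raising. A small sketch of the same pattern, with the hypothetical names MissingLib (never defined) and PresentLib (defined locally) standing in for an absent and a loaded library:

```ruby
# MissingLib does not exist, so `defined?` yields nil and the literal is used.
FALLBACK_TEXT_TYPE =
  defined?(MissingLib::Node::TEXT_NODE) ? MissingLib::Node::TEXT_NODE : 3

# A loaded library: the constant path resolves and its value is used instead.
module PresentLib
  module Node
    TEXT_NODE = 3
  end
end

PRESENT_TEXT_TYPE =
  defined?(PresentLib::Node::TEXT_NODE) ? PresentLib::Node::TEXT_NODE : -1
```

Both constants end up as 3 here; the first through the fallback literal, the second through the resolved constant.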

Class Method Details

.attribute_value(node, attr_name) ⇒ Object

Unified attribute value access.



# File 'lib/canon/comparison/node_inspector.rb', line 158

def self.attribute_value(node, attr_name)
  return nil unless node

  if node.is_a?(Canon::Xml::Nodes::ElementNode)
    attr = node.attribute_nodes.find { |a| a.name == attr_name.to_s }
    attr&.value
  elsif node.is_a?(Canon::Xml::Node)
    nil
  else
    XmlParsing.attribute_value(node, attr_name)
  end
end
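
A minimal sketch of the ElementNode branch, using hypothetical Struct stand-ins (DemoAttr, DemoElement) rather than the real Canon classes. It shows that the attribute name may be given as a Symbol or a String, because the lookup compares against attr_name.to_s.

```ruby
# Hypothetical stand-ins for an attribute node and an element node.
DemoAttr = Struct.new(:name, :value)
DemoElement = Struct.new(:attribute_nodes)

# Mirrors the lookup in the ElementNode branch only; the backend fallback
# through XmlParsing is not modelled here.
def demo_attribute_value(node, attr_name)
  return nil unless node

  attr = node.attribute_nodes.find { |a| a.name == attr_name.to_s }
  attr&.value
end

el = DemoElement.new([DemoAttr.new("id", "intro"), DemoAttr.new("lang", "en")])

demo_attribute_value(el, :id)       # => "intro"
demo_attribute_value(el, "lang")    # => "en"
demo_attribute_value(el, :missing)  # => nil
demo_attribute_value(nil, :id)      # => nil
```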

.children(node) ⇒ Object

Unified children access across all node types.



# File 'lib/canon/comparison/node_inspector.rb', line 122

def self.children(node)
  return [] unless node
  return node.children if node.is_a?(Canon::Xml::Node)
  return node.children || [] if node.is_a?(Canon::TreeDiff::Core::TreeNode)

  XmlParsing.children(node)
end

.comment_node?(node) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 46

def self.comment_node?(node)
  return false unless node
  return node.node_type == :comment if node.is_a?(Canon::Xml::Node)

  if XmlBackend.nokogiri?
    return true if node.is_a?(Nokogiri::XML::Node) && node.comment?

    # HTML comments are parsed as TEXT nodes by Nokogiri
    if node.is_a?(Nokogiri::XML::Node) && node.text?
      text_stripped = text_content(node).to_s.strip.gsub("\\", "")
      return true if text_stripped.start_with?("<!--") && text_stripped.end_with?("-->")
    end
    false
  else
    node.is_a?(Moxml::Comment)
  end
end
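
The text-based fallback in the Nokogiri branch can be sketched on plain strings. The helper name comment_like_text? is hypothetical; it reproduces only the string check (strip, drop backslashes, test the delimiters), not the node type tests.

```ruby
# Returns true when the text, after trimming and removing backslashes,
# is wrapped in HTML comment delimiters.
def comment_like_text?(text)
  stripped = text.to_s.strip.gsub("\\", "")
  stripped.start_with?("<!--") && stripped.end_with?("-->")
end

comment_like_text?("  <!-- note -->\n")       # => true
comment_like_text?("<\\!-- escaped --\\>")    # => true
comment_like_text?("<!-- unterminated")       # => false
comment_like_text?("plain text")              # => false
```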

.document?(node) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 64

def self.document?(node)
  return node.node_type == :root if node.is_a?(Canon::Xml::Node)

  XmlParsing.document?(node)
end

.document_fragment?(node) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 70

def self.document_fragment?(node)
  return false unless node
  return false unless node.is_a?(Canon::Xml::Nodes::RootNode)

  node.fragment?
end

.element_node?(node) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 35

def self.element_node?(node)
  return false unless node
  return node.node_type == :element if node.is_a?(Canon::Xml::Node)

  if XmlBackend.nokogiri?
    node.is_a?(Nokogiri::XML::Element) || node.is_a?(Moxml::Element)
  else
    node.is_a?(Moxml::Element)
  end
end

.name(node) ⇒ Object

Unified node name extraction across all node types.



# File 'lib/canon/comparison/node_inspector.rb', line 104

def self.name(node)
  return nil unless node
  return node.name if node.is_a?(Canon::Xml::Node)
  return node.label if node.is_a?(Canon::TreeDiff::Core::TreeNode)

  XmlParsing.name(node)
end

.namespace_uri(node) ⇒ Object

Unified namespace URI access.



# File 'lib/canon/comparison/node_inspector.rb', line 172

def self.namespace_uri(node)
  return nil unless node

  if node.is_a?(Canon::Xml::Node)
    node.is_a?(Canon::Xml::Nodes::ElementNode) ? node.namespace_uri : nil
  else
    XmlParsing.namespace_uri(node)
  end
end

.node_type(node) ⇒ Object

Unified node type, returned as a Symbol. Returns nil when the node is nil or unrecognised.



# File 'lib/canon/comparison/node_inspector.rb', line 146

def self.node_type(node)
  return nil unless node
  return node.node_type if node.is_a?(Canon::Xml::Node)

  if node.is_a?(Canon::TreeDiff::Core::TreeNode)
    node.type&.to_sym
  else
    XmlParsing.node_type(node)
  end
end

.noise_dimension_for(node) ⇒ Object

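Noise classification. Returns :whitespace_adjacency for whitespace-only text nodes, :comments for comment nodes, and nil for any other node.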



# File 'lib/canon/comparison/node_inspector.rb', line 89

def self.noise_dimension_for(node)
  if whitespace_only_text?(node)
    :whitespace_adjacency
  elsif comment_node?(node)
    :comments
  end
end

.noise_node?(node) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 97

def self.noise_node?(node)
  !noise_dimension_for(node).nil?
end
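
How the two noise methods fit together can be sketched with a hypothetical Struct stand-in (DemoNode) whose node_type is already a Symbol. The demo_ helpers below are simplified assumptions and skip the backend-specific checks of the real predicates.

```ruby
# Hypothetical node: a type Symbol plus its text.
DemoNode = Struct.new(:node_type, :text)

def demo_whitespace_only?(node)
  return false unless node && node.node_type == :text

  !node.text.empty? && node.text.strip.empty?
end

# Returns the noise dimension as a Symbol, or nil when the node is not noise.
def demo_noise_dimension(node)
  if demo_whitespace_only?(node)
    :whitespace_adjacency
  elsif node && node.node_type == :comment
    :comments
  end
end

# A node is noise exactly when it has a noise dimension.
def demo_noise?(node)
  !demo_noise_dimension(node).nil?
end

demo_noise_dimension(DemoNode.new(:text, "\n  "))      # => :whitespace_adjacency
demo_noise_dimension(DemoNode.new(:comment, " todo ")) # => :comments
demo_noise_dimension(DemoNode.new(:text, "hello"))     # => nil
demo_noise?(DemoNode.new(:element, ""))                # => false
```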

.parent(node) ⇒ Object

Unified parent access across all node types.



# File 'lib/canon/comparison/node_inspector.rb', line 113

def self.parent(node)
  return nil unless node
  return node.parent if node.is_a?(Canon::Xml::Node)
  return node.parent if node.is_a?(Canon::TreeDiff::Core::TreeNode)

  XmlParsing.parent(node)
end

.parent_of(node) ⇒ Object

Deprecated: use NodeInspector.parent instead.



# File 'lib/canon/comparison/node_inspector.rb', line 199

def self.parent_of(node)
  parent(node)
end

.parse_errors(node) ⇒ Object

Extract parse-time errors carried on a node or its owning document.



# File 'lib/canon/comparison/node_inspector.rb', line 183

def self.parse_errors(node)
  return [] if node.nil?
  return Array(node.parse_errors).map(&:to_s) if node.is_a?(Canon::Xml::Node)

  if XmlBackend.nokogiri?
    if node.is_a?(Nokogiri::XML::Document) || node.is_a?(Nokogiri::HTML5::Document)
      Array(node.errors).map(&:to_s)
    else
      []
    end
  else
    []
  end
end
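
The Array(...).map(&:to_s) idiom used in both branches normalises whatever the node carries into an array of strings. A short sketch with a hypothetical helper name, normalize_errors:

```ruby
# Array(nil) is [], Array(array) is the array itself, and Array(scalar) wraps
# the value, so the result is always an Array of Strings.
def normalize_errors(raw)
  Array(raw).map(&:to_s)
end

normalize_errors(nil)                              # => []
normalize_errors([StandardError.new("bad tag")])   # => ["bad tag"]
normalize_errors("oops")                           # => ["oops"]
```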

.text_content(node) ⇒ Object

Extract the text content of node as a String.



# File 'lib/canon/comparison/node_inspector.rb', line 131

def self.text_content(node)
  case node
  when Canon::Xml::Nodes::TextNode
    node.value.to_s
  when Canon::Xml::Node
    node.text_content.to_s
  when Moxml::Text
    node.content.to_s
  else
    XmlParsing.text_content(node).to_s
  end
end
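
The order of the `when` clauses matters if TextNode is a subclass of Canon::Xml::Node, which this sketch assumes. The classes DemoNode and DemoTextNode are hypothetical stand-ins; the point is only that the more specific class has to be tested first.

```ruby
# Hypothetical base node with a generic text accessor.
class DemoNode
  def text_content
    "from text_content"
  end
end

# Hypothetical text node; assumed to inherit from the base node.
class DemoTextNode < DemoNode
  def value
    "from value"
  end
end

def demo_text(node)
  case node
  when DemoTextNode then node.value.to_s         # most specific class first
  when DemoNode     then node.text_content.to_s  # any other node of the family
  else node.to_s                                 # stand-in for the backend fallback
  end
end

demo_text(DemoTextNode.new)  # => "from value"
demo_text(DemoNode.new)      # => "from text_content"
```

If the two clauses were swapped, a DemoTextNode would match DemoNode first and `value` would never be consulted.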

.text_node?(node) ⇒ Boolean

Type predicate. True when node is a Canon::Xml::Node of type :text, or a Nokogiri or Moxml text node.

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 24

def self.text_node?(node)
  return false unless node
  return node.node_type == :text if node.is_a?(Canon::Xml::Node)

  if XmlBackend.nokogiri?
    node.is_a?(Nokogiri::XML::Text) || node.is_a?(Moxml::Text)
  else
    node.is_a?(Moxml::Text)
  end
end

.whitespace_only_text?(node) ⇒ Boolean

True when node is a text node whose content is whitespace-only. Empty-string text nodes return false — those represent genuine empty-vs-content asymmetry, not pretty-print indentation.

Returns:

  • (Boolean)


# File 'lib/canon/comparison/node_inspector.rb', line 80

def self.whitespace_only_text?(node)
  return false unless text_node?(node)

  text = text_content(node)
  !text.empty? && text.strip.empty?
end
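
The distinction between empty and whitespace-only text can be checked on plain strings. The helper below is a hypothetical reduction of the method to its string test, leaving out the text_node? guard.

```ruby
# True only for non-empty strings made up entirely of whitespace.
def whitespace_only_string?(text)
  !text.empty? && text.strip.empty?
end

whitespace_only_string?("\n    ")  # => true
whitespace_only_string?("")        # => false
whitespace_only_string?(" a ")     # => false
```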