Module: Canon::Comparison::NodeInspector

Defined in: lib/canon/comparison/node_inspector.rb

Overview

Single source of truth for cross-backend node type operations.

The comparison pipeline handles nodes from multiple sources:

Canon::Xml::Node (plus RootNode, ElementNode, TextNode, etc.): custom DOM built by the SAX builder and DataModel.
Canon::TreeDiff::Core::TreeNode: semantic tree diff nodes.
Backend-specific nodes (Nokogiri or Moxml): live parsed nodes.

All type dispatch uses backend branching (`if XmlBackend.nokogiri?`) rather than `case/when` with constant references. This prevents a NameError when Nokogiri constants are undefined under Opal.
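The difference between the two dispatch styles can be sketched without Canon or Nokogiri installed. `SketchBackend`, `SketchText`, `sketch_text_node?`, `sketch_case_dispatch` and `UndefinedBackendConstant` are hypothetical stand-ins invented for this illustration; they are not part of Canon.

```ruby
# Hypothetical stand-in for XmlBackend; only the predicate matters here.
module SketchBackend
  def self.nokogiri?
    # True only when the Nokogiri constant is actually available.
    !!defined?(::Nokogiri)
  end
end

# Hypothetical stand-in for a Moxml-style text node.
SketchText = Struct.new(:content)

# Backend branching: Nokogiri constants are referenced only inside the
# branch that runs when Nokogiri is present.
def sketch_text_node?(node)
  return false unless node

  if SketchBackend.nokogiri?
    node.is_a?(::Nokogiri::XML::Text) || node.is_a?(SketchText)
  else
    node.is_a?(SketchText)
  end
end

# Constant-based dispatch: the `when` clause resolves its constant as soon
# as the case expression is evaluated, so a missing constant raises.
def sketch_case_dispatch(node)
  case node
  when UndefinedBackendConstant::Text then :text
  else :other
  end
end
```

Calling `sketch_case_dispatch` with any argument raises NameError because `UndefinedBackendConstant` is never defined, whereas `sketch_text_node?` never touches a constant outside its guarded branch.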

Every node query in the codebase should go through this module. Do not create private dispatch methods in consumers.

Constant Summary

NOKOGIRI_TEXT_TYPE =
  defined?(Nokogiri::XML::Node::TEXT_NODE) ? Nokogiri::XML::Node::TEXT_NODE : 3
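A minimal sketch of the same `defined?` guard, runnable without Nokogiri. `SketchLib`, `SketchLoaded` and the two constants are hypothetical names used only for illustration. The fallback value 3 matches the DOM node-type code for text nodes.

```ruby
# SketchLib is deliberately never defined, so `defined?` returns nil and the
# fallback is used. No NameError is raised by the guard itself.
SKETCH_TEXT_TYPE =
  defined?(SketchLib::Node::TEXT_NODE) ? SketchLib::Node::TEXT_NODE : 3

# When the constant does exist, its value is taken instead of the fallback.
module SketchLoaded
  TEXT_NODE = 3
end

SKETCH_LOADED_TYPE =
  defined?(SketchLoaded::TEXT_NODE) ? SketchLoaded::TEXT_NODE : -1
```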

Class Method Summary

.attribute_value(node, attr_name) ⇒ Object

Unified attribute value access.
.children(node) ⇒ Object

Unified children access across all node types.
.comment_node?(node) ⇒ Boolean
.document?(node) ⇒ Boolean
.document_fragment?(node) ⇒ Boolean
.element_node?(node) ⇒ Boolean
.name(node) ⇒ Object

Unified node name extraction across all node types.
.namespace_uri(node) ⇒ Object

Unified namespace URI access.
.node_type(node) ⇒ Object

Unified node type, returned as a symbol (nil for nil or unrecognised nodes).
.noise_dimension_for(node) ⇒ Object

Noise classification: returns :whitespace_adjacency, :comments, or nil.
.noise_node?(node) ⇒ Boolean
.parent(node) ⇒ Object

Unified parent access across all node types.
.parent_of(node) ⇒ Object

Deprecated: use NodeInspector.parent instead.
.parse_errors(node) ⇒ Object

Extract parse-time errors carried on a node or its owning document.
.text_content(node) ⇒ Object

Extract the text content of node as a String.
.text_node?(node) ⇒ Boolean

Type predicate: true when node is a text node.
.whitespace_only_text?(node) ⇒ Boolean

True when node is a text node whose content is whitespace-only.

Class Method Details

.attribute_value(node, attr_name) ⇒ `Object`

Unified attribute value access.

# File 'lib/canon/comparison/node_inspector.rb', line 158

def self.attribute_value(node, attr_name)
  return nil unless node

  if node.is_a?(Canon::Xml::Nodes::ElementNode)
    attr = node.attribute_nodes.find { |a| a.name == attr_name.to_s }
    attr&.value
  elsif node.is_a?(Canon::Xml::Node)
    nil
  else
    XmlParsing.attribute_value(node, attr_name)
  end
end
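The ElementNode branch above looks an attribute up by name and coerces the requested name with `to_s`, so symbol and string names behave the same. A minimal sketch of that lookup, using a hypothetical `SketchAttr` struct in place of Canon's attribute nodes:

```ruby
# Hypothetical stand-in for an attribute node exposing #name and #value.
SketchAttr = Struct.new(:name, :value)

# Mirrors the lookup in the ElementNode branch: find by string name,
# then return the value, or nil when no attribute matches.
def sketch_attribute_value(attribute_nodes, attr_name)
  attr = attribute_nodes.find { |a| a.name == attr_name.to_s }
  attr&.value
end

attrs = [SketchAttr.new("id", "x1"), SketchAttr.new("lang", "en")]
sketch_attribute_value(attrs, :id)       # symbol name
sketch_attribute_value(attrs, "lang")    # string name
sketch_attribute_value(attrs, :missing)  # no match, so nil via safe navigation
```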

.children(node) ⇒ `Object`

Unified children access across all node types.

# File 'lib/canon/comparison/node_inspector.rb', line 122

def self.children(node)
  return [] unless node
  return node.children if node.is_a?(Canon::Xml::Node)
  return node.children || [] if node.is_a?(Canon::TreeDiff::Core::TreeNode)

  XmlParsing.children(node)
end

.comment_node?(node) ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/canon/comparison/node_inspector.rb', line 46

def self.comment_node?(node)
  return false unless node
  return node.node_type == :comment if node.is_a?(Canon::Xml::Node)

  if XmlBackend.nokogiri?
    return true if node.is_a?(Nokogiri::XML::Node) && node.comment?

    # HTML comments are parsed as TEXT nodes by Nokogiri
    if node.is_a?(Nokogiri::XML::Node) && node.text?
      text_stripped = text_content(node).to_s.strip.gsub("\\", "")
      return true if text_stripped.start_with?("<!--") && text_stripped.end_with?("-->")
    end
    false
  else
    node.is_a?(Moxml::Comment)
  end
end
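The text-node fallback in the Nokogiri branch is a plain string heuristic: strip surrounding whitespace, drop backslashes, then check for comment delimiters. A sketch of that check on bare strings, where `sketch_comment_like?` is a hypothetical helper and not part of Canon:

```ruby
# Same string test as the Nokogiri text-node fallback, applied to a String.
def sketch_comment_like?(raw_text)
  stripped = raw_text.to_s.strip.gsub("\\", "")
  stripped.start_with?("<!--") && stripped.end_with?("-->")
end

sketch_comment_like?("  <!-- note -->\n")    # delimiters survive stripping
sketch_comment_like?("<\\!-- note --\\>")    # backslashes are removed first
sketch_comment_like?("<!-- unterminated")    # missing closing delimiter
```

Both delimiters must be present after normalisation, so an unterminated `<!--` is not treated as a comment.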

.document?(node) ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/canon/comparison/node_inspector.rb', line 64

def self.document?(node)
  return node.node_type == :root if node.is_a?(Canon::Xml::Node)

  XmlParsing.document?(node)
end

.document_fragment?(node) ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/canon/comparison/node_inspector.rb', line 70

def self.document_fragment?(node)
  return false unless node
  return false unless node.is_a?(Canon::Xml::Nodes::RootNode)

  node.fragment?
end

.element_node?(node) ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/canon/comparison/node_inspector.rb', line 35

def self.element_node?(node)
  return false unless node
  return node.node_type == :element if node.is_a?(Canon::Xml::Node)

  if XmlBackend.nokogiri?
    node.is_a?(Nokogiri::XML::Element) || node.is_a?(Moxml::Element)
  else
    node.is_a?(Moxml::Element)
  end
end

.name(node) ⇒ `Object`

Unified node name extraction across all node types.

# File 'lib/canon/comparison/node_inspector.rb', line 104

def self.name(node)
  return nil unless node
  return node.name if node.is_a?(Canon::Xml::Node)
  return node.label if node.is_a?(Canon::TreeDiff::Core::TreeNode)

  XmlParsing.name(node)
end

.namespace_uri(node) ⇒ `Object`

Unified namespace URI access.

# File 'lib/canon/comparison/node_inspector.rb', line 172

def self.namespace_uri(node)
  return nil unless node

  if node.is_a?(Canon::Xml::Node)
    node.is_a?(Canon::Xml::Nodes::ElementNode) ? node.namespace_uri : nil
  else
    XmlParsing.namespace_uri(node)
  end
end

.node_type(node) ⇒ `Object`

Unified node type, returned as a symbol. Returns nil for nil or unrecognised nodes.

# File 'lib/canon/comparison/node_inspector.rb', line 146

def self.node_type(node)
  return nil unless node
  return node.node_type if node.is_a?(Canon::Xml::Node)

  if node.is_a?(Canon::TreeDiff::Core::TreeNode)
    node.type&.to_sym
  else
    XmlParsing.node_type(node)
  end
end

.noise_dimension_for(node) ⇒ `Object`

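Noise classification. Returns :whitespace_adjacency for whitespace-only text nodes, :comments for comment nodes, and nil for anything else.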

# File 'lib/canon/comparison/node_inspector.rb', line 89

def self.noise_dimension_for(node)
  if whitespace_only_text?(node)
    :whitespace_adjacency
  elsif comment_node?(node)
    :comments
  end
end

.noise_node?(node) ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/canon/comparison/node_inspector.rb', line 97

def self.noise_node?(node)
  !noise_dimension_for(node).nil?
end
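How the two noise methods fit together can be sketched with a stand-in node type. `SketchNode` and the `sketch_*` methods are hypothetical; the real methods delegate to `whitespace_only_text?` and `comment_node?` rather than reading a `type` field.

```ruby
# Hypothetical node with a type symbol and text payload.
SketchNode = Struct.new(:type, :text)

def sketch_whitespace_only?(node)
  node.type == :text && !node.text.empty? && node.text.strip.empty?
end

# Returns a dimension symbol, or nil when the node is not noise
# (an if/elsif with no else evaluates to nil when nothing matches).
def sketch_noise_dimension(node)
  if sketch_whitespace_only?(node)
    :whitespace_adjacency
  elsif node.type == :comment
    :comments
  end
end

# A node is noise exactly when it has a noise dimension.
def sketch_noise?(node)
  !sketch_noise_dimension(node).nil?
end
```

Under these assumptions, an indentation-only text node and a comment both count as noise, while an element or an empty text node does not.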

.parent(node) ⇒ `Object`

Unified parent access across all node types.

# File 'lib/canon/comparison/node_inspector.rb', line 113

def self.parent(node)
  return nil unless node
  return node.parent if node.is_a?(Canon::Xml::Node)
  return node.parent if node.is_a?(Canon::TreeDiff::Core::TreeNode)

  XmlParsing.parent(node)
end

.parent_of(node) ⇒ `Object`

Deprecated: use NodeInspector.parent instead.

# File 'lib/canon/comparison/node_inspector.rb', line 199

def self.parent_of(node)
  parent(node)
end

.parse_errors(node) ⇒ `Object`

Extract parse-time errors carried on a node or its owning document.

# File 'lib/canon/comparison/node_inspector.rb', line 183

def self.parse_errors(node)
  return [] if node.nil?
  return Array(node.parse_errors).map(&:to_s) if node.is_a?(Canon::Xml::Node)

  if XmlBackend.nokogiri?
    if node.is_a?(Nokogiri::XML::Document) || node.is_a?(Nokogiri::HTML5::Document)
      Array(node.errors).map(&:to_s)
    else
      []
    end
  else
    []
  end
end
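Both error-returning branches normalise their input with `Array(...).map(&:to_s)`, so callers always receive an array of strings. A small sketch of that idiom on plain Ruby values, where `sketch_normalize_errors` is a hypothetical helper:

```ruby
# Array() turns nil into [], wraps a single object, and leaves arrays as-is;
# map(&:to_s) then converts every entry to a String.
def sketch_normalize_errors(raw)
  Array(raw).map(&:to_s)
end

sketch_normalize_errors(nil)                                # no errors recorded
sketch_normalize_errors("one")                              # single value
sketch_normalize_errors([StandardError.new("boom"), :warn]) # mixed objects
```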

.text_content(node) ⇒ `Object`

Extract the text content of node as a String.

# File 'lib/canon/comparison/node_inspector.rb', line 131

def self.text_content(node)
  case node
  when Canon::Xml::Nodes::TextNode
    node.value.to_s
  when Canon::Xml::Node
    node.text_content.to_s
  when Moxml::Text
    node.content.to_s
  else
    XmlParsing.text_content(node).to_s
  end
end
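The order of the `when` clauses matters if TextNode is a subclass of Canon::Xml::Node, which the branch order suggests but this page does not state. A sketch with hypothetical classes showing why the more specific class is listed first:

```ruby
# Hypothetical base and subclass standing in for Node and TextNode.
class SketchBase
  def text_content
    "base"
  end
end

class SketchTextNode < SketchBase
  def value
    "leaf"
  end
end

# `case` takes the first matching clause, so the subclass must come before
# its superclass or the subclass branch would never be reached.
def sketch_text_content(node)
  case node
  when SketchTextNode then node.value.to_s
  when SketchBase then node.text_content.to_s
  else node.to_s
  end
end
```

As in the method above, every branch ends in `to_s`, so the result is a String even for nil input.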

.text_node?(node) ⇒ `Boolean`

Type predicate. True when node is a text node from any supported source.

Returns:

(Boolean)

# File 'lib/canon/comparison/node_inspector.rb', line 24

def self.text_node?(node)
  return false unless node
  return node.node_type == :text if node.is_a?(Canon::Xml::Node)

  if XmlBackend.nokogiri?
    node.is_a?(Nokogiri::XML::Text) || node.is_a?(Moxml::Text)
  else
    node.is_a?(Moxml::Text)
  end
end

.whitespace_only_text?(node) ⇒ `Boolean`

True when node is a text node whose content is whitespace-only. Empty-string text nodes return false — those represent genuine empty-vs-content asymmetry, not pretty-print indentation.

Returns:

(Boolean)

# File 'lib/canon/comparison/node_inspector.rb', line 80

def self.whitespace_only_text?(node)
  return false unless text_node?(node)

  text = text_content(node)
  !text.empty? && text.strip.empty?
end
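The empty-versus-whitespace distinction comes down to one expression on the extracted text. A sketch on bare strings, where `sketch_blank_but_present?` is a hypothetical name:

```ruby
# Same expression as the method body, applied directly to a String.
def sketch_blank_but_present?(text)
  !text.empty? && text.strip.empty?
end

sketch_blank_but_present?("\n    ")  # indentation only
sketch_blank_but_present?("")        # empty string: not whitespace-only
sketch_blank_but_present?(" a ")     # has content after stripping
```

Because the check relies on `String#strip`, only the whitespace characters that `strip` removes count. A non-breaking space (U+00A0), for example, is not stripped and is therefore treated as content.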
