Module: Canon::Comparison::XmlNodeComparison

Defined in: lib/canon/comparison/xml_node_comparison.rb

Overview

XML Node Comparison Utilities

Provides public comparison methods for XML/HTML nodes. This module extracts shared comparison logic that was previously accessed via send() from HtmlComparator.

All methods are module-level (class) methods that take the nodes and comparison options as explicit arguments.

Class Method Summary

.add_difference(node1, node2, diff1, diff2, dimension, opts, differences) ⇒ Object

Add a difference to the differences array.
.comment_node?(node, check_children: false) ⇒ Boolean

Check if a node is a comment node.
.comment_vs_non_comment_comparison?(node1, node2) ⇒ Boolean

Check if this is a comment vs non-comment comparison.
.compare_document_fragments(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol

Compare document fragments by comparing their children.
.compare_nodes(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol

Main comparison dispatcher for XML nodes.
.dispatch_by_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol

Dispatch comparison based on node type.
.dispatch_canon_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Object

Dispatch by Canon::Xml::Node type.
.dispatch_legacy_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Object

Dispatch by legacy Nokogiri/Moxml node type.
.filter_children(children, opts) ⇒ Array

Filter children based on options.
.node_excluded?(node, opts) ⇒ Boolean

Check if a node should be excluded from comparison.
.node_text(node) ⇒ String

Extract text content from a node.
.opts_for_side(opts, side) ⇒ Hash

Build a side-specific opts copy that activates the pretty-print structural-whitespace heuristic for the given side.
.same_node_type?(node1, node2) ⇒ Boolean

Check if two nodes are of the same type.
.serialize_node_to_xml(node) ⇒ String

Serialize a Canon::Xml::Node to XML string.
.text_node?(node) ⇒ Boolean

Check if a node is a text node.

Class Method Details

.add_difference(node1, node2, diff1, diff2, dimension, opts, differences) ⇒ `Object`

Add a difference to the differences array

Parameters:

node1 (Object) —

First node
node2 (Object) —

Second node
diff1 (Symbol) —

Difference type for node1
diff2 (Symbol) —

Difference type for node2
dimension (Symbol) —

The dimension of the difference
opts (Hash) —

Comparison options
differences (Array) —

Array to append difference to

# File 'lib/canon/comparison/xml_node_comparison.rb', line 417

def self.add_difference(node1, node2, diff1, diff2, dimension, opts,
                        differences)
  return unless opts[:verbose]

  require_relative "xml_comparator"
  XmlComparator.add_difference(node1, node2, diff1, diff2, dimension,
                               opts, differences)
end

.comment_node?(node, check_children: false) ⇒ `Boolean`

Check if a node is a comment node

For XML/XHTML, this checks the node’s comment? method or node_type. For HTML, this also checks TEXT nodes that contain HTML-style comments (Nokogiri parses HTML comments as TEXT nodes with content like “<!-- comment -->” or escaped like “<\!-- comment -->” in full HTML documents).

Parameters:

node (Object) —

Node to check
check_children (Boolean) (defaults to: false) —

Whether to check child nodes

Returns:

(Boolean) —

true if node is a comment

# File 'lib/canon/comparison/xml_node_comparison.rb', line 296

def self.comment_node?(node, check_children: false)
  result = false
  return true if node.respond_to?(:comment?) && node.comment?
  return true if node.respond_to?(:node_type) && node.node_type == :comment

  if node.is_a?(Nokogiri::XML::Element) && !node.children.empty? && check_children
    node.children.each do |child|
      # Check direct children only: check_children: false keeps the
      # recursion to one level, so deeper descendants are not inspected
      if comment_node?(child, check_children: false)
        result = true
        break
      end
    end
  end
  return true if result

  # HTML comments are parsed as TEXT nodes by Nokogiri
  # Check if this is a text node with HTML comment content
  if text_node?(node)
    text = node_text(node)
    # Strip whitespace and backslashes for comparison
    # Nokogiri escapes HTML comments as "<\\!-- comment -->" in full documents
    text_stripped = text.to_s.strip.gsub("\\", "")
    return true if text_stripped.start_with?("<!--") && text_stripped.end_with?("-->")
  end

  result
end
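
The text-node fallback at the end of the method can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper named html_comment_text? (not part of Canon) that reproduces only the string test, not the node checks before it.

```ruby
# Hypothetical helper (not part of Canon): mirrors only the string test
# that comment_node? applies to the text of a text node.
def html_comment_text?(text)
  # Drop surrounding whitespace and any backslashes before testing the markers.
  stripped = text.to_s.strip.gsub("\\", "")
  stripped.start_with?("<!--") && stripped.end_with?("-->")
end

puts html_comment_text?("  <!-- note -->\n") # plain comment text
puts html_comment_text?("<\\!-- note -->")   # escaped form with a backslash
puts html_comment_text?("just text")         # ordinary text
```

Because every backslash is removed before the test, text that merely contains backslashes inside comment markers is also treated as a comment.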

.comment_vs_non_comment_comparison?(node1, node2) ⇒ `Boolean`

Check if this is a comment vs non-comment comparison

This handles the case where zip pairs a comment with a non-comment node due to different lengths in the children arrays. We create a :comments dimension difference instead of UNEQUAL_NODES_TYPES.

Parameters:

node1 (Object) —

First node
node2 (Object) —

Second node

Returns:

(Boolean) —

true if one node is a comment and the other isn’t

# File 'lib/canon/comparison/xml_node_comparison.rb', line 262

def self.comment_vs_non_comment_comparison?(node1, node2)
  node1_comment = comment_node?(node1, check_children: true)
  node2_comment = comment_node?(node2, check_children: true)

  # XOR: exactly one is a comment
  node1_comment ^ node2_comment
end
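
The pairing problem described above can be illustrated without any parser. This is a sketch under stated assumptions: symbols stand in for parsed child nodes, and the is_comment lambda is a hypothetical stand-in for comment_node?.

```ruby
# Hypothetical data: symbols stand in for parsed child nodes.
expected = %i[comment p b]
received = %i[p b]

# Stand-in for comment_node?: a node counts as a comment if it is :comment.
is_comment = ->(node) { node == :comment }

# zip pairs children by position, so the extra comment shifts every pair.
pairs = expected.zip(received)

# XOR keeps only the pairs where exactly one side is a comment.
mismatched = pairs.select { |a, b| is_comment.(a) ^ is_comment.(b) }

p pairs
p mismatched
```

Only the first pair has a comment on exactly one side, which is the case this predicate is meant to detect.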

.compare_document_fragments(node1, node2, opts, child_opts, diff_children, differences) ⇒ `Symbol`

Compare document fragments by comparing their children

Parameters:

node1 (Nokogiri::XML::DocumentFragment) —

First fragment
node2 (Nokogiri::XML::DocumentFragment) —

Second fragment
opts (Hash) —

Comparison options
child_opts (Hash) —

Options for child comparison
diff_children (Boolean) —

Whether to diff children
differences (Array) —

Array to append differences to

Returns:

(Symbol) —

Comparison result constant

# File 'lib/canon/comparison/xml_node_comparison.rb', line 135

def self.compare_document_fragments(node1, node2, opts, child_opts,
                                    diff_children, differences)
  childrenode1 = node1.children.to_a
  childrenode2 = node2.children.to_a

  # Filter children before comparison to handle ignored nodes (like comments with :ignore).
  # Apply side-specific pretty-print heuristic when the relevant flag is active.
  children1 = filter_children(childrenode1,
                              opts_for_side(opts, :expected))
  children2 = filter_children(childrenode2,
                              opts_for_side(opts, :received))

  if children1.length != children2.length
    add_difference(node1, node2, Comparison::UNEQUAL_ELEMENTS,
                   Comparison::UNEQUAL_ELEMENTS, :text_content, opts,
                   differences)
    # Continue comparing children to find deeper differences like attribute values
    # Use zip to compare up to the shorter length
  end

  if children1.empty? && children2.empty?
    Comparison::EQUIVALENT
  else
    # Compare each pair of children (up to the shorter length)
    result = Comparison::EQUIVALENT
    children1.zip(children2).each do |child1, child2|
      # Skip if one is nil (due to different lengths)
      next if child1.nil? || child2.nil?

      child_result = compare_nodes(child1, child2, opts, child_opts,
                                   diff_children, differences)
      # Keep the most recent non-equivalent child result
      result = child_result unless child_result == Comparison::EQUIVALENT
    end
    result
  end
end
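
The loop relies on how Array#zip handles arrays of different lengths. A short sketch with plain strings as hypothetical stand-ins for child nodes:

```ruby
# Hypothetical children: plain strings stand in for nodes.
children1 = %w[a b c]
children2 = %w[a b]

compared = []
children1.zip(children2).each do |child1, child2|
  # zip pads the shorter argument with nil; those pairs are skipped.
  next if child1.nil? || child2.nil?

  compared << [child1, child2]
end

p compared

# zip is driven by the receiver: extra elements of a longer argument
# never appear in the result at all.
p children2.zip(children1).length
```

In both directions only the first two positions are compared, so children beyond the shorter length are reported solely through the length-mismatch difference.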

.compare_nodes(node1, node2, opts, child_opts, diff_children, differences) ⇒ `Symbol`

Main comparison dispatcher for XML nodes

This method handles the high-level comparison logic, delegating to specific comparison methods based on node types.

Parameters:

node1 (Object) —

First node
node2 (Object) —

Second node
opts (Hash) —

Comparison options
child_opts (Hash) —

Options for child comparison
diff_children (Boolean) —

Whether to diff children
differences (Array) —

Array to append differences to

Returns:

(Symbol) —

Comparison result constant

# File 'lib/canon/comparison/xml_node_comparison.rb', line 25

def self.compare_nodes(node1, node2, opts, child_opts, diff_children,
                       differences)
  # Handle DocumentFragment nodes - compare their children instead
  if node1.is_a?(Nokogiri::XML::DocumentFragment) &&
      node2.is_a?(Nokogiri::XML::DocumentFragment)
    return compare_document_fragments(node1, node2, opts, child_opts,
                                      diff_children, differences)
  end

  # Check if nodes should be excluded
  return Comparison::EQUIVALENT if node_excluded?(node1, opts) &&
    node_excluded?(node2, opts)

  if node_excluded?(node1, opts) || node_excluded?(node2, opts)
    add_difference(node1, node2, Comparison::MISSING_NODE,
                   Comparison::MISSING_NODE, :text_content, opts,
                   differences)
    return Comparison::MISSING_NODE
  end

  # Handle comment vs non-comment comparisons specially
  # When comparing a comment with a non-comment node (due to zip pairing),
  # create a :comments dimension difference instead of UNEQUAL_NODES_TYPES
  if comment_vs_non_comment_comparison?(node1, node2)
    match_opts = opts[:match_opts]
    comment_behavior = match_opts ? match_opts[:comments] : nil

    # Create a :comments dimension difference
    # The difference will be marked as normative or not based on the HtmlCompareProfile
    add_difference(node1, node2, Comparison::MISSING_NODE,
                   Comparison::MISSING_NODE, :comments, opts,
                   differences)

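    # Return EQUIVALENT if comments are ignored, otherwise UNEQUAL_COMMENTS.
    # The explicit return keeps the method from falling through to the
    # node type check below.
    return Comparison::EQUIVALENT if comment_behavior == :ignore

    return Comparison::UNEQUAL_COMMENTS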
  end

  # Check node types match
  unless same_node_type?(node1, node2)
    add_difference(node1, node2, Comparison::UNEQUAL_NODES_TYPES,
                   Comparison::UNEQUAL_NODES_TYPES, :text_content, opts,
                   differences)
    return Comparison::UNEQUAL_NODES_TYPES
  end

  # Dispatch based on node type
  dispatch_by_node_type(node1, node2, opts, child_opts, diff_children,
                        differences)
end

.dispatch_by_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ `Symbol`

Dispatch comparison based on node type

Parameters:

node1 (Object) —

First node
node2 (Object) —

Second node
opts (Hash) —

Comparison options
child_opts (Hash) —

Options for child comparison
diff_children (Boolean) —

Whether to diff children
differences (Array) —

Array to append differences to

Returns:

(Symbol) —

Comparison result constant

# File 'lib/canon/comparison/xml_node_comparison.rb', line 181

def self.dispatch_by_node_type(node1, node2, opts, child_opts,
                               diff_children, differences)
  # Canon::Xml::Node types use .node_type method that returns symbols
  # Nokogiri also has .node_type but returns integers, so check for Symbol
  if node1.respond_to?(:node_type) && node2.respond_to?(:node_type) &&
      node1.node_type.is_a?(Symbol) && node2.node_type.is_a?(Symbol)
    dispatch_canon_node_type(node1, node2, opts, child_opts,
                             diff_children, differences)
  # Moxml/Nokogiri types use .element?, .text?, etc. methods
  else
    dispatch_legacy_node_type(node1, node2, opts, child_opts,
                              diff_children, differences)
  end
end
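
The Symbol test that separates the two dispatch paths can be sketched with stand-in objects. CanonLike, LegacyLike and dispatch_path are hypothetical names used only for illustration; the real method calls the dispatchers instead of returning a label.

```ruby
# Hypothetical stand-ins: CanonLike reports node_type as a Symbol,
# LegacyLike reports it as an Integer, as the comments in the method describe.
CanonLike  = Struct.new(:node_type)
LegacyLike = Struct.new(:node_type)

# Returns a label for the path the dispatcher would take.
def dispatch_path(node1, node2)
  if node1.respond_to?(:node_type) && node2.respond_to?(:node_type) &&
     node1.node_type.is_a?(Symbol) && node2.node_type.is_a?(Symbol)
    :canon
  else
    :legacy
  end
end

puts dispatch_path(CanonLike.new(:element), CanonLike.new(:element))
puts dispatch_path(LegacyLike.new(1), LegacyLike.new(1))
puts dispatch_path(CanonLike.new(:text), LegacyLike.new(3))
```

A mixed pair takes the legacy path, because both nodes must report a Symbol for the Canon path to be chosen.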

.dispatch_canon_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ `Object`

Dispatch by Canon::Xml::Node type

# File 'lib/canon/comparison/xml_node_comparison.rb', line 356

def self.dispatch_canon_node_type(node1, node2, opts, child_opts,
                                  diff_children, differences)
  # Import XmlComparator to use its comparison methods
  require_relative "xml_comparator"

  case node1.node_type
  when :root
    XmlComparator.compare_children(node1, node2, opts, child_opts,
                                   diff_children, differences)
  when :element
    XmlComparator.compare_element_nodes(node1, node2, opts, child_opts,
                                        diff_children, differences)
  when :text
    XmlComparator.compare_text_nodes(node1, node2, opts, differences)
  when :comment
    XmlComparator.compare_comment_nodes(node1, node2, opts, differences)
  when :cdata
    XmlComparator.compare_text_nodes(node1, node2, opts, differences)
  when :processing_instruction
    XmlComparator.compare_processing_instruction_nodes(node1, node2,
                                                       opts, differences)
  else
    Comparison::EQUIVALENT
  end
end

.dispatch_legacy_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ `Object`

Dispatch by legacy Nokogiri/Moxml node type

# File 'lib/canon/comparison/xml_node_comparison.rb', line 383

def self.dispatch_legacy_node_type(node1, node2, opts, child_opts,
                                   diff_children, differences)
  # Import XmlComparator to use its comparison methods
  require_relative "xml_comparator"

  if node1.respond_to?(:element?) && node1.element?
    XmlComparator.compare_element_nodes(node1, node2, opts, child_opts,
                                        diff_children, differences)
  elsif node1.respond_to?(:text?) && node1.text?
    XmlComparator.compare_text_nodes(node1, node2, opts, differences)
  elsif node1.respond_to?(:comment?) && node1.comment?
    XmlComparator.compare_comment_nodes(node1, node2, opts, differences)
  elsif node1.respond_to?(:cdata?) && node1.cdata?
    XmlComparator.compare_text_nodes(node1, node2, opts, differences)
  elsif node1.respond_to?(:processing_instruction?) && node1.processing_instruction?
    XmlComparator.compare_processing_instruction_nodes(node1, node2,
                                                       opts, differences)
  elsif node1.respond_to?(:root)
    XmlComparator.compare_document_nodes(node1, node2, opts, child_opts,
                                         diff_children, differences)
  else
    Comparison::EQUIVALENT
  end
end

.filter_children(children, opts) ⇒ `Array`

Filter children based on options

Removes nodes that should be excluded from comparison based on options like :ignore_nodes, :ignore_comments, etc.

Parameters:

children (Array) —

Array of child nodes
opts (Hash) —

Comparison options

Returns:

(Array) —

Filtered array of children

# File 'lib/canon/comparison/xml_node_comparison.rb', line 87

def self.filter_children(children, opts)
  children.reject do |child|
    node_excluded?(child, opts)
  end
end

.node_excluded?(node, opts) ⇒ `Boolean`

Check if a node should be excluded from comparison

Parameters:

node (Object) —

Node to check
opts (Hash) —

Comparison options

Returns:

(Boolean) —

true if node should be excluded

# File 'lib/canon/comparison/xml_node_comparison.rb', line 203

def self.node_excluded?(node, opts)
  return false if node.nil?

  return true if opts[:ignore_nodes]&.include?(node)
  return true if opts[:ignore_comments] && comment_node?(node)
  return true if opts[:ignore_text_nodes] && text_node?(node)

  # Check match options
  match_opts = opts[:match_opts]
  return false unless match_opts

  # Filter comments based on match options and format
  # HTML: Filter comments to avoid spurious differences from zip pairing
  #       BUT only when not in verbose mode (verbose needs differences recorded)
  # XML: Don't filter comments (allow informative differences to be recorded)
  if match_opts[:comments] == :ignore && comment_node?(node)
    # In verbose mode, don't filter comments - we want to record the differences
    return false if opts[:verbose]

    # Only filter comments for HTML, not XML (when not verbose)
    format = opts[:format] || match_opts[:format]
    if %i[html html4 html5].include?(format)
      return true
    end
  end

  # Strip whitespace-only text nodes based on parent element configuration.
  # Use preserve_whitespace_elements / strip_whitespace_elements to control.
  # Blacklist (strip) > preserve > collapse > format defaults.
  return false unless text_node?(node) && node.parent
  return false unless MatchOptions.normalize_text(node_text(node)).empty?

  return true unless WhitespaceSensitivity.whitespace_preserved?(
    node.parent, match_opts
  )

  # When the pretty-print-side flag is active (set by opts_for_side in
  # ChildComparison.compare), drop whitespace-only text nodes that start
  # with "\n" inside :collapse elements — they are structural indentation
  # from the pretty-printer, not content.  Space-only nodes (no "\n") are
  # real inline content and are kept for normalised comparison.
  # :preserve elements are always left unchanged.
  if match_opts[:_pretty_print_side_active]
    ws_class = WhitespaceSensitivity.classify_text_node(node, opts)
    return true if ws_class == :collapse && node_text(node).start_with?("\n")
  end

  false
end
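
The final pretty-print step can be sketched on plain strings. This is an illustration only: structural_indentation? is a hypothetical helper, ws_class stands in for the result of WhitespaceSensitivity.classify_text_node, and String#strip is assumed to approximate the whitespace-only test. It covers just the branch taken when the pretty-print-side flag is active.

```ruby
# Hypothetical helper (not part of Canon). ws_class stands in for the value
# returned by WhitespaceSensitivity.classify_text_node.
def structural_indentation?(text, ws_class)
  # Assumption: String#strip approximates the whitespace-only test.
  return false unless text.strip.empty?

  # Only newline-led whitespace inside :collapse elements is treated as
  # indentation left behind by a pretty-printer.
  ws_class == :collapse && text.start_with?("\n")
end

puts structural_indentation?("\n    ", :collapse) # indentation between tags
puts structural_indentation?(" ", :collapse)      # inline space, kept
puts structural_indentation?("\n    ", :preserve) # preserved element, kept
puts structural_indentation?("\n  text", :collapse) # real content, kept
```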

.node_text(node) ⇒ `String`

Extract text content from a node

Parameters:

node (Object) —

Node to extract text from

Returns:

(String) —

Text content

# File 'lib/canon/comparison/xml_node_comparison.rb', line 341

def self.node_text(node)
  return "" unless node

  if node.respond_to?(:content)
    node.content.to_s
  elsif node.respond_to?(:text)
    node.text.to_s
  elsif node.respond_to?(:value)
    node.value.to_s
  else
    ""
  end
end
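
The lookup order (content, then text, then value) matters for objects that expose more than one accessor. A sketch using a standalone copy of the method and hypothetical Struct stand-ins for nodes:

```ruby
# Standalone copy of node_text so the example runs without Canon loaded.
def node_text(node)
  return "" unless node

  if node.respond_to?(:content)
    node.content.to_s
  elsif node.respond_to?(:text)
    node.text.to_s
  elsif node.respond_to?(:value)
    node.value.to_s
  else
    ""
  end
end

# Hypothetical node stand-ins.
WithBoth  = Struct.new(:content, :text)
WithValue = Struct.new(:value)

puts node_text(WithBoth.new("from content", "from text")) # content wins
puts node_text(WithValue.new(42))                          # value is stringified
puts node_text(nil).inspect                                # empty string for nil
```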

.opts_for_side(opts, side) ⇒ `Hash`

Build a side-specific opts copy that activates the pretty-print structural-whitespace heuristic for the given side.

When pretty_printed_expected (side :expected) or pretty_printed_received (side :received) is truthy in match_opts, returns a shallow copy of opts with an ephemeral _pretty_print_side_active: true flag merged into :match_opts. Otherwise returns opts unchanged (no allocation overhead).

The flag is consumed by node_excluded? to drop whitespace-only text nodes that start with “\n” in :collapse whitespace elements. It is intentionally NOT propagated to recursive compare_nodes calls; each level of ChildComparison.compare re-evaluates it from the original pretty_printed_* flags.

Parameters:

opts (Hash) —

Full comparison options hash
side (Symbol) —

:expected or :received

Returns:

(Hash) —

opts copy with ephemeral flag, or opts itself

# File 'lib/canon/comparison/xml_node_comparison.rb', line 111

def self.opts_for_side(opts, side)
  match_opts = opts[:match_opts]
  return opts unless match_opts

  active = case side
           when :expected then match_opts[:pretty_printed_expected]
           when :received then match_opts[:pretty_printed_received]
           else false
           end

  return opts unless active

  opts.merge(match_opts: match_opts.merge(_pretty_print_side_active: true))
end
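
The copy-versus-same-object behaviour can be checked directly. The sketch below uses a standalone copy of the method so it runs without Canon; the options hash is hypothetical.

```ruby
# Standalone copy of opts_for_side so the example runs without Canon loaded.
def opts_for_side(opts, side)
  match_opts = opts[:match_opts]
  return opts unless match_opts

  active = case side
           when :expected then match_opts[:pretty_printed_expected]
           when :received then match_opts[:pretty_printed_received]
           else false
           end
  return opts unless active

  opts.merge(match_opts: match_opts.merge(_pretty_print_side_active: true))
end

# Hypothetical options: only the expected side is marked as pretty-printed.
opts = { verbose: true, match_opts: { pretty_printed_expected: true } }

expected_opts = opts_for_side(opts, :expected)
received_opts = opts_for_side(opts, :received)

p expected_opts[:match_opts][:_pretty_print_side_active] # flag set on the copy
p received_opts.equal?(opts)                             # same object returned
p opts[:match_opts].key?(:_pretty_print_side_active)     # original untouched
```

Since Hash#merge returns a new hash at both levels, the caller's opts and match_opts are never mutated.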

.same_node_type?(node1, node2) ⇒ `Boolean`

Check if two nodes are of the same type

Parameters:

node1 (Object) —

First node
node2 (Object) —

Second node

Returns:

(Boolean) —

true if nodes are same type

# File 'lib/canon/comparison/xml_node_comparison.rb', line 275

def self.same_node_type?(node1, node2)
  return false if node1.class != node2.class

  # For Nokogiri/Canon::Xml nodes, check node type
  if node1.respond_to?(:node_type) && node2.respond_to?(:node_type)
    node1.node_type == node2.node_type
  else
    true
  end
end

.serialize_node_to_xml(node) ⇒ `String`

Serialize a Canon::Xml::Node to XML string

This utility method handles serialization of different node types to their string representation for display and debugging purposes.

Parameters:

node (Canon::Xml::Node, Object) —

Node to serialize

Returns:

(String) —

XML string representation

# File 'lib/canon/comparison/xml_node_comparison.rb', line 433

def self.serialize_node_to_xml(node)
  if node.is_a?(Canon::Xml::Nodes::RootNode)
    # Serialize all children of root
    node.children.map { |child| serialize_node_to_xml(child) }.join
  elsif node.is_a?(Canon::Xml::Nodes::ElementNode)
    # Serialize element with attributes and children
    attrs = node.attribute_nodes.map do |a|
      " #{a.name}=\"#{a.value}\""
    end.join
    children_xml = node.children.map do |c|
      serialize_node_to_xml(c)
    end.join

    if children_xml.empty?
      "<#{node.name}#{attrs}/>"
    else
      "<#{node.name}#{attrs}>#{children_xml}</#{node.name}>"
    end
  elsif node.is_a?(Canon::Xml::Nodes::TextNode)
    node.value
  elsif node.is_a?(Canon::Xml::Nodes::CommentNode)
    "<!--#{node.value}-->"
  elsif node.is_a?(Canon::Xml::Nodes::ProcessingInstructionNode)
    "<?#{node.target} #{node.data}?>"
  elsif node.respond_to?(:to_xml)
    node.to_xml
  else
    node.to_s
  end
end
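
The recursive shape of the serializer can be sketched with hypothetical Struct stand-ins for the Canon::Xml::Nodes classes. Attr, Element, Text, Comment and serialize are illustrative names, not Canon API. As in the listing, values are interpolated as-is, so this sketch performs no escaping.

```ruby
# Hypothetical stand-ins for the Canon::Xml::Nodes classes.
Attr    = Struct.new(:name, :value)
Element = Struct.new(:name, :attribute_nodes, :children)
Text    = Struct.new(:value)
Comment = Struct.new(:value)

def serialize(node)
  case node
  when Element
    attrs = node.attribute_nodes.map { |a| " #{a.name}=\"#{a.value}\"" }.join
    inner = node.children.map { |c| serialize(c) }.join
    # Childless elements are written in self-closing form.
    inner.empty? ? "<#{node.name}#{attrs}/>" : "<#{node.name}#{attrs}>#{inner}</#{node.name}>"
  when Text then node.value
  when Comment then "<!--#{node.value}-->"
  else node.to_s
  end
end

tree = Element.new("p", [Attr.new("class", "x")],
                   [Text.new("hi"), Element.new("br", [], []), Comment.new(" c ")])
puts serialize(tree)
```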

.text_node?(node) ⇒ `Boolean`

Check if a node is a text node

Parameters:

node (Object) —

Node to check

Returns:

(Boolean) —

true if node is a text node

# File 'lib/canon/comparison/xml_node_comparison.rb', line 331

def self.text_node?(node)
  (node.respond_to?(:text?) && node.text? &&
    !node.respond_to?(:element?)) ||
    (node.respond_to?(:node_type) && node.node_type == :text)
end
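
The two branches of the predicate accept different shapes of object. The sketch below uses a standalone copy of the method and three hypothetical stand-in classes; it does not claim which real parser classes match which shape.

```ruby
# Standalone copy of text_node? so the example runs without Canon loaded.
def text_node?(node)
  (node.respond_to?(:text?) && node.text? &&
    !node.respond_to?(:element?)) ||
    (node.respond_to?(:node_type) && node.node_type == :text)
end

# Hypothetical shape 1: exposes text? but not element?.
class TextOnly
  def text?; true; end
end

# Hypothetical shape 2: exposes text?, element? and an Integer node_type.
class FullApiText
  def text?; true; end
  def element?; false; end
  def node_type; 3; end
end

# Hypothetical shape 3: reports node_type as a Symbol.
SymbolTyped = Struct.new(:node_type)

puts text_node?(TextOnly.new)           # accepted by the first branch
puts text_node?(FullApiText.new)        # rejected by both branches
puts text_node?(SymbolTyped.new(:text)) # accepted by the second branch
```

An object that responds to both text? and element? is recognised only if its node_type is the Symbol :text.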
