Module: Canon::Comparison::XmlNodeComparison

Defined in:
lib/canon/comparison/xml_node_comparison.rb

Overview

XML Node Comparison Utilities

Provides public comparison methods for XML/HTML nodes. This module extracts shared comparison logic that was previously accessed via send() from HtmlComparator.

This is a stateless utility module: every method is a module-level method that receives the nodes and options it works on as arguments.

Class Method Details

.add_difference(node1, node2, diff1, diff2, dimension, opts, differences) ⇒ Object

Add a difference to the differences array

Parameters:

  • node1 (Object)

    First node

  • node2 (Object)

    Second node

  • diff1 (Symbol)

    Difference type for node1

  • diff2 (Symbol)

    Difference type for node2

  • dimension (Symbol)

    The dimension of the difference

  • opts (Hash)

    Comparison options

  • differences (Array)

    Array to append difference to



# File 'lib/canon/comparison/xml_node_comparison.rb', line 417

def self.add_difference(node1, node2, diff1, diff2, dimension, opts,
                        differences)
  return unless opts[:verbose]

  require_relative "xml_comparator"
  XmlComparator.add_difference(node1, node2, diff1, diff2, dimension,
                               opts, differences)
end

.comment_node?(node, check_children: false) ⇒ Boolean

Check if a node is a comment node

For XML/XHTML, this checks the node’s comment? method or node_type. For HTML, this also checks TEXT nodes that contain HTML-style comments (Nokogiri parses HTML comments as TEXT nodes with content like "<!-- comment -->", or escaped as "<\!-- comment -->" in full HTML documents).

Parameters:

  • node (Object)

    Node to check

  • check_children (Boolean) (defaults to: false)

    Whether to also check the direct children of an element node for comments (one level deep only)

Returns:

  • (Boolean)

    true if node is a comment



# File 'lib/canon/comparison/xml_node_comparison.rb', line 296

def self.comment_node?(node, check_children: false)
  result = false
  return true if node.respond_to?(:comment?) && node.comment?
  return true if node.respond_to?(:node_type) && node.node_type == :comment

  if node.is_a?(Nokogiri::XML::Element) && !node.children.empty? && check_children
    node.children.each do |child|
      # Recursively check child nodes for comments
      # limit depth to avoid infinite recursion
      # in case of circular structures (if any)
      if comment_node?(child, check_children: false)
        result = true
        break
      end
    end
  end
  return true if result

  # HTML comments are parsed as TEXT nodes by Nokogiri
  # Check if this is a text node with HTML comment content
  if text_node?(node)
    text = node_text(node)
    # Strip whitespace and backslashes for comparison
    # Nokogiri escapes HTML comments as "<\\!-- comment -->" in full documents
    text_stripped = text.to_s.strip.gsub("\\", "")
    return true if text_stripped.start_with?("<!--") && text_stripped.end_with?("-->")
  end

  result
end
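
The text-based branch above can be tried in isolation. The following is a minimal sketch that assumes only the string handling shown in the listing; `html_comment_text?` is a hypothetical helper name, not part of Canon.

```ruby
# Hypothetical helper (not part of Canon): mirrors only the text-based
# branch of comment_node?, working on a plain string instead of a node.
def html_comment_text?(text)
  # Drop surrounding whitespace and any backslashes before matching,
  # so the escaped form "<\!-- ... -->" is recognised as well.
  stripped = text.to_s.strip.gsub("\\", "")
  stripped.start_with?("<!--") && stripped.end_with?("-->")
end

puts html_comment_text?("  <!-- note -->\n")
puts html_comment_text?("<\\!-- note -->")
puts html_comment_text?("plain text")
```

Both comment forms are accepted and ordinary text is rejected, which is the behaviour the HTML branch relies on.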

.comment_vs_non_comment_comparison?(node1, node2) ⇒ Boolean

Check if this is a comment vs non-comment comparison

This handles the case where zip pairs a comment with a non-comment node due to different lengths in the children arrays. We create a :comments dimension difference instead of UNEQUAL_NODES_TYPES.

Parameters:

  • node1 (Object)

    First node

  • node2 (Object)

    Second node

Returns:

  • (Boolean)

    true if one node is a comment and the other isn’t



# File 'lib/canon/comparison/xml_node_comparison.rb', line 262

def self.comment_vs_non_comment_comparison?(node1, node2)
  node1_comment = comment_node?(node1, check_children: true)
  node2_comment = comment_node?(node2, check_children: true)

  # XOR: exactly one is a comment
  node1_comment ^ node2_comment
end
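
As a small illustration of the XOR step, the sketch below applies Ruby's `^` operator to plain booleans. `exactly_one_comment?` is a hypothetical name used only for this example; the real method first derives both flags with comment_node?.

```ruby
# Hypothetical stand-in: takes the two "is a comment" flags directly.
def exactly_one_comment?(first_is_comment, second_is_comment)
  # TrueClass#^ and FalseClass#^ implement exclusive or.
  first_is_comment ^ second_is_comment
end

[[true, true], [true, false], [false, true], [false, false]].each do |a, b|
  puts "#{a} vs #{b}: #{exactly_one_comment?(a, b)}"
end
```

Only the mixed pairs yield true, so two comments or two non-comments continue to the ordinary type check.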

.compare_document_fragments(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol

Compare document fragments by comparing their children

Parameters:

  • node1 (Nokogiri::XML::DocumentFragment)

    First fragment

  • node2 (Nokogiri::XML::DocumentFragment)

    Second fragment

  • opts (Hash)

    Comparison options

  • child_opts (Hash)

    Options for child comparison

  • diff_children (Boolean)

    Whether to diff children

  • differences (Array)

    Array to append differences to

Returns:

  • (Symbol)

    Comparison result constant



# File 'lib/canon/comparison/xml_node_comparison.rb', line 135

def self.compare_document_fragments(node1, node2, opts, child_opts,
                                    diff_children, differences)
  childrenode1 = node1.children.to_a
  childrenode2 = node2.children.to_a

  # Filter children before comparison to handle ignored nodes (like comments with :ignore).
  # Apply side-specific pretty-print heuristic when the relevant flag is active.
  children1 = filter_children(childrenode1,
                              opts_for_side(opts, :expected))
  children2 = filter_children(childrenode2,
                              opts_for_side(opts, :received))

  if children1.length != children2.length
    add_difference(node1, node2, Comparison::UNEQUAL_ELEMENTS,
                   Comparison::UNEQUAL_ELEMENTS, :text_content, opts,
                   differences)
    # Continue comparing children to find deeper differences like attribute values
    # Use zip to compare up to the shorter length
  end

  if children1.empty? && children2.empty?
    Comparison::EQUIVALENT
  else
    # Compare each pair of children (up to the shorter length)
    result = Comparison::EQUIVALENT
    children1.zip(children2).each do |child1, child2|
      # Skip if one is nil (due to different lengths)
      next if child1.nil? || child2.nil?

      child_result = compare_nodes(child1, child2, opts, child_opts,
                                   diff_children, differences)
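      # Record a non-equivalent child result; an equivalent child must not
      # overwrite a difference found earlier.
      result = child_result unless child_result == Comparison::EQUIVALENT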
    end
    result
  end
end
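
The pairing behaviour that the listing relies on can be seen with plain arrays. This is a sketch using only Ruby's core `Array#zip`; `comparable_pairs` is a hypothetical helper name introduced for the example.

```ruby
# Hypothetical helper: pair children positionally and keep only the
# pairs where both sides exist, as the loop above does with `next`.
def comparable_pairs(expected, received)
  expected.zip(received).reject { |left, right| left.nil? || right.nil? }
end

# zip pads with nil when the receiver is longer than the argument.
p %w[a b c].zip(%w[a b])

# After dropping the nil-padded pair, only two comparisons remain.
p comparable_pairs(%w[a b c], %w[a b])

# When the receiver is the shorter array, zip truncates instead.
p %w[a b].zip(%w[a b c])
```

Because the extra child is never compared, the length mismatch has to be reported separately, which is what the add_difference call before the loop does.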

.compare_nodes(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol

Main comparison dispatcher for XML nodes

This method handles the high-level comparison logic, delegating to specific comparison methods based on node types.

Parameters:

  • node1 (Object)

    First node

  • node2 (Object)

    Second node

  • opts (Hash)

    Comparison options

  • child_opts (Hash)

    Options for child comparison

  • diff_children (Boolean)

    Whether to diff children

  • differences (Array)

    Array to append differences to

Returns:

  • (Symbol)

    Comparison result constant



# File 'lib/canon/comparison/xml_node_comparison.rb', line 25

def self.compare_nodes(node1, node2, opts, child_opts, diff_children,
                       differences)
  # Handle DocumentFragment nodes - compare their children instead
  if node1.is_a?(Nokogiri::XML::DocumentFragment) &&
      node2.is_a?(Nokogiri::XML::DocumentFragment)
    return compare_document_fragments(node1, node2, opts, child_opts,
                                      diff_children, differences)
  end

  # Check if nodes should be excluded
  return Comparison::EQUIVALENT if node_excluded?(node1, opts) &&
    node_excluded?(node2, opts)

  if node_excluded?(node1, opts) || node_excluded?(node2, opts)
    add_difference(node1, node2, Comparison::MISSING_NODE,
                   Comparison::MISSING_NODE, :text_content, opts,
                   differences)
    return Comparison::MISSING_NODE
  end

  # Handle comment vs non-comment comparisons specially
  # When comparing a comment with a non-comment node (due to zip pairing),
  # create a :comments dimension difference instead of UNEQUAL_NODES_TYPES
  if comment_vs_non_comment_comparison?(node1, node2)
    match_opts = opts[:match_opts]
    comment_behavior = match_opts ? match_opts[:comments] : nil

    # Create a :comments dimension difference
    # The difference will be marked as normative or not based on the HtmlCompareProfile
    add_difference(node1, node2, Comparison::MISSING_NODE,
                   Comparison::MISSING_NODE, :comments, opts,
                   differences)

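    # Return EQUIVALENT if comments are ignored, otherwise UNEQUAL_COMMENTS.
    # The explicit return matters: without it the value is discarded and
    # execution falls through to the node type check below.
    return Comparison::EQUIVALENT if comment_behavior == :ignore

    return Comparison::UNEQUAL_COMMENTS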
  end

  # Check node types match
  unless same_node_type?(node1, node2)
    add_difference(node1, node2, Comparison::UNEQUAL_NODES_TYPES,
                   Comparison::UNEQUAL_NODES_TYPES, :text_content, opts,
                   differences)
    return Comparison::UNEQUAL_NODES_TYPES
  end

  # Dispatch based on node type
  dispatch_by_node_type(node1, node2, opts, child_opts, diff_children,
                        differences)
end

.dispatch_by_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol

Dispatch comparison based on node type

Parameters:

  • node1 (Object)

    First node

  • node2 (Object)

    Second node

  • opts (Hash)

    Comparison options

  • child_opts (Hash)

    Options for child comparison

  • diff_children (Boolean)

    Whether to diff children

  • differences (Array)

    Array to append differences to

Returns:

  • (Symbol)

    Comparison result constant



# File 'lib/canon/comparison/xml_node_comparison.rb', line 181

def self.dispatch_by_node_type(node1, node2, opts, child_opts,
                               diff_children, differences)
  # Canon::Xml::Node types use .node_type method that returns symbols
  # Nokogiri also has .node_type but returns integers, so check for Symbol
  if node1.respond_to?(:node_type) && node2.respond_to?(:node_type) &&
      node1.node_type.is_a?(Symbol) && node2.node_type.is_a?(Symbol)
    dispatch_canon_node_type(node1, node2, opts, child_opts,
                             diff_children, differences)
  # Moxml/Nokogiri types use .element?, .text?, etc. methods
  else
    dispatch_legacy_node_type(node1, node2, opts, child_opts,
                              diff_children, differences)
  end
end
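
The Symbol-versus-Integer test that selects the dispatch branch can be sketched without either library. The two Struct classes below are hypothetical stand-ins for Canon::Xml::Node objects (symbolic node_type) and Nokogiri nodes (integer node_type, where 1 is the element constant); `symbolic_node_types?` is likewise an illustrative name.

```ruby
# Hypothetical stand-ins for the two node families.
SymbolTypedNode  = Struct.new(:node_type) # behaves like a Canon node
IntegerTypedNode = Struct.new(:node_type) # behaves like a Nokogiri node

# True only when both nodes expose a Symbol node_type, which is the
# condition for taking the Canon dispatch branch.
def symbolic_node_types?(node1, node2)
  [node1, node2].all? do |node|
    node.respond_to?(:node_type) && node.node_type.is_a?(Symbol)
  end
end

canon = SymbolTypedNode.new(:element)
noko  = IntegerTypedNode.new(1)

puts symbolic_node_types?(canon, canon)
puts symbolic_node_types?(noko, noko)
puts symbolic_node_types?(canon, noko)
```

A mixed pair fails the check, so it is routed to the legacy branch together with pure Nokogiri/Moxml pairs.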

.dispatch_canon_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Object

Dispatch by Canon::Xml::Node type



# File 'lib/canon/comparison/xml_node_comparison.rb', line 356

def self.dispatch_canon_node_type(node1, node2, opts, child_opts,
                                  diff_children, differences)
  # Import XmlComparator to use its comparison methods
  require_relative "xml_comparator"

  case node1.node_type
  when :root
    XmlComparator.compare_children(node1, node2, opts, child_opts,
                                   diff_children, differences)
  when :element
    XmlComparator.compare_element_nodes(node1, node2, opts, child_opts,
                                        diff_children, differences)
  when :text
    XmlComparator.compare_text_nodes(node1, node2, opts, differences)
  when :comment
    XmlComparator.compare_comment_nodes(node1, node2, opts, differences)
  when :cdata
    XmlComparator.compare_text_nodes(node1, node2, opts, differences)
  when :processing_instruction
    XmlComparator.compare_processing_instruction_nodes(node1, node2,
                                                       opts, differences)
  else
    Comparison::EQUIVALENT
  end
end

.dispatch_legacy_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Object

Dispatch by legacy Nokogiri/Moxml node type



# File 'lib/canon/comparison/xml_node_comparison.rb', line 383

def self.dispatch_legacy_node_type(node1, node2, opts, child_opts,
                                   diff_children, differences)
  # Import XmlComparator to use its comparison methods
  require_relative "xml_comparator"

  if node1.respond_to?(:element?) && node1.element?
    XmlComparator.compare_element_nodes(node1, node2, opts, child_opts,
                                        diff_children, differences)
  elsif node1.respond_to?(:text?) && node1.text?
    XmlComparator.compare_text_nodes(node1, node2, opts, differences)
  elsif node1.respond_to?(:comment?) && node1.comment?
    XmlComparator.compare_comment_nodes(node1, node2, opts, differences)
  elsif node1.respond_to?(:cdata?) && node1.cdata?
    XmlComparator.compare_text_nodes(node1, node2, opts, differences)
  elsif node1.respond_to?(:processing_instruction?) && node1.processing_instruction?
    XmlComparator.compare_processing_instruction_nodes(node1, node2,
                                                       opts, differences)
  elsif node1.respond_to?(:root)
    XmlComparator.compare_document_nodes(node1, node2, opts, child_opts,
                                         diff_children, differences)
  else
    Comparison::EQUIVALENT
  end
end

.filter_children(children, opts) ⇒ Array

Filter children based on options

Removes nodes that should be excluded from comparison based on options like :ignore_nodes, :ignore_comments, etc.

Parameters:

  • children (Array)

    Array of child nodes

  • opts (Hash)

    Comparison options

Returns:

  • (Array)

    Filtered array of children



# File 'lib/canon/comparison/xml_node_comparison.rb', line 87

def self.filter_children(children, opts)
  children.reject do |child|
    node_excluded?(child, opts)
  end
end

.node_excluded?(node, opts) ⇒ Boolean

Check if a node should be excluded from comparison

Parameters:

  • node (Object)

    Node to check

  • opts (Hash)

    Comparison options

Returns:

  • (Boolean)

    true if node should be excluded



# File 'lib/canon/comparison/xml_node_comparison.rb', line 203

def self.node_excluded?(node, opts)
  return false if node.nil?

  return true if opts[:ignore_nodes]&.include?(node)
  return true if opts[:ignore_comments] && comment_node?(node)
  return true if opts[:ignore_text_nodes] && text_node?(node)

  # Check match options
  match_opts = opts[:match_opts]
  return false unless match_opts

  # Filter comments based on match options and format
  # HTML: Filter comments to avoid spurious differences from zip pairing
  #       BUT only when not in verbose mode (verbose needs differences recorded)
  # XML: Don't filter comments (allow informative differences to be recorded)
  if match_opts[:comments] == :ignore && comment_node?(node)
    # In verbose mode, don't filter comments - we want to record the differences
    return false if opts[:verbose]

    # Only filter comments for HTML, not XML (when not verbose)
    format = opts[:format] || match_opts[:format]
    if %i[html html4 html5].include?(format)
      return true
    end
  end

  # Strip whitespace-only text nodes based on parent element configuration.
  # Use preserve_whitespace_elements / strip_whitespace_elements to control.
  # Blacklist (strip) > preserve > collapse > format defaults.
  return false unless text_node?(node) && node.parent
  return false unless MatchOptions.normalize_text(node_text(node)).empty?

  return true unless WhitespaceSensitivity.whitespace_preserved?(
    node.parent, match_opts
  )

  # When the pretty-print-side flag is active (set by opts_for_side in
  # ChildComparison.compare), drop whitespace-only text nodes that start
  # with "\n" inside :collapse elements — they are structural indentation
  # from the pretty-printer, not content.  Space-only nodes (no "\n") are
  # real inline content and are kept for normalised comparison.
  # :preserve elements are always left unchanged.
  if match_opts[:_pretty_print_side_active]
    ws_class = WhitespaceSensitivity.classify_text_node(node, opts)
    return true if ws_class == :collapse && node_text(node).start_with?("\n")
  end

  false
end
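
The final heuristic in the listing distinguishes pretty-printer indentation from inline spacing. The sketch below reproduces only that distinction on plain strings. It assumes that `String#strip` is an adequate stand-in for MatchOptions.normalize_text, whose exact behaviour is not shown here, and `structural_indentation?` is a hypothetical name.

```ruby
# Hypothetical helper: true for whitespace-only text that begins with a
# newline, i.e. the shape of indentation emitted by a pretty-printer.
# Assumption: String#strip approximates MatchOptions.normalize_text.
def structural_indentation?(text)
  text.strip.empty? && text.start_with?("\n")
end

puts structural_indentation?("\n    ") # indentation between elements
puts structural_indentation?("   ")    # inline spaces, kept as content
puts structural_indentation?("\n  hi") # real text, never dropped
```

In the real method this check only applies when the side flag is active and the node classifies as :collapse.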

.node_text(node) ⇒ String

Extract text content from a node

Parameters:

  • node (Object)

    Node to extract text from

Returns:

  • (String)

    Text content



# File 'lib/canon/comparison/xml_node_comparison.rb', line 341

def self.node_text(node)
  return "" unless node

  if node.respond_to?(:content)
    node.content.to_s
  elsif node.respond_to?(:text)
    node.text.to_s
  elsif node.respond_to?(:value)
    node.value.to_s
  else
    ""
  end
end
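
The accessor fallback order (content, then text, then value) can be exercised with simple objects. In this sketch the three Struct classes and `text_of` are hypothetical and exist only to show the duck-typed lookup; they are not Canon or Nokogiri classes.

```ruby
# Hypothetical node shapes, each exposing a different accessor.
ContentNode = Struct.new(:content)
TextOnlyNode = Struct.new(:text)
ValueNode = Struct.new(:value)

# Same lookup order as node_text: the first accessor the object
# responds to wins, and the result is always coerced to a String.
def text_of(node)
  return "" unless node

  %i[content text value].each do |accessor|
    return node.public_send(accessor).to_s if node.respond_to?(accessor)
  end
  ""
end

puts text_of(ContentNode.new("from content"))
puts text_of(TextOnlyNode.new("from text"))
puts text_of(ValueNode.new(42))
puts text_of(nil).inspect
```

Coercing with to_s means callers never need to guard against nil content.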

.opts_for_side(opts, side) ⇒ Hash

Build a side-specific opts copy that activates the pretty-print structural-whitespace heuristic for the given side.

When pretty_printed_expected (side :expected) or pretty_printed_received (side :received) is truthy in match_opts, returns a shallow copy of opts with an ephemeral _pretty_print_side_active: true flag merged into :match_opts. Otherwise returns opts unchanged (no allocation overhead).

The flag is consumed by node_excluded? to drop whitespace-only text nodes that start with "\n" in :collapse whitespace elements. It is intentionally NOT propagated to recursive compare_nodes calls; each level of ChildComparison.compare re-evaluates it from the original pretty_printed_* flags.

Parameters:

  • opts (Hash)

    Full comparison options hash

  • side (Symbol)

    :expected or :received

Returns:

  • (Hash)

    opts copy with ephemeral flag, or opts itself



# File 'lib/canon/comparison/xml_node_comparison.rb', line 111

def self.opts_for_side(opts, side)
  match_opts = opts[:match_opts]
  return opts unless match_opts

  active = case side
           when :expected then match_opts[:pretty_printed_expected]
           when :received then match_opts[:pretty_printed_received]
           else false
           end

  return opts unless active

  opts.merge(match_opts: match_opts.merge(_pretty_print_side_active: true))
end
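
The copy semantics of the final merge can be checked with plain hashes. This sketch uses only `Hash#merge`; `with_side_flag` is a hypothetical helper that performs the same nested merge without the side selection.

```ruby
# Hypothetical helper: the nested merge from the last line of
# opts_for_side, applied unconditionally.
def with_side_flag(opts)
  opts.merge(match_opts: opts[:match_opts].merge(_pretty_print_side_active: true))
end

opts = { verbose: true, match_opts: { pretty_printed_expected: true } }
side_opts = with_side_flag(opts)

# The copy carries the flag; the caller's hashes are left untouched.
p side_opts[:match_opts]
p opts[:match_opts]
```

Because Hash#merge returns a new hash at both levels, the flag cannot leak back into the options passed to other comparisons.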

.same_node_type?(node1, node2) ⇒ Boolean

Check if two nodes are of the same type

Parameters:

  • node1 (Object)

    First node

  • node2 (Object)

    Second node

Returns:

  • (Boolean)

    true if nodes are same type



# File 'lib/canon/comparison/xml_node_comparison.rb', line 275

def self.same_node_type?(node1, node2)
  return false if node1.class != node2.class

  # For Nokogiri/Canon::Xml nodes, check node type
  if node1.respond_to?(:node_type) && node2.respond_to?(:node_type)
    node1.node_type == node2.node_type
  else
    true
  end
end

.serialize_node_to_xml(node) ⇒ String

Serialize a Canon::Xml::Node to XML string

This utility method handles serialization of different node types to their string representation for display and debugging purposes.

Parameters:

  • node (Object)

    Node to serialize

Returns:

  • (String)

    XML string representation



# File 'lib/canon/comparison/xml_node_comparison.rb', line 433

def self.serialize_node_to_xml(node)
  if node.is_a?(Canon::Xml::Nodes::RootNode)
    # Serialize all children of root
    node.children.map { |child| serialize_node_to_xml(child) }.join
  elsif node.is_a?(Canon::Xml::Nodes::ElementNode)
    # Serialize element with attributes and children
    attrs = node.attribute_nodes.map do |a|
      " #{a.name}=\"#{a.value}\""
    end.join
    children_xml = node.children.map do |c|
      serialize_node_to_xml(c)
    end.join

    if children_xml.empty?
      "<#{node.name}#{attrs}/>"
    else
      "<#{node.name}#{attrs}>#{children_xml}</#{node.name}>"
    end
  elsif node.is_a?(Canon::Xml::Nodes::TextNode)
    node.value
  elsif node.is_a?(Canon::Xml::Nodes::CommentNode)
    "<!--#{node.value}-->"
  elsif node.is_a?(Canon::Xml::Nodes::ProcessingInstructionNode)
    "<?#{node.target} #{node.data}?>"
  elsif node.respond_to?(:to_xml)
    node.to_xml
  else
    node.to_s
  end
end
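
The recursive shape of the serializer can be illustrated without Canon's node classes. In the sketch below the Struct types and `to_xml_sketch` are hypothetical stand-ins for the Canon::Xml::Nodes classes and for serialize_node_to_xml. As in the listing, attribute values and text are not escaped, so the output is suited to display and debugging only.

```ruby
# Hypothetical stand-ins for Canon::Xml::Nodes classes.
SketchAttr    = Struct.new(:name, :value)
SketchElement = Struct.new(:name, :attribute_nodes, :children)
SketchText    = Struct.new(:value)
SketchComment = Struct.new(:value)

def to_xml_sketch(node)
  case node
  when SketchElement
    attrs = node.attribute_nodes.map { |a| " #{a.name}=\"#{a.value}\"" }.join
    inner = node.children.map { |child| to_xml_sketch(child) }.join
    # Childless elements are written in self-closing form.
    inner.empty? ? "<#{node.name}#{attrs}/>" : "<#{node.name}#{attrs}>#{inner}</#{node.name}>"
  when SketchComment then "<!--#{node.value}-->"
  when SketchText then node.value
  else node.to_s
  end
end

tree = SketchElement.new(
  "p",
  [SketchAttr.new("id", "x")],
  [SketchText.new("hi"), SketchElement.new("br", [], []), SketchComment.new(" c ")],
)
puts to_xml_sketch(tree) # prints <p id="x">hi<br/><!-- c --></p>
```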

.text_node?(node) ⇒ Boolean

Check if a node is a text node

Parameters:

  • node (Object)

    Node to check

Returns:

  • (Boolean)

    true if node is a text node



# File 'lib/canon/comparison/xml_node_comparison.rb', line 331

def self.text_node?(node)
  (node.respond_to?(:text?) && node.text? &&
    !node.respond_to?(:element?)) ||
    (node.respond_to?(:node_type) && node.node_type == :text)
end