Module: Canon::Comparison::XmlNodeComparison
Defined in: lib/canon/comparison/xml_node_comparison.rb
Overview
XML Node Comparison Utilities
Provides public comparison methods for XML/HTML nodes. This module extracts shared comparison logic that was previously accessed via send() from HtmlComparator.
This is a simple utility module with focused responsibilities.
Class Method Summary
- .add_difference(node1, node2, diff1, diff2, dimension, opts, differences) ⇒ Object
  Add a difference to the differences array.
- .comment_node?(node, check_children: false) ⇒ Boolean
  Check if a node is a comment node.
- .comment_vs_non_comment_comparison?(node1, node2) ⇒ Boolean
  Check if this is a comment vs non-comment comparison.
- .compare_document_fragments(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol
  Compare document fragments by comparing their children.
- .compare_nodes(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol
  Main comparison dispatcher for XML nodes.
- .dispatch_by_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol
  Dispatch comparison based on node type.
- .dispatch_canon_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Object
  Dispatch by Canon::Xml::Node type.
- .dispatch_legacy_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Object
  Dispatch by legacy Nokogiri/Moxml node type.
- .filter_children(children, opts) ⇒ Array
  Filter children based on options.
- .node_excluded?(node, opts) ⇒ Boolean
  Check if a node should be excluded from comparison.
- .node_text(node) ⇒ String
  Extract text content from a node.
- .opts_for_side(opts, side) ⇒ Hash
  Build a side-specific opts copy that activates the pretty-print structural-whitespace heuristic for the given side.
- .same_node_type?(node1, node2) ⇒ Boolean
  Check if two nodes are of the same type.
- .serialize_node_to_xml(node) ⇒ String
  Serialize a Canon::Xml::Node to XML string.
- .text_node?(node) ⇒ Boolean
  Check if a node is a text node.
Class Method Details
.add_difference(node1, node2, diff1, diff2, dimension, opts, differences) ⇒ Object
Add a difference to the differences array
    # File 'lib/canon/comparison/xml_node_comparison.rb', line 417

    def self.add_difference(node1, node2, diff1, diff2, dimension, opts,
                            differences)
      return unless opts[:verbose]

      require_relative "xml_comparator"
      XmlComparator.add_difference(node1, node2, diff1, diff2, dimension,
                                   opts, differences)
    end
.comment_node?(node, check_children: false) ⇒ Boolean
Check if a node is a comment node
For XML/XHTML, this checks the node’s comment? method or node_type. For HTML, this also checks TEXT nodes that contain HTML-style comments (Nokogiri parses HTML comments as TEXT nodes with content like “<!-- comment -->”, or escaped like “<\!-- comment -->” in full HTML documents).
    # File 'lib/canon/comparison/xml_node_comparison.rb', line 296

    def self.comment_node?(node, check_children: false)
      result = false
      return true if node.respond_to?(:comment?) && node.comment?
      return true if node.respond_to?(:node_type) && node.node_type == :comment

      if node.is_a?(Nokogiri::XML::Element) && !node.children.empty? && check_children
        node.children.each do |child|
          # Recursively check child nodes for comments
          # limit depth to avoid infinite recursion
          # in case of circular structures (if any)
          if comment_node?(child, check_children: false)
            result = true
            break
          end
        end
      end
      return true if result

      # HTML comments are parsed as TEXT nodes by Nokogiri
      # Check if this is a text node with HTML comment content
      if text_node?(node)
        text = node_text(node)
        # Strip whitespace and backslashes for comparison
        # Nokogiri escapes HTML comments as "<\\!-- comment -->" in full documents
        text_stripped = text.to_s.strip.gsub("\\", "")
        return true if text_stripped.start_with?("<!--") && text_stripped.end_with?("-->")
      end

      result
    end
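The text-based fallback described above can be tried in isolation. The following is a minimal sketch, not the module's own API: `html_comment_text?` is a hypothetical helper that only reproduces the strip-and-match step on a plain string.

```ruby
# Hypothetical helper (not part of Canon): mirrors the text-based
# check that comment_node? applies to text nodes.
def html_comment_text?(text)
  # Drop surrounding whitespace and any backslashes, so the escaped
  # form "<\!-- x -->" is treated like "<!-- x -->".
  stripped = text.to_s.strip.gsub("\\", "")
  stripped.start_with?("<!--") && stripped.end_with?("-->")
end

puts html_comment_text?("  <!-- note -->\n")
puts html_comment_text?("<\\!-- note -->")
puts html_comment_text?("plain text")
```

Because the check looks only at the start and end of the stripped text, any text node whose content merely resembles a comment would also match.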
.comment_vs_non_comment_comparison?(node1, node2) ⇒ Boolean
Check if this is a comment vs non-comment comparison
This handles the case where zip pairs a comment with a non-comment node due to different lengths in the children arrays. We create a :comments dimension difference instead of UNEQUAL_NODES_TYPES.
    # File 'lib/canon/comparison/xml_node_comparison.rb', line 262

    def self.comment_vs_non_comment_comparison?(node1, node2)
      node1_comment = comment_node?(node1, check_children: true)
      node2_comment = comment_node?(node2, check_children: true)

      # XOR: exactly one is a comment
      node1_comment ^ node2_comment
    end
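To illustrate how zip can pair a comment with a non-comment when the child lists differ in length, here is a small stand-alone sketch. The Struct-based nodes and the `comment_mismatch_pairs` helper are assumptions made for the example; real nodes come from Nokogiri or Canon::Xml.

```ruby
# Hypothetical helper: returns the zip pairs in which exactly one
# side is a comment (the XOR condition described above).
def comment_mismatch_pairs(expected, received)
  expected.zip(received).select do |a, b|
    # zip pads the shorter list with nil; skip those pairs.
    next false if a.nil? || b.nil?

    (a.node_type == :comment) ^ (b.node_type == :comment)
  end
end

# Stand-in node type for the example only.
node = Struct.new(:node_type, :value)
expected = [node.new(:comment, "note"), node.new(:element, "p")]
received = [node.new(:element, "p")]

pairs = comment_mismatch_pairs(expected, received)
puts pairs.length
```

Here the extra comment on the expected side shifts the pairing, so the comment ends up compared against the `p` element.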
.compare_document_fragments(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol
Compare document fragments by comparing their children
    # File 'lib/canon/comparison/xml_node_comparison.rb', line 135

    def self.compare_document_fragments(node1, node2, opts, child_opts,
                                        diff_children, differences)
      childrenode1 = node1.children.to_a
      childrenode2 = node2.children.to_a

      # Filter children before comparison to handle ignored nodes (like comments with :ignore).
      # Apply side-specific pretty-print heuristic when the relevant flag is active.
      children1 = filter_children(childrenode1, opts_for_side(opts, :expected))
      children2 = filter_children(childrenode2, opts_for_side(opts, :received))

      if children1.length != children2.length
        add_difference(node1, node2, Comparison::UNEQUAL_ELEMENTS,
                       Comparison::UNEQUAL_ELEMENTS, :text_content, opts,
                       differences)
        # Continue comparing children to find deeper differences like attribute values
        # Use zip to compare up to the shorter length
      end

      if children1.empty? && children2.empty?
        Comparison::EQUIVALENT
      else
        # Compare each pair of children (up to the shorter length)
        result = Comparison::EQUIVALENT
        children1.zip(children2).each do |child1, child2|
          # Skip if one is nil (due to different lengths)
          next if child1.nil? || child2.nil?

          child_result = compare_nodes(child1, child2, opts, child_opts,
                                       diff_children, differences)
          # Keep the most recent non-equivalent child result
          result = child_result unless child_result == Comparison::EQUIVALENT
        end
        result
      end
    end
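The pairing strategy (record a length mismatch, then keep comparing pairs up to the shorter length) can be sketched on plain arrays. This is an illustration under simplifying assumptions: `compare_lists` is a hypothetical helper, children are compared with `==`, and plain symbols stand in for the Comparison constants.

```ruby
# Hypothetical helper: returns [result, differences] for two lists.
def compare_lists(children1, children2)
  differences = []
  # A length mismatch is recorded, but comparison continues so that
  # deeper differences are still found.
  differences << :unequal_elements if children1.length != children2.length

  result = :equivalent
  children1.zip(children2).each do |a, b|
    next if a.nil? || b.nil? # padding from zip on the shorter side

    next if a == b

    differences << [a, b]
    result = :unequal
  end
  [result, differences]
end

result, differences = compare_lists(%w[a b c], %w[a x])
puts result
puts differences.inspect
```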
.compare_nodes(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol
Main comparison dispatcher for XML nodes
This method handles the high-level comparison logic, delegating to specific comparison methods based on node types.
    # File 'lib/canon/comparison/xml_node_comparison.rb', line 25

    def self.compare_nodes(node1, node2, opts, child_opts, diff_children,
                           differences)
      # Handle DocumentFragment nodes - compare their children instead
      if node1.is_a?(Nokogiri::XML::DocumentFragment) &&
          node2.is_a?(Nokogiri::XML::DocumentFragment)
        return compare_document_fragments(node1, node2, opts, child_opts,
                                          diff_children, differences)
      end

      # Check if nodes should be excluded
      return Comparison::EQUIVALENT if node_excluded?(node1, opts) &&
        node_excluded?(node2, opts)

      if node_excluded?(node1, opts) || node_excluded?(node2, opts)
        add_difference(node1, node2, Comparison::MISSING_NODE,
                       Comparison::MISSING_NODE, :text_content, opts,
                       differences)
        return Comparison::MISSING_NODE
      end

      # Handle comment vs non-comment comparisons specially
      # When comparing a comment with a non-comment node (due to zip pairing),
      # create a :comments dimension difference instead of UNEQUAL_NODES_TYPES
      if comment_vs_non_comment_comparison?(node1, node2)
        match_opts = opts[:match_opts]
        comment_behavior = match_opts ? match_opts[:comments] : nil

        # Create a :comments dimension difference
        # The difference will be marked as normative or not based on the HtmlCompareProfile
        add_difference(node1, node2, Comparison::MISSING_NODE,
                       Comparison::MISSING_NODE, :comments, opts,
                       differences)

        # Return EQUIVALENT if comments are ignored, otherwise return UNEQUAL.
        # The explicit returns keep this case from falling through to the
        # node-type check below.
        if comment_behavior == :ignore
          return Comparison::EQUIVALENT
        else
          return Comparison::UNEQUAL_COMMENTS
        end
      end

      # Check node types match
      unless same_node_type?(node1, node2)
        add_difference(node1, node2, Comparison::UNEQUAL_NODES_TYPES,
                       Comparison::UNEQUAL_NODES_TYPES, :text_content, opts,
                       differences)
        return Comparison::UNEQUAL_NODES_TYPES
      end

      # Dispatch based on node type
      dispatch_by_node_type(node1, node2, opts, child_opts, diff_children,
                            differences)
    end
.dispatch_by_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol
Dispatch comparison based on node type
    # File 'lib/canon/comparison/xml_node_comparison.rb', line 181

    def self.dispatch_by_node_type(node1, node2, opts, child_opts,
                                   diff_children, differences)
      # Canon::Xml::Node types use .node_type method that returns symbols
      # Nokogiri also has .node_type but returns integers, so check for Symbol
      if node1.respond_to?(:node_type) && node2.respond_to?(:node_type) &&
          node1.node_type.is_a?(Symbol) && node2.node_type.is_a?(Symbol)
        dispatch_canon_node_type(node1, node2, opts, child_opts,
                                 diff_children, differences)
      # Moxml/Nokogiri types use .element?, .text?, etc. methods
      else
        dispatch_legacy_node_type(node1, node2, opts, child_opts,
                                  diff_children, differences)
      end
    end
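The Symbol-versus-Integer test that selects the dispatch path can be shown with stand-in objects. In this sketch `dispatch_path` is a hypothetical helper that returns the name of the chosen path instead of calling a comparator, and the Struct nodes are assumptions for the example.

```ruby
# Hypothetical helper: reports which dispatch path would be chosen.
def dispatch_path(node1, node2)
  symbolic = [node1, node2].all? do |n|
    n.respond_to?(:node_type) && n.node_type.is_a?(Symbol)
  end
  symbolic ? :canon : :legacy
end

node = Struct.new(:node_type) # stand-in for a parsed node

puts dispatch_path(node.new(:element), node.new(:element)) # Symbol types
puts dispatch_path(node.new(1), node.new(1))               # Integer types
puts dispatch_path("a", "b")                               # no node_type at all
```

Both nodes must report a Symbol for the Canon path to be taken; a mixed pair falls back to the legacy path.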
.dispatch_canon_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Object
Dispatch by Canon::Xml::Node type
    # File 'lib/canon/comparison/xml_node_comparison.rb', line 356

    def self.dispatch_canon_node_type(node1, node2, opts, child_opts,
                                      diff_children, differences)
      # Import XmlComparator to use its comparison methods
      require_relative "xml_comparator"

      case node1.node_type
      when :root
        XmlComparator.compare_children(node1, node2, opts, child_opts,
                                       diff_children, differences)
      when :element
        XmlComparator.compare_element_nodes(node1, node2, opts, child_opts,
                                            diff_children, differences)
      when :text
        XmlComparator.compare_text_nodes(node1, node2, opts, differences)
      when :comment
        XmlComparator.compare_comment_nodes(node1, node2, opts, differences)
      when :cdata
        XmlComparator.compare_text_nodes(node1, node2, opts, differences)
      when :processing_instruction
        XmlComparator.compare_processing_instruction_nodes(node1, node2,
                                                           opts, differences)
      else
        Comparison::EQUIVALENT
      end
    end
.dispatch_legacy_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Object
Dispatch by legacy Nokogiri/Moxml node type
    # File 'lib/canon/comparison/xml_node_comparison.rb', line 383

    def self.dispatch_legacy_node_type(node1, node2, opts, child_opts,
                                       diff_children, differences)
      # Import XmlComparator to use its comparison methods
      require_relative "xml_comparator"

      if node1.respond_to?(:element?) && node1.element?
        XmlComparator.compare_element_nodes(node1, node2, opts, child_opts,
                                            diff_children, differences)
      elsif node1.respond_to?(:text?) && node1.text?
        XmlComparator.compare_text_nodes(node1, node2, opts, differences)
      elsif node1.respond_to?(:comment?) && node1.comment?
        XmlComparator.compare_comment_nodes(node1, node2, opts, differences)
      elsif node1.respond_to?(:cdata?) && node1.cdata?
        XmlComparator.compare_text_nodes(node1, node2, opts, differences)
      elsif node1.respond_to?(:processing_instruction?) &&
          node1.processing_instruction?
        XmlComparator.compare_processing_instruction_nodes(node1, node2,
                                                           opts, differences)
      elsif node1.respond_to?(:root)
        XmlComparator.compare_document_nodes(node1, node2, opts, child_opts,
                                             diff_children, differences)
      else
        Comparison::EQUIVALENT
      end
    end
.filter_children(children, opts) ⇒ Array
Filter children based on options
Removes nodes that should be excluded from comparison based on options like :ignore_nodes, :ignore_comments, etc.
    # File 'lib/canon/comparison/xml_node_comparison.rb', line 87

    def self.filter_children(children, opts)
      children.reject do |child|
        node_excluded?(child, opts)
      end
    end
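As a rough illustration of option-driven filtering, the sketch below applies the three simple flags (:ignore_nodes, :ignore_comments, :ignore_text_nodes) to hash-based stand-in nodes. The helpers `excluded_sketch?` and `keep_children` are hypothetical and leave out the match_opts and whitespace rules handled by node_excluded?.

```ruby
# Hypothetical, simplified exclusion test covering only the three flags.
def excluded_sketch?(node, opts)
  return true if opts[:ignore_nodes]&.include?(node)
  return true if opts[:ignore_comments] && node[:type] == :comment
  return true if opts[:ignore_text_nodes] && node[:type] == :text

  false
end

# Same shape as filter_children: reject whatever is excluded.
def keep_children(children, opts)
  children.reject { |child| excluded_sketch?(child, opts) }
end

children = [
  { type: :comment, value: "todo" },
  { type: :element, name: "p" },
  { type: :text, value: "hello" },
]

puts keep_children(children, ignore_comments: true).length
puts keep_children(children, {}).length
```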
.node_excluded?(node, opts) ⇒ Boolean
Check if a node should be excluded from comparison
    # File 'lib/canon/comparison/xml_node_comparison.rb', line 203

    def self.node_excluded?(node, opts)
      return false if node.nil?
      return true if opts[:ignore_nodes]&.include?(node)
      return true if opts[:ignore_comments] && comment_node?(node)
      return true if opts[:ignore_text_nodes] && text_node?(node)

      # Check match options
      match_opts = opts[:match_opts]
      return false unless match_opts

      # Filter comments based on match options and format
      # HTML: Filter comments to avoid spurious differences from zip pairing
      #       BUT only when not in verbose mode (verbose needs differences recorded)
      # XML: Don't filter comments (allow informative differences to be recorded)
      if match_opts[:comments] == :ignore && comment_node?(node)
        # In verbose mode, don't filter comments - we want to record the differences
        return false if opts[:verbose]

        # Only filter comments for HTML, not XML (when not verbose)
        format = opts[:format] || match_opts[:format]
        if %i[html html4 html5].include?(format)
          return true
        end
      end

      # Strip whitespace-only text nodes based on parent element configuration.
      # Use preserve_whitespace_elements / strip_whitespace_elements to control.
      # Blacklist (strip) > preserve > collapse > format defaults.
      return false unless text_node?(node) && node.parent
      return false unless MatchOptions.normalize_text(node_text(node)).empty?
      return true unless WhitespaceSensitivity.whitespace_preserved?(
        node.parent, match_opts
      )

      # When the pretty-print-side flag is active (set by opts_for_side in
      # ChildComparison.compare), drop whitespace-only text nodes that start
      # with "\n" inside :collapse elements; they are structural indentation
      # from the pretty-printer, not content. Space-only nodes (no "\n") are
      # real inline content and are kept for normalised comparison.
      # :preserve elements are always left unchanged.
      if match_opts[:_pretty_print_side_active]
        ws_class = WhitespaceSensitivity.classify_text_node(node, opts)
        return true if ws_class == :collapse && node_text(node).start_with?("\n")
      end

      false
    end
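The pretty-print heuristic at the end of this method distinguishes structural indentation from inline whitespace. A minimal sketch of that distinction on plain strings follows; `structural_indentation?` is a hypothetical helper, and `String#strip` is used here as an assumed stand-in for MatchOptions.normalize_text.

```ruby
# Hypothetical helper: true for whitespace-only text that begins with
# a newline, which is the shape a pretty-printer's indentation takes.
def structural_indentation?(text)
  text.strip.empty? && text.start_with?("\n")
end

puts structural_indentation?("\n    ") # indentation between elements
puts structural_indentation?(" ")      # inline space, kept as content
puts structural_indentation?("\n  hi") # real text, not whitespace-only
```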
.node_text(node) ⇒ String
Extract text content from a node
    # File 'lib/canon/comparison/xml_node_comparison.rb', line 341

    def self.node_text(node)
      return "" unless node

      if node.respond_to?(:content)
        node.content.to_s
      elsif node.respond_to?(:text)
        node.text.to_s
      elsif node.respond_to?(:value)
        node.value.to_s
      else
        ""
      end
    end
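The accessor fallback order (content, then text, then value) is a duck-typing pattern that can be exercised with small Structs. The sketch below is an assumed, condensed equivalent; `text_of` and the Struct objects are hypothetical and exist only for this example.

```ruby
# Hypothetical condensed version of the fallback chain in node_text.
def text_of(node)
  return "" unless node

  %i[content text value].each do |reader|
    # The first accessor the node responds to wins.
    return node.public_send(reader).to_s if node.respond_to?(reader)
  end
  ""
end

with_content = Struct.new(:content).new("from content")
with_value = Struct.new(:value).new(42)
with_both = Struct.new(:content, :text).new("c", "t")

puts text_of(with_content)
puts text_of(with_value)
puts text_of(with_both)
```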
.opts_for_side(opts, side) ⇒ Hash
Build a side-specific opts copy that activates the pretty-print structural-whitespace heuristic for the given side.
When pretty_printed_expected (side :expected) or pretty_printed_received (side :received) is truthy in match_opts, returns a shallow copy of opts with an ephemeral _pretty_print_side_active: true flag merged into :match_opts. Otherwise returns opts unchanged (no allocation overhead).
The flag is consumed by node_excluded? to drop whitespace-only text nodes that start with “\n” in :collapse whitespace elements. It is intentionally NOT propagated to recursive compare_nodes calls; each level of ChildComparison.compare re-evaluates it from the original pretty_printed_* flags.
    # File 'lib/canon/comparison/xml_node_comparison.rb', line 111

    def self.opts_for_side(opts, side)
      match_opts = opts[:match_opts]
      return opts unless match_opts

      active = case side
               when :expected then match_opts[:pretty_printed_expected]
               when :received then match_opts[:pretty_printed_received]
               else false
               end
      return opts unless active

      opts.merge(match_opts: match_opts.merge(_pretty_print_side_active: true))
    end
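The copy-on-activate behavior relies on `Hash#merge` returning a new hash and leaving the receiver untouched. The following sketch reproduces the same logic as a free-standing method (`side_opts` is a hypothetical name) so the effect on the original opts can be observed.

```ruby
# Hypothetical stand-alone version of the side-specific opts logic.
def side_opts(opts, side)
  match_opts = opts[:match_opts]
  return opts unless match_opts

  key = { expected: :pretty_printed_expected,
          received: :pretty_printed_received }[side]
  return opts unless key && match_opts[key]

  # Hash#merge builds new hashes; the caller's opts are not mutated.
  opts.merge(match_opts: match_opts.merge(_pretty_print_side_active: true))
end

opts = { verbose: true, match_opts: { pretty_printed_expected: true } }

expected_opts = side_opts(opts, :expected)
received_opts = side_opts(opts, :received)

puts expected_opts[:match_opts][:_pretty_print_side_active].inspect
puts received_opts.equal?(opts) # same object when the side is inactive
puts opts[:match_opts].key?(:_pretty_print_side_active)
```

Since only the copy carries the flag, passing the original opts to recursive calls naturally keeps the flag from propagating.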
.same_node_type?(node1, node2) ⇒ Boolean
Check if two nodes are of the same type
    # File 'lib/canon/comparison/xml_node_comparison.rb', line 275

    def self.same_node_type?(node1, node2)
      return false if node1.class != node2.class

      # For Nokogiri/Canon::Xml nodes, check node type
      if node1.respond_to?(:node_type) && node2.respond_to?(:node_type)
        node1.node_type == node2.node_type
      else
        true
      end
    end
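The two-step check (class first, then node_type when both nodes expose it) can be tried with stand-in objects. `same_kind?` below is a hypothetical copy of the logic, and the Struct class is an assumption for the example.

```ruby
# Hypothetical stand-alone copy of the same-type check.
def same_kind?(node1, node2)
  return false if node1.class != node2.class

  if node1.respond_to?(:node_type) && node2.respond_to?(:node_type)
    node1.node_type == node2.node_type
  else
    true # same class and no node_type to compare
  end
end

node = Struct.new(:node_type)

puts same_kind?(node.new(:text), node.new(:text))
puts same_kind?(node.new(:text), node.new(:comment))
puts same_kind?("a", :a) # different classes
```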
.serialize_node_to_xml(node) ⇒ String
Serialize a Canon::Xml::Node to XML string
This utility method handles serialization of different node types to their string representation for display and debugging purposes.
    # File 'lib/canon/comparison/xml_node_comparison.rb', line 433

    def self.serialize_node_to_xml(node)
      if node.is_a?(Canon::Xml::Nodes::RootNode)
        # Serialize all children of root
        node.children.map { |child| serialize_node_to_xml(child) }.join
      elsif node.is_a?(Canon::Xml::Nodes::ElementNode)
        # Serialize element with attributes and children
        attrs = node.attribute_nodes.map do |a|
          " #{a.name}=\"#{a.value}\""
        end.join
        children_xml = node.children.map do |c|
          serialize_node_to_xml(c)
        end.join

        if children_xml.empty?
          "<#{node.name}#{attrs}/>"
        else
          "<#{node.name}#{attrs}>#{children_xml}</#{node.name}>"
        end
      elsif node.is_a?(Canon::Xml::Nodes::TextNode)
        node.value
      elsif node.is_a?(Canon::Xml::Nodes::CommentNode)
        "<!--#{node.value}-->"
      elsif node.is_a?(Canon::Xml::Nodes::ProcessingInstructionNode)
        "<?#{node.target} #{node.data}?>"
      elsif node.respond_to?(:to_xml)
        node.to_xml
      else
        node.to_s
      end
    end
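The recursive shape of this serializer can be illustrated without the Canon node classes. The sketch below uses plain hashes with a :type key as stand-in nodes; `to_xml_sketch` is a hypothetical helper and, like the method above, it performs no escaping of attribute values or text.

```ruby
# Hypothetical serializer over hash-based stand-in nodes.
def to_xml_sketch(node)
  case node[:type]
  when :element
    attrs = node[:attrs].map { |name, value| " #{name}=\"#{value}\"" }.join
    inner = node[:children].map { |child| to_xml_sketch(child) }.join
    if inner.empty?
      "<#{node[:name]}#{attrs}/>" # self-closing when there are no children
    else
      "<#{node[:name]}#{attrs}>#{inner}</#{node[:name]}>"
    end
  when :text
    node[:value]
  when :comment
    "<!--#{node[:value]}-->"
  else
    node.to_s
  end
end

tree = {
  type: :element, name: "p", attrs: { "id" => "x" },
  children: [
    { type: :text, value: "hi" },
    { type: :element, name: "br", attrs: {}, children: [] },
    { type: :comment, value: " c " },
  ],
}

puts to_xml_sketch(tree)
```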
.text_node?(node) ⇒ Boolean
Check if a node is a text node
    # File 'lib/canon/comparison/xml_node_comparison.rb', line 331

    def self.text_node?(node)
      (node.respond_to?(:text?) && node.text? &&
        !node.respond_to?(:element?)) ||
        (node.respond_to?(:node_type) && node.node_type == :text)
    end