Module: Canon::Comparison::XmlNodeComparison
- Defined in: lib/canon/comparison/xml_node_comparison.rb
Overview
XML Node Comparison Utilities
Provides public comparison methods for XML/HTML nodes. This module extracts shared comparison logic that was previously accessed via send() from HtmlComparator.
This is a simple utility module with focused responsibilities.
Class Method Summary
- .add_difference(node1, node2, diff1, diff2, dimension, opts, differences) ⇒ Object
  Add a difference to the differences array.
- .comment_node?(node, check_children: false) ⇒ Boolean
  Check if a node is a comment node.
- .comment_vs_non_comment_comparison?(node1, node2) ⇒ Boolean
  Check if this is a comment vs non-comment comparison.
- .compare_document_fragments(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol
  Compare document fragments by comparing their children.
- .compare_nodes(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol
  Main comparison dispatcher for XML nodes.
- .dispatch_by_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol
  Dispatch comparison based on node type.
- .dispatch_canon_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Object
  Dispatch by Canon::Xml::Node type.
- .dispatch_legacy_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Object
  Dispatch by legacy Nokogiri/Moxml node type.
- .filter_children(children, opts) ⇒ Array
  Filter children based on options.
- .node_excluded?(node, opts) ⇒ Boolean
  Check if a node should be excluded from comparison.
- .node_text(node) ⇒ String
  Extract text content from a node.
- .opts_for_side(opts, side) ⇒ Hash
  Build a side-specific opts copy that activates the pretty-print structural-whitespace heuristic for the given side.
- .same_node_type?(node1, node2) ⇒ Boolean
  Check if two nodes are of the same type.
- .serialize_node_to_xml(node) ⇒ String
  Serialize a Canon::Xml::Node to XML string.
- .text_node?(node) ⇒ Boolean
  Check if a node is a text node.
Class Method Details
.add_difference(node1, node2, diff1, diff2, dimension, opts, differences) ⇒ Object
Add a difference to the differences array
# File 'lib/canon/comparison/xml_node_comparison.rb', line 400

def self.add_difference(node1, node2, diff1, diff2, dimension, opts,
                        differences)
  return unless opts[:verbose]

  require_relative "xml_comparator"
  XmlComparator.add_difference(node1, node2, diff1, diff2, dimension,
                               opts, differences)
end
.comment_node?(node, check_children: false) ⇒ Boolean
Check if a node is a comment node
For XML/XHTML, this checks the node's comment? method or node_type. For HTML, this also checks TEXT nodes that contain HTML-style comments (Nokogiri parses HTML comments as TEXT nodes with content like "<!-- comment -->" or escaped like "<\!-- comment -->" in full HTML documents).
# File 'lib/canon/comparison/xml_node_comparison.rb', line 305

def self.comment_node?(node, check_children: false)
  return true if NodeInspector.comment_node?(node)

  if check_children && node.is_a?(Nokogiri::XML::Element) && !node.children.empty?
    node.children.any? { |child| NodeInspector.comment_node?(child) }
  else
    false
  end
end
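The two detection paths described for comment_node? can be sketched without Nokogiri. This is only an illustration: SketchNode, sketch_comment? and sketch_comment_node? are hypothetical names, and the regular expression is an assumption about what an "HTML-style comment in a TEXT node" looks like, not the logic inside NodeInspector.

```ruby
# Hypothetical stand-in for a parsed node; not the Nokogiri or Canon API.
SketchNode = Struct.new(:kind, :text, :children, keyword_init: true)

# True for real comment nodes and for text nodes whose whole content is an
# HTML-style comment, optionally written with the escaped "<\!--" opener.
def sketch_comment?(node)
  return true if node.kind == :comment

  node.kind == :text && node.text.to_s.strip.match?(/\A<\\?!--.*-->\z/m)
end

# Mirrors the check_children flag: an element also counts as a comment
# when any of its direct children is one.
def sketch_comment_node?(node, check_children: false)
  return true if sketch_comment?(node)
  return false unless check_children && node.kind == :element

  Array(node.children).any? { |child| sketch_comment?(child) }
end

comment = SketchNode.new(kind: :comment, text: " note ")
wrapper = SketchNode.new(kind: :element, children: [comment])

sketch_comment_node?(wrapper)                       # element alone is not a comment
sketch_comment_node?(wrapper, check_children: true) # counts because of its child
```

With check_children left at its default, only the node itself is inspected, which matches how node_excluded? calls comment_node? in the listing further down.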
.comment_vs_non_comment_comparison?(node1, node2) ⇒ Boolean
Check if this is a comment vs non-comment comparison
This handles the case where zip pairs a comment with a non-comment node due to different lengths in the children arrays. We create a :comments dimension difference instead of UNEQUAL_NODES_TYPES.
# File 'lib/canon/comparison/xml_node_comparison.rb', line 271

def self.comment_vs_non_comment_comparison?(node1, node2)
  node1_comment = comment_node?(node1, check_children: true)
  node2_comment = comment_node?(node2, check_children: true)

  # XOR: exactly one is a comment
  node1_comment ^ node2_comment
end
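A small sketch of why zip can pair a comment with a non-comment, and how the XOR test picks out that pair. The symbols below stand in for real nodes, and exactly_one_comment? is a hypothetical helper, not part of the module.

```ruby
# Node kinds stand in for real nodes; the symbols are illustrative only.
expected_children = [:comment, :element]
received_children = [:element]

# zip pads the shorter side with nil, so the comment lines up with an element.
pairs = expected_children.zip(received_children)

# XOR on booleans: true only when exactly one side is a comment.
def exactly_one_comment?(kind1, kind2)
  (kind1 == :comment) ^ (kind2 == :comment)
end

mismatched = pairs.select { |a, b| exactly_one_comment?(a, b) }
```

Here the first pair is a comment against an element, so it is the one that would be reported under the :comments dimension rather than as a node-type mismatch.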
.compare_document_fragments(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol
Compare document fragments by comparing their children
# File 'lib/canon/comparison/xml_node_comparison.rb', line 137

def self.compare_document_fragments(node1, node2, opts, child_opts,
                                    diff_children, differences)
  childrenode1 = node1.children.to_a
  childrenode2 = node2.children.to_a

  # Filter children before comparison to handle ignored nodes (like comments with :ignore).
  # Apply side-specific pretty-print heuristic when the relevant flag is active.
  children1 = filter_children(childrenode1, opts_for_side(opts, :expected))
  children2 = filter_children(childrenode2, opts_for_side(opts, :received))

  if children1.length != children2.length
    add_difference(node1, node2, Comparison::UNEQUAL_ELEMENTS,
                   Comparison::UNEQUAL_ELEMENTS, :text_content, opts,
                   differences)
    # Continue comparing children to find deeper differences like attribute values
    # Use zip to compare up to the shorter length
  end

  if children1.empty? && children2.empty?
    Comparison::EQUIVALENT
  else
    # Compare each pair of children (up to the shorter length)
    result = Comparison::EQUIVALENT
    children1.zip(children2).each do |child1, child2|
      # Skip if one is nil (due to different lengths)
      next if child1.nil? || child2.nil?

      child_result = compare_nodes(child1, child2, opts, child_opts,
                                   diff_children, differences)
      result = child_result unless result == Comparison::EQUIVALENT
    end
    result
  end
end
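The "record the length mismatch, then keep walking the pairs" pattern in the listing above can be sketched on plain arrays. sketch_compare_children is a hypothetical helper; it collects notes in an array instead of calling add_difference and does not reproduce the result-symbol handling.

```ruby
# Compare two child lists pairwise, up to the length of the shorter one.
def sketch_compare_children(children1, children2)
  notes = []
  # A length mismatch is recorded but does not stop the walk.
  notes << :length_mismatch if children1.length != children2.length

  children1.zip(children2).each do |a, b|
    # zip pads with nil when children2 is shorter; skip those pairs.
    next if a.nil? || b.nil?

    notes << [:differs, a, b] unless a == b
  end
  notes
end

notes = sketch_compare_children(%w[a b c], %w[a x])
```

Because zip yields one pair per element of the receiver, extra elements on the second side are never visited, and extra elements on the first side are paired with nil and skipped.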
.compare_nodes(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol
Main comparison dispatcher for XML nodes
This method handles the high-level comparison logic, delegating to specific comparison methods based on node types.
# File 'lib/canon/comparison/xml_node_comparison.rb', line 27

def self.compare_nodes(node1, node2, opts, child_opts, diff_children,
                       differences)
  # Handle DocumentFragment nodes - compare their children instead
  if node1.is_a?(Nokogiri::XML::DocumentFragment) &&
      node2.is_a?(Nokogiri::XML::DocumentFragment)
    return compare_document_fragments(node1, node2, opts, child_opts,
                                      diff_children, differences)
  end

  # Check if nodes should be excluded
  return Comparison::EQUIVALENT if node_excluded?(node1, opts) &&
    node_excluded?(node2, opts)

  if node_excluded?(node1, opts) || node_excluded?(node2, opts)
    add_difference(node1, node2, Comparison::MISSING_NODE,
                   Comparison::MISSING_NODE, :text_content, opts,
                   differences)
    return Comparison::MISSING_NODE
  end

  # Handle comment vs non-comment comparisons specially
  # When comparing a comment with a non-comment node (due to zip pairing),
  # create a :comments dimension difference instead of UNEQUAL_NODES_TYPES
  if comment_vs_non_comment_comparison?(node1, node2)
    match_opts = opts[:match_opts]
    comment_behavior = match_opts ? match_opts[:comments] : nil

    # Create a :comments dimension difference
    # The difference will be marked as normative or not based on the HtmlCompareProfile
    add_difference(node1, node2, Comparison::MISSING_NODE,
                   Comparison::MISSING_NODE, :comments, opts,
                   differences)

    # Return EQUIVALENT if comments are ignored, otherwise return UNEQUAL
    if comment_behavior == :ignore
      Comparison::EQUIVALENT
    else
      Comparison::UNEQUAL_COMMENTS
    end
  end

  # Check node types match
  unless same_node_type?(node1, node2)
    add_difference(node1, node2, Comparison::UNEQUAL_NODES_TYPES,
                   Comparison::UNEQUAL_NODES_TYPES, :text_content, opts,
                   differences)
    return Comparison::UNEQUAL_NODES_TYPES
  end

  # Dispatch based on node type
  dispatch_by_node_type(node1, node2, opts, child_opts, diff_children,
                        differences)
end
.dispatch_by_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Symbol
Dispatch comparison based on node type
# File 'lib/canon/comparison/xml_node_comparison.rb', line 183

def self.dispatch_by_node_type(node1, node2, opts, child_opts,
                               diff_children, differences)
  if node1.is_a?(Canon::Xml::Node) && node2.is_a?(Canon::Xml::Node)
    dispatch_canon_node_type(node1, node2, opts, child_opts,
                             diff_children, differences)
  else
    dispatch_legacy_node_type(node1, node2, opts, child_opts,
                              diff_children, differences)
  end
end
.dispatch_canon_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Object
Dispatch by Canon::Xml::Node type
# File 'lib/canon/comparison/xml_node_comparison.rb', line 334

def self.dispatch_canon_node_type(node1, node2, opts, child_opts,
                                  diff_children, differences)
  # Import XmlComparator to use its comparison methods
  require_relative "xml_comparator"

  case node1.node_type
  when :root
    XmlComparator.compare_children(node1, node2, opts, child_opts,
                                   diff_children, differences)
  when :element
    XmlComparator.compare_element_nodes(node1, node2, opts, child_opts,
                                        diff_children, differences)
  when :text
    XmlComparator.compare_text_nodes(node1, node2, opts, differences)
  when :comment
    XmlComparator.compare_comment_nodes(node1, node2, opts, differences)
  when :cdata
    XmlComparator.compare_text_nodes(node1, node2, opts, differences)
  when :processing_instruction
    XmlComparator.compare_processing_instruction_nodes(node1, node2,
                                                       opts, differences)
  else
    Comparison::EQUIVALENT
  end
end
.dispatch_legacy_node_type(node1, node2, opts, child_opts, diff_children, differences) ⇒ Object
Dispatch by legacy Nokogiri/Moxml node type
# File 'lib/canon/comparison/xml_node_comparison.rb', line 361

def self.dispatch_legacy_node_type(node1, node2, opts, child_opts,
                                   diff_children, differences)
  # Import XmlComparator to use its comparison methods
  require_relative "xml_comparator"

  case node1
  when Nokogiri::XML::Document
    XmlComparator.compare_document_nodes(node1, node2, opts, child_opts,
                                         diff_children, differences)
  when Nokogiri::XML::Node
    if node1.element?
      XmlComparator.compare_element_nodes(node1, node2, opts, child_opts,
                                          diff_children, differences)
    elsif node1.text?
      XmlComparator.compare_text_nodes(node1, node2, opts, differences)
    elsif node1.comment?
      XmlComparator.compare_comment_nodes(node1, node2, opts, differences)
    elsif node1.cdata?
      XmlComparator.compare_text_nodes(node1, node2, opts, differences)
    elsif node1.processing_instruction?
      XmlComparator.compare_processing_instruction_nodes(node1, node2,
                                                         opts, differences)
    else
      Comparison::EQUIVALENT
    end
  else
    Comparison::EQUIVALENT
  end
end
.filter_children(children, opts) ⇒ Array
Filter children based on options
Removes nodes that should be excluded from comparison based on options like :ignore_nodes, :ignore_comments, etc.
# File 'lib/canon/comparison/xml_node_comparison.rb', line 89

def self.filter_children(children, opts)
  children.reject do |child|
    node_excluded?(child, opts)
  end
end
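As a rough illustration of how the :ignore_nodes, :ignore_comments and :ignore_text_nodes options drive the filter, the sketch below uses plain hashes as nodes. sketch_excluded? and sketch_filter_children are hypothetical names, and only the three simple flags are modelled; the match_opts and whitespace rules of node_excluded? are left out.

```ruby
# Simplified exclusion test covering only the three ignore_* options.
def sketch_excluded?(child, opts)
  return true if opts[:ignore_nodes]&.include?(child)
  return true if opts[:ignore_comments] && child[:kind] == :comment
  return true if opts[:ignore_text_nodes] && child[:kind] == :text

  false
end

# Same shape as filter_children: reject whatever the predicate excludes.
def sketch_filter_children(children, opts)
  children.reject { |child| sketch_excluded?(child, opts) }
end

children = [
  { kind: :comment, value: "todo" },
  { kind: :element, name: "p" },
  { kind: :text, value: "hello" },
]

kept = sketch_filter_children(children, ignore_comments: true)
```

Filtering both sides before pairing keeps ignored nodes from shifting the positions that zip lines up.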
.node_excluded?(node, opts) ⇒ Boolean
Check if a node should be excluded from comparison
# File 'lib/canon/comparison/xml_node_comparison.rb', line 201

def self.node_excluded?(node, opts)
  return false if node.nil?
  return true if opts[:ignore_nodes]&.include?(node)
  return true if opts[:ignore_comments] && comment_node?(node)
  return true if opts[:ignore_text_nodes] && text_node?(node)

  # Check match options
  match_opts = opts[:match_opts]
  return false unless match_opts

  # Filter comments based on match options and format
  # HTML: Filter comments to avoid spurious differences from zip pairing
  # BUT only when not in verbose mode (verbose needs differences recorded)
  # XML: Don't filter comments (allow informative differences to be recorded)
  if match_opts[:comments] == :ignore && comment_node?(node)
    # In verbose mode, don't filter comments - we want to record the differences
    return false if opts[:verbose]

    # Only filter comments for HTML, not XML (when not verbose)
    format = opts[:format] || match_opts[:format]
    if %i[html html4 html5].include?(format)
      return true
    end
  end

  # Strip whitespace-only text nodes based on parent element configuration.
  # Use preserve_whitespace_elements / strip_whitespace_elements to control.
  # Blacklist (strip) > preserve > collapse > format defaults.
  return false unless text_node?(node) && node.parent
  return false unless MatchOptions.normalize_text(node_text(node)).empty?

  # HTML-specific: NBSP (U+00A0) is never insignificant whitespace —
  # it always renders as a visible non-breaking space.
  format = opts[:format] || match_opts[:format]
  if %i[html html4 html5].include?(format)
    return false if WhitespaceSensitivity.contains_nbsp?(node_text(node))

    # Whitespace between inline element siblings is semantically
    # significant (renders as a visible gap) and must not be stripped.
    return false if WhitespaceSensitivity.inline_whitespace_significant?(node)
  end

  return true unless WhitespaceSensitivity.whitespace_preserved?(
    node.parent, match_opts
  )

  # When the pretty-print-side flag is active (set by opts_for_side in
  # ChildComparison.compare), drop whitespace-only text nodes that start
  # with "\n" inside :collapse elements — they are structural indentation
  # from the pretty-printer, not content. Space-only nodes (no "\n") are
  # real inline content and are kept for normalised comparison.
  # :preserve elements are always left unchanged.
  if match_opts[:_pretty_print_side_active]
    ws_class = WhitespaceSensitivity.classify_text_node(node, opts)
    return true if ws_class == :collapse && node_text(node).start_with?("\n")
  end

  false
end
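The last rule in the listing, the pretty-print heuristic, can be isolated as a small predicate. This is a sketch under assumptions: structural_indentation? is a hypothetical name, the whitespace class is passed in directly rather than computed by WhitespaceSensitivity, and String#strip stands in for MatchOptions.normalize_text.

```ruby
# Decide whether a text node looks like pretty-printer indentation.
def structural_indentation?(text, ws_class:, pretty_print_side_active:)
  # The heuristic only applies on the side flagged as pretty-printed.
  return false unless pretty_print_side_active
  # Only whitespace-only text is a candidate.
  return false unless text.strip.empty?

  # Indentation starts with a newline; a bare space run is inline content.
  ws_class == :collapse && text.start_with?("\n")
end

structural_indentation?("\n    ", ws_class: :collapse, pretty_print_side_active: true)
structural_indentation?("  ", ws_class: :collapse, pretty_print_side_active: true)
```

The first call describes a newline followed by indentation and is treated as droppable; the second is a space-only node, which the source comments describe as real inline content.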
.node_text(node) ⇒ String
Extract text content from a node
# File 'lib/canon/comparison/xml_node_comparison.rb', line 327

def self.node_text(node)
  return "" unless node

  NodeInspector.text_content(node)
end
.opts_for_side(opts, side) ⇒ Hash
Build a side-specific opts copy that activates the pretty-print structural-whitespace heuristic for the given side.
When pretty_printed_expected (side :expected) or pretty_printed_received (side :received) is truthy in match_opts, returns a shallow copy of opts with an ephemeral _pretty_print_side_active: true flag merged into :match_opts. Otherwise returns opts unchanged (no allocation overhead).
The flag is consumed by node_excluded? to drop whitespace-only text nodes that start with "\n" in :collapse whitespace elements. It is intentionally NOT propagated to recursive compare_nodes calls; each level of ChildComparison.compare re-evaluates it from the original pretty_printed_* flags.
# File 'lib/canon/comparison/xml_node_comparison.rb', line 113

def self.opts_for_side(opts, side)
  match_opts = opts[:match_opts]
  return opts unless match_opts

  active = case side
           when :expected then match_opts[:pretty_printed_expected]
           when :received then match_opts[:pretty_printed_received]
           else false
           end
  return opts unless active

  opts.merge(match_opts: match_opts.merge(_pretty_print_side_active: true))
end
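The copy-or-return-unchanged behaviour described above can be checked on plain hashes. The sketch below restates the same logic as a standalone method under the hypothetical name sketch_opts_for_side; the option values are made up for illustration.

```ruby
# Maps each side to the match_opts flag that marks it as pretty-printed.
SIDE_FLAGS = {
  expected: :pretty_printed_expected,
  received: :pretty_printed_received,
}.freeze

def sketch_opts_for_side(opts, side)
  match_opts = opts[:match_opts]
  return opts unless match_opts

  flag = SIDE_FLAGS[side]
  return opts unless flag && match_opts[flag]

  # Hash#merge returns new hashes, so the caller's opts are not modified.
  opts.merge(match_opts: match_opts.merge(_pretty_print_side_active: true))
end

opts = { verbose: true, match_opts: { pretty_printed_expected: true } }

expected_opts = sketch_opts_for_side(opts, :expected) # copy with the flag set
received_opts = sketch_opts_for_side(opts, :received) # the same opts object
```

Since the flag lives only in the copy, passing the original opts to deeper calls naturally leaves it unset, which is how the non-propagation described above comes about.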
.same_node_type?(node1, node2) ⇒ Boolean
Check if two nodes are of the same type
# File 'lib/canon/comparison/xml_node_comparison.rb', line 284

def self.same_node_type?(node1, node2)
  return false if node1.class != node2.class

  case node1
  when Canon::Xml::Node, Nokogiri::XML::Node
    node1.node_type == node2.node_type
  else
    true
  end
end
.serialize_node_to_xml(node) ⇒ String
Serialize a Canon::Xml::Node to XML string
This utility method handles serialization of different node types to their string representation for display and debugging purposes.
# File 'lib/canon/comparison/xml_node_comparison.rb', line 416

def self.serialize_node_to_xml(node)
  case node
  when Canon::Xml::Nodes::RootNode
    # Serialize all children of root
    node.children.map { |child| serialize_node_to_xml(child) }.join
  when Canon::Xml::Nodes::ElementNode
    # Serialize element with attributes and children
    attrs = node.attribute_nodes.map do |a|
      " #{a.name}=\"#{a.value}\""
    end.join
    children_xml = node.children.map do |c|
      serialize_node_to_xml(c)
    end.join

    if children_xml.empty?
      "<#{node.name}#{attrs}/>"
    else
      "<#{node.name}#{attrs}>#{children_xml}</#{node.name}>"
    end
  when Canon::Xml::Nodes::TextNode
    node.value
  when Canon::Xml::Nodes::CommentNode
    "<!--#{node.value}-->"
  when Canon::Xml::Nodes::ProcessingInstructionNode
    "<?#{node.target} #{node.data}?>"
  else
    node.to_s
  end
end
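The recursive shape of the serializer can be tried out with stand-in node types. SketchElement, SketchText, SketchComment and sketch_serialize are hypothetical; they only imitate the element, text and comment branches and are not the Canon::Xml::Nodes classes.

```ruby
# Minimal stand-ins for element, text and comment nodes.
SketchElement = Struct.new(:name, :attributes, :children)
SketchText = Struct.new(:value)
SketchComment = Struct.new(:value)

def sketch_serialize(node)
  case node
  when SketchElement
    attrs = node.attributes.map { |name, value| " #{name}=\"#{value}\"" }.join
    inner = node.children.map { |child| sketch_serialize(child) }.join
    # Childless elements use the self-closing form.
    if inner.empty?
      "<#{node.name}#{attrs}/>"
    else
      "<#{node.name}#{attrs}>#{inner}</#{node.name}>"
    end
  when SketchText
    node.value
  when SketchComment
    "<!--#{node.value}-->"
  else
    node.to_s
  end
end

tree = SketchElement.new(
  "p", { "class" => "x" },
  [SketchText.new("hi"), SketchElement.new("br", {}, []), SketchComment.new(" c ")]
)

xml = sketch_serialize(tree) # => "<p class=\"x\">hi<br/><!-- c --></p>"
```

Like the listing above, the sketch writes attribute and text values without escaping, which is acceptable for display and debugging output but would not be safe for producing XML meant to be parsed again.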
.text_node?(node) ⇒ Boolean
Check if a node is a text node
# File 'lib/canon/comparison/xml_node_comparison.rb', line 319

def self.text_node?(node)
  NodeInspector.text_node?(node)
end