Class: Canon::Comparison::MarkupComparator
- Inherits: Object (ancestry: Object > Canon::Comparison::MarkupComparator)
- Defined in: lib/canon/comparison/markup_comparator.rb
Overview
Base class for markup document comparison (XML, HTML)
Provides shared comparison functionality for markup documents, including node type checking, text extraction, filtering, and difference creation.
Format-specific comparators (XmlComparator, HtmlComparator) inherit from this class and add format-specific behavior.
Direct Known Subclasses: XmlComparator, HtmlComparator
Class Method Summary
- .add_difference(node1, node2, diff1, diff2, dimension, _opts, differences) ⇒ Object
  Add a difference to the differences array.
- .build_attribute_difference_reason(attrs1, attrs2) ⇒ String
  Build a clear reason message for attribute presence differences. Shows which attributes are only in node1, only in node2, or have different values.
- .build_difference_reason(node1, node2, diff1, diff2, dimension) ⇒ String
  Build a human-readable reason for a difference.
- .build_path_for_node(node) ⇒ String?
  Build the canonical path for a node.
- .build_text_difference_reason(text1, text2) ⇒ String
  Build a clear reason message for text content differences. Shows the actual text content (truncated if too long).
- .comment_node?(node) ⇒ Boolean
  Check whether a node is a comment node.
- .determine_node_dimension(node) ⇒ Symbol
  Determine the appropriate dimension for a node type.
- .enrich_diff_metadata(node1, node2) ⇒ Hash
  Enrich a DiffNode with canonical path, serialized content, and attributes. This extracts presentation-ready metadata from nodes for Stage 4 rendering.
- .extract_attributes(node) ⇒ Hash?
  Extract attributes from a node.
- .extract_text_content_from_node(node) ⇒ String?
  Extract text content from a node for the diff reason.
- .filter_children(children, opts) ⇒ Array
  Filter children based on options.
- .node_excluded?(node, opts) ⇒ Boolean
  Check whether a node should be excluded from comparison.
- .node_text(node) ⇒ String
  Get text content from a node.
- .same_node_type?(node1, node2) ⇒ Boolean
  Check whether two nodes are the same type.
- .serialize_element_node(node) ⇒ String
  Serialize an element node to a string.
- .serialize_node(node) ⇒ String?
  Serialize a node to a string for display.
- .text_node?(node) ⇒ Boolean
  Check whether a node is a text node.
- .truncate_text(text, max_length = 40) ⇒ String
  Truncate text for display in reason messages.
- .whitespace_only_difference?(text1, text2) ⇒ Boolean
  Check whether the difference between two texts is only whitespace.
Class Method Details
.add_difference(node1, node2, diff1, diff2, dimension, _opts, differences) ⇒ Object
Add a difference to the differences array
Creates a DiffNode with enriched metadata including path, serialized content, and attributes for Stage 4 rendering.
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 31

def add_difference(node1, node2, diff1, diff2, dimension, _opts, differences)
  # All differences must be DiffNode objects (OO architecture)
  if dimension.nil?
    raise ArgumentError, "dimension required for DiffNode"
  end

  # Build informative reason message
  reason = build_difference_reason(node1, node2, diff1, diff2, dimension)

  # Enrich with path, serialized content, and attributes for Stage 4 rendering
  metadata = enrich_diff_metadata(node1, node2)

  diff_node = Canon::Diff::DiffNode.new(
    node1: node1,
    node2: node2,
    dimension: dimension,
    reason: reason,
    **metadata,
  )
  differences << diff_node
end
```
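The method builds a reason, derives a metadata hash, and splats that hash into the DiffNode constructor as keyword arguments. A minimal standalone sketch of that pattern, assuming a hypothetical `FakeDiffNode` Struct in place of `Canon::Diff::DiffNode` and a hand-written metadata hash in place of `enrich_diff_metadata`:

```ruby
# Hypothetical stand-in for Canon::Diff::DiffNode; keyword_init lets the
# metadata hash be passed with ** the same way add_difference does.
FakeDiffNode = Struct.new(
  :node1, :node2, :dimension, :reason,
  :path, :serialized_before, :serialized_after,
  :attributes_before, :attributes_after,
  keyword_init: true
)

# Hypothetical helper mirroring add_difference's guard and append.
def add_fake_difference(node1, node2, dimension, reason, metadata, differences)
  raise ArgumentError, "dimension required for DiffNode" if dimension.nil?

  differences << FakeDiffNode.new(
    node1: node1, node2: node2, dimension: dimension, reason: reason,
    **metadata # path:, serialized_before:, ... become keyword arguments
  )
end

differences = []
metadata = {
  path: "/root/p",
  serialized_before: "<p>a</p>",
  serialized_after: "<p>b</p>",
  attributes_before: {},
  attributes_after: {},
}
add_fake_difference(:n1, :n2, :text_content, "'a' vs 'b'", metadata, differences)
puts differences.first.path # prints /root/p
```

Because the hash keys map one-to-one onto keyword arguments, any key added to `enrich_diff_metadata` must also be accepted by the DiffNode constructor.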
.build_attribute_difference_reason(attrs1, attrs2) ⇒ String
Build a clear reason message for attribute presence differences. Shows which attributes are only in node1, only in node2, or have different values
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 267

def build_attribute_difference_reason(attrs1, attrs2)
  return "#{attrs1&.keys&.size || 0} vs #{attrs2&.keys&.size || 0} attributes" unless attrs1 && attrs2

  require "set"
  keys1 = attrs1.keys.to_set
  keys2 = attrs2.keys.to_set

  only_in_1 = keys1 - keys2
  only_in_2 = keys2 - keys1
  common = keys1 & keys2

  # Check if values differ for common keys
  different_values = common.reject { |k| attrs1[k] == attrs2[k] }

  parts = []
  parts << "only in first: #{only_in_1.to_a.sort.join(', ')}" if only_in_1.any?
  parts << "only in second: #{only_in_2.to_a.sort.join(', ')}" if only_in_2.any?
  parts << "different values: #{different_values.sort.join(', ')}" if different_values.any?

  if parts.empty?
    "#{keys1.size} vs #{keys2.size} attributes (same names)"
  else
    parts.join("; ")
  end
end
```
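To see what reason strings this produces, here is a standalone copy of the same set logic operating on plain name => value hashes (the shape `extract_attributes` returns). The name `attribute_reason` is hypothetical; this is a sketch for experimentation, not the library method itself:

```ruby
require "set"

# Standalone copy of build_attribute_difference_reason's logic.
def attribute_reason(attrs1, attrs2)
  unless attrs1 && attrs2
    return "#{attrs1&.keys&.size || 0} vs #{attrs2&.keys&.size || 0} attributes"
  end

  keys1 = attrs1.keys.to_set
  keys2 = attrs2.keys.to_set
  only_in_1 = keys1 - keys2
  only_in_2 = keys2 - keys1
  # Shared names whose values disagree
  different_values = (keys1 & keys2).reject { |k| attrs1[k] == attrs2[k] }

  parts = []
  parts << "only in first: #{only_in_1.to_a.sort.join(', ')}" if only_in_1.any?
  parts << "only in second: #{only_in_2.to_a.sort.join(', ')}" if only_in_2.any?
  parts << "different values: #{different_values.sort.join(', ')}" if different_values.any?
  parts.empty? ? "#{keys1.size} vs #{keys2.size} attributes (same names)" : parts.join("; ")
end

puts attribute_reason({ "id" => "a", "class" => "x" }, { "id" => "b", "lang" => "en" })
# prints: only in first: class; only in second: lang; different values: id
puts attribute_reason(nil, { "id" => "a" })
# prints: 0 vs 1 attributes
```

The "(same names)" fallback is only reached when both key sets and all values agree.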
.build_difference_reason(node1, node2, diff1, diff2, dimension) ⇒ String
Build a human-readable reason for a difference
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 242

def build_difference_reason(node1, node2, diff1, diff2, dimension)
  # For attribute presence differences, show what attributes differ
  if dimension == :attribute_presence
    attrs1 = extract_attributes(node1)
    attrs2 = extract_attributes(node2)
    return build_attribute_difference_reason(attrs1, attrs2)
  end

  # For text content differences, show the actual text (truncated if needed)
  if dimension == :text_content
    text1 = extract_text_content_from_node(node1)
    text2 = extract_text_content_from_node(node2)
    return build_text_difference_reason(text1, text2)
  end

  # Default reason - can be overridden in subclasses
  "#{diff1} vs #{diff2}"
end
```
.build_path_for_node(node) ⇒ String?
Build canonical path for a node
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 76

def build_path_for_node(node)
  return nil if node.nil?

  Canon::Diff::PathBuilder.build(node, format: :document)
end
```
.build_text_difference_reason(text1, text2) ⇒ String
Build a clear reason message for text content differences. Shows the actual text content (truncated if too long)
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 330

def build_text_difference_reason(text1, text2)
  # Handle nil cases
  return "missing vs '#{truncate_text(text2)}'" if text1.nil? && text2
  return "'#{truncate_text(text1)}' vs missing" if text1 && text2.nil?
  return "both missing" if text1.nil? && text2.nil?

  # Both have content - show truncated versions
  "'#{truncate_text(text1)}' vs '#{truncate_text(text2)}'"
end
```
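A standalone sketch combining this method with `truncate_text`, so the nil handling and the 40-character cut can be tried directly. The name `text_reason` is hypothetical; `truncate_text` is copied from its definition on this page:

```ruby
# Copy of truncate_text: cut to max_length and append "..."
def truncate_text(text, max_length = 40)
  return "" if text.nil?

  text = text.to_s
  return text if text.length <= max_length

  "#{text[0...max_length]}..."
end

# Copy of build_text_difference_reason under a hypothetical name.
def text_reason(text1, text2)
  return "missing vs '#{truncate_text(text2)}'" if text1.nil? && text2
  return "'#{truncate_text(text1)}' vs missing" if text1 && text2.nil?
  return "both missing" if text1.nil? && text2.nil?

  "'#{truncate_text(text1)}' vs '#{truncate_text(text2)}'"
end

puts text_reason("Hello", nil)      # prints 'Hello' vs missing
puts text_reason("a" * 50, "short") # first text cut to 40 chars plus "..."
```

The ellipsis is appended after the cut, so a truncated value is `max_length + 3` characters long.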
.comment_node?(node) ⇒ Boolean
Check if a node is a comment node
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 189

def comment_node?(node)
  node.respond_to?(:comment?) && node.comment? ||
    node.respond_to?(:node_type) && node.node_type == :comment
end
```
.determine_node_dimension(node) ⇒ Symbol
Determine the appropriate dimension for a node type
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 375

def determine_node_dimension(node)
  # Canon::Xml::Node types
  if node.respond_to?(:node_type) && node.node_type.is_a?(Symbol)
    case node.node_type
    when :comment then :comments
    when :text, :cdata then :text_content
    when :processing_instruction then :processing_instructions
    else :text_content
    end
  # Moxml/Nokogiri types
  elsif node.respond_to?(:comment?) && node.comment?
    :comments
  elsif node.respond_to?(:text?) && node.text?
    :text_content
  elsif node.respond_to?(:cdata?) && node.cdata?
    :text_content
  elsif node.respond_to?(:processing_instruction?) && node.processing_instruction?
    :processing_instructions
  else
    :text_content
  end
end
```
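The method duck-types in two passes: first a Symbol `node_type` (Canon::Xml nodes), then predicate methods (Moxml/Nokogiri). A standalone sketch with hypothetical stand-ins `SymbolTyped` and `PredicateNode`; the text and CDATA predicate branches are folded into the default here because they return the same symbol:

```ruby
# Hypothetical stand-in exposing a Symbol node_type, like Canon::Xml nodes.
SymbolTyped = Struct.new(:node_type)

# Hypothetical stand-in answering predicate methods instead.
class PredicateNode
  def initialize(kind)
    @kind = kind
  end

  def comment?
    @kind == :comment
  end

  def processing_instruction?
    @kind == :processing_instruction
  end
end

def dimension_for(node)
  if node.respond_to?(:node_type) && node.node_type.is_a?(Symbol)
    case node.node_type
    when :comment then :comments
    when :processing_instruction then :processing_instructions
    else :text_content # :text, :cdata and anything else
    end
  elsif node.respond_to?(:comment?) && node.comment?
    :comments
  elsif node.respond_to?(:processing_instruction?) && node.processing_instruction?
    :processing_instructions
  else
    :text_content # text, CDATA and unrecognised nodes
  end
end

p dimension_for(SymbolTyped.new(:comment))                   # prints :comments
p dimension_for(PredicateNode.new(:processing_instruction)) # prints :processing_instructions
```

An Integer `node_type` (which is what Nokogiri reports) fails the `is_a?(Symbol)` check and falls through to the predicate branches.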
.enrich_diff_metadata(node1, node2) ⇒ Hash
Enrich a DiffNode with canonical path, serialized content, and attributes. This extracts presentation-ready metadata from nodes for Stage 4 rendering
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 62

def enrich_diff_metadata(node1, node2)
  {
    path: build_path_for_node(node1 || node2),
    serialized_before: serialize_node(node1),
    serialized_after: serialize_node(node2),
    attributes_before: extract_attributes(node1),
    attributes_after: extract_attributes(node2),
  }
end
```
.extract_attributes(node) ⇒ Hash?
Extract attributes from a node
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 114

def extract_attributes(node)
  return nil if node.nil?

  # Canon::Xml::Node ElementNode
  if node.is_a?(Canon::Xml::Nodes::ElementNode)
    node.attribute_nodes.each_with_object({}) do |attr, hash|
      hash[attr.name] = attr.value
    end
  # Nokogiri nodes
  elsif node.respond_to?(:attributes)
    node.attributes.each_with_object({}) do |(_, attr), hash|
      hash[attr.name] = attr.value
    end
  else
    {}
  end
end
```
.extract_text_content_from_node(node) ⇒ String?
Extract text content from a node for diff reason
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 297

def extract_text_content_from_node(node)
  return nil if node.nil?

  # For Canon::Xml::Nodes::TextNode
  return node.value if node.respond_to?(:value) && node.is_a?(Canon::Xml::Nodes::TextNode)

  # For XML/HTML nodes with text_content method
  return node.text_content if node.respond_to?(:text_content)

  # For nodes with text method
  return node.text if node.respond_to?(:text)

  # For nodes with content method (Moxml::Text)
  return node.content if node.respond_to?(:content)

  # For nodes with value method (other types)
  return node.value if node.respond_to?(:value)

  # For simple text nodes or strings
  return node.to_s if node.is_a?(String)

  # For other node types, try to_s
  node.to_s
rescue StandardError
  nil
end
```
.filter_children(children, opts) ⇒ Array
Filter children based on options
Removes nodes that should be excluded from comparison based on options like :ignore_nodes, :ignore_comments, etc.
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 140

def filter_children(children, opts)
  children.reject do |child|
    node_excluded?(child, opts)
  end
end
```
.node_excluded?(node, opts) ⇒ Boolean
Check if node should be excluded from comparison
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 151

def node_excluded?(node, opts)
  return false if node.nil?
  return true if opts[:ignore_nodes]&.include?(node)
  return true if opts[:ignore_comments] && comment_node?(node)
  return true if opts[:ignore_text_nodes] && text_node?(node)

  # Check structural_whitespace match option
  match_opts = opts[:match_opts]
  # Filter out whitespace-only text nodes
  if match_opts && %i[ignore normalize].include?(match_opts[:structural_whitespace]) &&
      text_node?(node)
    text = node_text(node)
    return true if MatchOptions.normalize_text(text).empty?
  end

  false
end
```
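Together with `filter_children`, this drops comments, ignored nodes, and whitespace-only text before comparison. A standalone sketch of that filtering, assuming hypothetical `TextStub`/`CommentStub` nodes and a hypothetical `normalize_text` that only approximates `MatchOptions.normalize_text` (collapse whitespace runs, then trim):

```ruby
# Hypothetical node stand-ins answering the predicates the method checks.
TextStub = Struct.new(:value) do
  def text?
    true
  end
end

CommentStub = Struct.new(:value) do
  def comment?
    true
  end
end

# Assumption: approximates MatchOptions.normalize_text.
def normalize_text(text)
  text.to_s.gsub(/\s+/, " ").strip
end

def excluded?(node, opts)
  return false if node.nil?
  return true if opts[:ignore_nodes]&.include?(node)
  return true if opts[:ignore_comments] && node.respond_to?(:comment?) && node.comment?

  text_like = node.respond_to?(:text?) && node.text?
  return true if opts[:ignore_text_nodes] && text_like

  # Whitespace-only text is dropped under :ignore or :normalize
  ws_mode = opts.dig(:match_opts, :structural_whitespace)
  %i[ignore normalize].include?(ws_mode) && text_like && normalize_text(node.value).empty?
end

children = [TextStub.new("\n  "), CommentStub.new(" note "), TextStub.new("Hi")]
opts = { ignore_comments: true, match_opts: { structural_whitespace: :normalize } }
kept = children.reject { |child| excluded?(child, opts) }
p kept.map(&:value) # prints ["Hi"]
```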
.node_text(node) ⇒ String
Get text content from a node
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 208

def node_text(node)
  # Canon::Xml::Node TextNode uses .value
  if node.respond_to?(:value)
    node.value.to_s
  # Nokogiri nodes use .content
  elsif node.respond_to?(:content)
    node.content.to_s
  else
    node.to_s
  end
end
```
.same_node_type?(node1, node2) ⇒ Boolean
Check if two nodes are the same type
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 174

def same_node_type?(node1, node2)
  return false if node1.class != node2.class

  # For Nokogiri/Canon::Xml nodes, check node type
  if node1.respond_to?(:node_type) && node2.respond_to?(:node_type)
    node1.node_type == node2.node_type
  else
    true
  end
end
```
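The class check runs first, and `node_type` is compared only when both objects expose it. A standalone sketch with hypothetical `NodeStub`/`OtherStub` Structs, showing the four outcomes:

```ruby
# Hypothetical stand-ins: two distinct classes with the same member.
NodeStub = Struct.new(:node_type)
OtherStub = Struct.new(:node_type)

# Copy of same_node_type? under a hypothetical name.
def same_type?(node1, node2)
  return false if node1.class != node2.class

  if node1.respond_to?(:node_type) && node2.respond_to?(:node_type)
    node1.node_type == node2.node_type
  else
    true
  end
end

same_type?(NodeStub.new(:text), NodeStub.new(:text))    # => true
same_type?(NodeStub.new(:text), NodeStub.new(:comment)) # => false
same_type?(NodeStub.new(:text), OtherStub.new(:text))   # => false (different classes)
same_type?("a", "b")                                    # => true (same class, no node_type)
```

Same-class objects without a `node_type` are treated as the same type, so the method is permissive for anything it cannot inspect.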
.serialize_element_node(node) ⇒ String
Serialize an element node to string
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 358

def serialize_element_node(node)
  attrs = node.attribute_nodes.map do |a|
    " #{a.name}=\"#{a.value}\""
  end.join

  children_xml = node.children.map { |c| serialize_node(c) }.join

  if children_xml.empty?
    "<#{node.name}#{attrs}/>"
  else
    "<#{node.name}#{attrs}>#{children_xml}</#{node.name}>"
  end
end
```
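A standalone sketch of the same recursion, assuming hypothetical `ElementStub`/`AttrStub` Structs in place of Canon::Xml element and attribute nodes, with plain Strings standing in for text nodes:

```ruby
# Hypothetical stand-ins for Canon::Xml element and attribute nodes.
AttrStub = Struct.new(:name, :value)
ElementStub = Struct.new(:name, :attribute_nodes, :children)

# Simplified serialize_node: elements recurse, anything else uses to_s.
def serialize(node)
  return serialize_element(node) if node.is_a?(ElementStub)

  node.to_s
end

def serialize_element(node)
  attrs = node.attribute_nodes.map { |a| " #{a.name}=\"#{a.value}\"" }.join
  children_xml = node.children.map { |c| serialize(c) }.join
  # No serialized children means a self-closing tag
  if children_xml.empty?
    "<#{node.name}#{attrs}/>"
  else
    "<#{node.name}#{attrs}>#{children_xml}</#{node.name}>"
  end
end

br = ElementStub.new("br", [], [])
para = ElementStub.new("p", [AttrStub.new("class", "note")], ["Hi ", br])
puts serialize(para) # prints <p class="note">Hi <br/></p>
```

Attribute values and text are interpolated verbatim, without escaping, so a value containing `"` or `<` yields markup that is not well-formed. That is tolerable for display in diff output but not for round-tripping.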
.serialize_node(node) ⇒ String?
Serialize a node to string for display
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 86

def serialize_node(node)
  return nil if node.nil?

  # Canon::Xml::Node types
  if node.is_a?(Canon::Xml::Nodes::RootNode)
    # Serialize all children of root
    node.children.map { |child| serialize_node(child) }.join
  elsif node.is_a?(Canon::Xml::Nodes::ElementNode)
    serialize_element_node(node)
  elsif node.is_a?(Canon::Xml::Nodes::TextNode)
    node.value
  elsif node.is_a?(Canon::Xml::Nodes::CommentNode)
    "<!--#{node.value}-->"
  elsif node.is_a?(Canon::Xml::Nodes::ProcessingInstructionNode)
    "<?#{node.target} #{node.data}?>"
  elsif node.respond_to?(:to_xml)
    node.to_xml
  elsif node.respond_to?(:to_html)
    node.to_html
  else
    node.to_s
  end
end
```
.text_node?(node) ⇒ Boolean
Check if a node is a text node
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 198

def text_node?(node)
  node.respond_to?(:text?) && node.text? && !node.respond_to?(:element?) ||
    node.respond_to?(:node_type) && node.node_type == :text
end
```
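The first clause accepts objects that answer `text?` but do not define `element?`; the second accepts a Symbol `node_type` of `:text`. A standalone sketch with hypothetical stand-ins `TextOnly`, `TextAndElement`, and `SymbolNode`:

```ruby
# Hypothetical stand-ins for the duck-typed shapes text_node? checks.
class TextOnly
  def text?
    true
  end
end

class TextAndElement
  def text?
    true
  end

  def element?
    false
  end
end

SymbolNode = Struct.new(:node_type)

# Copy of text_node? with explicit grouping (same precedence as the original).
def text_node_like?(node)
  (node.respond_to?(:text?) && node.text? && !node.respond_to?(:element?)) ||
    (node.respond_to?(:node_type) && node.node_type == :text)
end

text_node_like?(TextOnly.new)          # => true
text_node_like?(TextAndElement.new)    # => false (defines element?)
text_node_like?(SymbolNode.new(:text)) # => true
```

As written, an object that defines both `text?` and `element?` and reports a non-Symbol `node_type` matches neither clause. `Nokogiri::XML::Node` defines both predicates and returns an Integer from `node_type`, so it is worth checking which node types actually reach this method.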
.truncate_text(text, max_length = 40) ⇒ String
Truncate text for display in reason messages
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 345

def truncate_text(text, max_length = 40)
  return "" if text.nil?

  text = text.to_s
  return text if text.length <= max_length

  "#{text[0...max_length]}..."
end
```
.whitespace_only_difference?(text1, text2) ⇒ Boolean
Check if difference between two texts is only whitespace
```ruby
# File 'lib/canon/comparison/markup_comparator.rb', line 225

def whitespace_only_difference?(text1, text2)
  # Normalize both texts (collapse/trim whitespace)
  norm1 = MatchOptions.normalize_text(text1)
  norm2 = MatchOptions.normalize_text(text2)

  # If normalized texts are the same, the difference was only whitespace
  norm1 == norm2
end
```
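The check reduces to "equal after normalization". A standalone sketch, assuming a hypothetical `normalize_text` that approximates `MatchOptions.normalize_text` as described in the comment above (collapse whitespace runs to one space, then trim); the real normalization may differ:

```ruby
# Assumption: approximates MatchOptions.normalize_text.
def normalize_text(text)
  text.to_s.gsub(/\s+/, " ").strip
end

def whitespace_only_difference?(text1, text2)
  normalize_text(text1) == normalize_text(text2)
end

whitespace_only_difference?("  Hello\n  world ", "Hello world") # => true
whitespace_only_difference?("Hello world", "Helloworld")         # => false
```

Under this approximation, removing a space entirely still counts as a real difference, because collapsing keeps one space between words. Identical inputs also return true, so the predicate is only meaningful once the raw texts are known to differ.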