Module: Canon::Comparison
Defined in:
lib/canon/comparison.rb,
lib/canon/comparison/dimensions.rb,
lib/canon/comparison/xml_parser.rb,
lib/canon/comparison/html_parser.rb,
lib/canon/comparison/json_parser.rb,
lib/canon/comparison/match_options.rb,
lib/canon/comparison/node_inspector.rb,
lib/canon/comparison/xml_comparator.rb,
lib/canon/comparison/base_comparator.rb,
lib/canon/comparison/compare_profile.rb,
lib/canon/comparison/format_detector.rb,
lib/canon/comparison/html_comparator.rb,
lib/canon/comparison/json_comparator.rb,
lib/canon/comparison/yaml_comparator.rb,
lib/canon/comparison/comparison_result.rb,
lib/canon/comparison/markup_comparator.rb,
lib/canon/comparison/profile_definition.rb,
lib/canon/comparison/dimensions/registry.rb,
lib/canon/comparison/xml_node_comparison.rb,
lib/canon/comparison/html_compare_profile.rb,
lib/canon/comparison/ruby_object_comparator.rb,
lib/canon/comparison/whitespace_sensitivity.rb,
lib/canon/comparison/dimensions/base_dimension.rb,
lib/canon/comparison/match_options/xml_resolver.rb,
lib/canon/comparison/xml_comparator/node_parser.rb,
lib/canon/comparison/match_options/base_resolver.rb,
lib/canon/comparison/match_options/json_resolver.rb,
lib/canon/comparison/match_options/yaml_resolver.rb,
lib/canon/comparison/dimensions/comments_dimension.rb,
lib/canon/comparison/strategies/base_match_strategy.rb,
lib/canon/comparison/xml_comparator/attribute_filter.rb,
lib/canon/comparison/xml_comparator/child_comparison.rb,
lib/canon/comparison/xml_comparator/diff_node_builder.rb,
lib/canon/comparison/dimensions/text_content_dimension.rb,
lib/canon/comparison/strategies/match_strategy_factory.rb,
lib/canon/comparison/xml_comparator/attribute_comparator.rb,
lib/canon/comparison/xml_comparator/namespace_comparator.rb,
lib/canon/comparison/xml_comparator/node_type_comparator.rb,
lib/canon/comparison/dimensions/attribute_order_dimension.rb,
lib/canon/comparison/dimensions/attribute_values_dimension.rb,
lib/canon/comparison/dimensions/element_position_dimension.rb,
lib/canon/comparison/dimensions/attribute_presence_dimension.rb,
lib/canon/comparison/strategies/semantic_tree_match_strategy.rb,
lib/canon/comparison/dimensions/structural_whitespace_dimension.rb
Overview
Comparison module for XML, HTML, JSON, and YAML documents
This module provides a unified comparison API for multiple serialization formats. It auto-detects the format and delegates to specialized comparators while maintaining a CompareXML-compatible API.
Supported Formats
- XML: Uses Moxml for parsing, supports namespaces
- HTML: Uses Nokogiri, handles HTML4/HTML5 differences
- JSON: Direct Ruby object comparison with deep equality
- YAML: Parses to Ruby objects, compares semantically
Format Detection
The module automatically detects format from:
- Object type (Moxml::Node, Nokogiri::HTML::Document, Hash, Array)
- String content (DOCTYPE, opening tags, YAML/JSON syntax)
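As a rough illustration of content-based detection, the sketch below guesses a format from a string. It is not Canon's FormatDetector: the helper name guess_format and its heuristics (DOCTYPE or html tag means HTML, any other leading "<" means XML, parseable JSON means JSON, everything else is treated as YAML) are assumptions made for this example.

```ruby
require "json"

# Hypothetical helper, not part of Canon: guesses a format from string content.
def guess_format(text)
  stripped = text.strip

  # An HTML DOCTYPE or a leading <html> tag is treated as HTML.
  return :html if stripped.match?(/\A<!DOCTYPE\s+html/i) || stripped.match?(/\A<html[\s>]/i)

  # Any other markup-looking input is treated as XML.
  return :xml if stripped.start_with?("<")

  begin
    JSON.parse(stripped)
    :json
  rescue JSON::ParserError
    # Assumption of this sketch: whatever is not valid JSON is treated as YAML.
    :yaml
  end
end

puts guess_format('<?xml version="1.0"?><root/>')
puts guess_format("<!DOCTYPE html><html></html>")
puts guess_format('{"a": 1}')
puts guess_format("a: 1")
```

A real detector would also need to dispatch on object type (for example Hash or Array) before inspecting string content, as the list above describes.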
Comparison Options
Common options across all formats:
- profile: Comparison profile (Symbol for preset, Hash for custom)
  - Presets: :strict, :rendered, :html4, :html5, :spec_friendly, :content_only
  - Custom: { text_content: :normalize, comments: :ignore, … }
- diff_algorithm: Algorithm to use (:dom or :semantic, default: :dom)
- verbose: Return detailed diff array (default: false)
Usage Examples
    # XML comparison with default profile
    Canon::Comparison.equivalent?(xml1, xml2)

    # XML comparison with preset profile
    Canon::Comparison.equivalent?(xml1, xml2, profile: :strict)
    Canon::Comparison.equivalent?(xml1, xml2, profile: :spec_friendly)

    # HTML comparison with custom inline profile
    Canon::Comparison.equivalent?(html1, html2,
      profile: { text_content: :normalize, comments: :ignore })

    # Define and use a custom profile
    Canon::Comparison.define_profile(:my_custom) do
      text_content :normalize
      comments :ignore
      preprocessing :rendered
    end
    Canon::Comparison.equivalent?(doc1, doc2, profile: :my_custom)

    # JSON comparison with semantic tree diff
    Canon::Comparison.equivalent?(json1, json2,
      diff_algorithm: :semantic, profile: :spec_friendly)

    # With detailed output
    diffs = Canon::Comparison.equivalent?(doc1, doc2, verbose: true)
    diffs.each { |diff| puts diff.inspect }
XML Declaration Handling
XML declarations (`<?xml version="1.0" encoding="UTF-8"?>`) are stripped during preprocessing for semantic comparison. This means:
- Documents with and without declarations are considered equivalent
- Declaration encoding differences are ignored
- Entity declarations within DTD are resolved before comparison
This behavior ensures documents are compared by their content, not their serialization format.
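The effect of stripping the declaration can be sketched with a small standalone helper. This is not Canon's preprocessing code: strip_xml_declaration and its regular expression are assumptions for illustration, and the sketch compares raw strings rather than parsed trees.

```ruby
# Hypothetical helper, not Canon's implementation: drops a leading XML
# declaration (and surrounding whitespace) so that only the content remains.
def strip_xml_declaration(xml)
  xml.sub(/\A\s*<\?xml\b[^?]*\?>\s*/, "")
end

with_declaration = %(<?xml version="1.0" encoding="UTF-8"?>\n<root><a>1</a></root>)
other_encoding   = %(<?xml version="1.0" encoding="ISO-8859-1"?>\n<root><a>1</a></root>)
no_declaration   = "<root><a>1</a></root>"

# All three inputs reduce to the same content once the declaration is gone.
puts strip_xml_declaration(with_declaration) == no_declaration
puts strip_xml_declaration(other_encoding) == no_declaration
```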
Return Values
- When verbose: false (default) → Boolean (true if equivalent)
- When verbose: true → Array of difference hashes with details
Difference Hash Format
Each difference contains:
- For XML/HTML:
  - node1, node2: The nodes being compared
  - diff1, diff2: Comparison result codes
- For JSON/YAML:
  - path: String path to the difference (e.g., "user.address.city")
  - value1, value2: The differing values
  - diff_code: Type of difference
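To make the JSON/YAML difference shape concrete, the sketch below walks two parsed structures and records a path, the two values, and a code for each mismatch. It is an illustration only, not Canon's comparator: collect_diffs is a hypothetical name, and the diff_code values are placeholder symbols, whereas Canon reports the integer constants listed in the Constant Summary.

```ruby
# Hypothetical sketch, not Canon's implementation: collects path-based
# differences between two parsed JSON/YAML structures.
def collect_diffs(value1, value2, path = "", diffs = [])
  if value1.is_a?(Hash) && value2.is_a?(Hash)
    (value1.keys | value2.keys).each do |key|
      child_path = path.empty? ? key.to_s : "#{path}.#{key}"
      if value1.key?(key) && value2.key?(key)
        # Key exists on both sides: descend into the nested values.
        collect_diffs(value1[key], value2[key], child_path, diffs)
      else
        diffs << { path: child_path, value1: value1[key], value2: value2[key],
                   diff_code: :missing_hash_key }
      end
    end
  elsif value1 != value2
    # Arrays and scalars are compared as whole values in this sketch.
    diffs << { path: path, value1: value1, value2: value2,
               diff_code: :unequal_values }
  end
  diffs
end

doc1 = { "user" => { "address" => { "city" => "Oslo" }, "name" => "Ada" } }
doc2 = { "user" => { "address" => { "city" => "Bergen" }, "name" => "Ada" } }

collect_diffs(doc1, doc2).each { |diff| puts diff.inspect }
```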
Defined Under Namespace
Modules: BaseComparator, Dimensions, MatchOptions, NodeInspector, RubyObjectComparator, Strategies, WhitespaceSensitivity, XmlComparatorHelpers, XmlNodeComparison
Classes: CompareProfile, ComparisonResult, DiffNodeBuilder, FormatDetector, HtmlComparator, HtmlCompareProfile, HtmlParser, JsonComparator, JsonParser, MarkupComparator, ProfileDefinition, ProfileError, ResolvedMatchOptions, XmlComparator, XmlParser, YamlComparator
Constant Summary
Comparison result constants:
- EQUIVALENT = 1
- MISSING_ATTRIBUTE = 2
- MISSING_NODE = 3
- UNEQUAL_ATTRIBUTES = 4
- UNEQUAL_COMMENTS = 5
- UNEQUAL_DOCUMENTS = 6
- UNEQUAL_ELEMENTS = 7
- UNEQUAL_NODES_TYPES = 8
- UNEQUAL_TEXT_CONTENTS = 9
- MISSING_HASH_KEY = 10
- UNEQUAL_HASH_VALUES = 11
- UNEQUAL_HASH_KEY_ORDER = 12
- UNEQUAL_ARRAY_LENGTHS = 13
- UNEQUAL_ARRAY_ELEMENTS = 14
- UNEQUAL_TYPES = 15
- UNEQUAL_PRIMITIVES = 16
- CODE_LABELS =
Human-readable labels for the integer comparison-result constants above. Used by the diff reason builders so user-facing reason text never leaks raw numeric codes (e.g. “7 vs 7”; see lutaml/canon#127). String diff codes (e.g. “position 3” emitted by ChildComparison) pass through code_label unchanged.

    {
      EQUIVALENT => "equivalent",
      MISSING_ATTRIBUTE => "missing attribute",
      MISSING_NODE => "missing",
      UNEQUAL_ATTRIBUTES => "attributes differ",
      UNEQUAL_COMMENTS => "comments differ",
      UNEQUAL_DOCUMENTS => "documents differ",
      UNEQUAL_ELEMENTS => "elements differ",
      UNEQUAL_NODES_TYPES => "node types differ",
      UNEQUAL_TEXT_CONTENTS => "text content differs",
      MISSING_HASH_KEY => "missing hash key",
      UNEQUAL_HASH_VALUES => "hash values differ",
      UNEQUAL_HASH_KEY_ORDER => "hash key order differs",
      UNEQUAL_ARRAY_LENGTHS => "array lengths differ",
      UNEQUAL_ARRAY_ELEMENTS => "array elements differ",
      UNEQUAL_TYPES => "types differ",
      UNEQUAL_PRIMITIVES => "primitives differ",
    }.freeze
Class Method Summary
- .available_profiles ⇒ Array<Symbol>
  List all available profiles (custom + presets).
- .code_label(code) ⇒ String
  Translate a comparison result code (Integer constant or String label like “position 3”) into a human-readable reason fragment.
- .code_pair_label(diff1, diff2) ⇒ String
  Build a “diff1 [vs diff2]” reason fragment that never leaks raw integer constants.
- .define_profile(name) {|ProfileDefinition| ... } ⇒ Symbol
  Define a custom comparison profile with DSL syntax.
- .equivalent?(obj1, obj2, opts = {}) ⇒ Boolean, Array
  Auto-detect format and compare two objects.
- .load_profile(name) ⇒ Hash
  Load a profile (custom or preset).
- .parse_errors_for(node) ⇒ Array<String>
  Extract parse-time errors from a parsed-tree or Nokogiri fragment.
- .summarize(obj1, obj2, opts = {}) ⇒ String
  Summarize the first difference between two documents.
Class Method Details
.available_profiles ⇒ Array<Symbol>
List all available profiles (custom + presets)
    # File 'lib/canon/comparison.rb', line 279
    def available_profiles
      custom = @custom_profiles&.keys || []
      presets = MatchOptions::Xml::MATCH_PROFILES.keys
      (custom + presets).sort.uniq
    end
.code_label(code) ⇒ String
Translate a comparison result code (Integer constant or String label like “position 3”) into a human-readable reason fragment. Unknown values pass through via to_s as a defensive fallback.
    # File 'lib/canon/comparison.rb', line 155
    def self.code_label(code)
      return code if code.is_a?(String)

      CODE_LABELS[code] || code.to_s
    end
.code_pair_label(diff1, diff2) ⇒ String
Build a “diff1 [vs diff2]” reason fragment that never leaks raw integer constants. When both codes are equal, returns the single label (e.g. “elements differ”) rather than “elements differ vs elements differ”. See lutaml/canon#127.
    # File 'lib/canon/comparison.rb', line 169
    def self.code_pair_label(diff1, diff2)
      return code_label(diff1) if diff1 == diff2

      "#{code_label(diff1)} vs #{code_label(diff2)}"
    end
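The two label helpers can be tried in isolation with the standalone sketch below. It copies their logic into a hypothetical LabelSketch module and uses only a three-entry subset of CODE_LABELS, so it runs without Canon installed; the module name and the reduced table are assumptions of this example.

```ruby
# Standalone sketch: LabelSketch is a hypothetical module that mirrors the
# documented helpers with a reduced label table.
module LabelSketch
  EQUIVALENT = 1
  UNEQUAL_ELEMENTS = 7
  UNEQUAL_TEXT_CONTENTS = 9

  CODE_LABELS = {
    EQUIVALENT => "equivalent",
    UNEQUAL_ELEMENTS => "elements differ",
    UNEQUAL_TEXT_CONTENTS => "text content differs",
  }.freeze

  # String codes pass through; unknown codes fall back to to_s.
  def self.code_label(code)
    return code if code.is_a?(String)

    CODE_LABELS[code] || code.to_s
  end

  # Equal codes collapse to a single label instead of "x vs x".
  def self.code_pair_label(diff1, diff2)
    return code_label(diff1) if diff1 == diff2

    "#{code_label(diff1)} vs #{code_label(diff2)}"
  end
end

puts LabelSketch.code_pair_label(7, 7)
puts LabelSketch.code_pair_label(7, 9)
puts LabelSketch.code_label("position 3")
puts LabelSketch.code_label(99)
```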
.define_profile(name) {|ProfileDefinition| ... } ⇒ Symbol
Define a custom comparison profile with DSL syntax
    # File 'lib/canon/comparison.rb', line 248
    def define_profile(name, &block)
      definition = ProfileDefinition.define(name, &block)

      @custom_profiles ||= {}
      @custom_profiles[name] = definition

      name
    end
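One common way to build a block-based DSL like this is to evaluate the block against a collector object with instance_eval. The sketch below shows that pattern under stated assumptions: ProfileSketch, its SETTINGS list, and the PROFILES registry are hypothetical stand-ins, and nothing here claims to match how ProfileDefinition works internally.

```ruby
# Hypothetical stand-in for a profile DSL: each DSL call records one setting.
class ProfileSketch
  # Assumed setting names, taken from the usage example in this page.
  SETTINGS = %i[text_content comments preprocessing].freeze

  attr_reader :options

  def initialize
    @options = {}
  end

  # Generate one writer-style DSL method per setting.
  SETTINGS.each do |setting|
    define_method(setting) { |value| @options[setting] = value }
  end

  # Run the block with self set to a fresh collector and return its options.
  def self.define(&block)
    sketch = new
    sketch.instance_eval(&block)
    sketch.options
  end
end

# Hypothetical registry standing in for the module-level custom profile store.
PROFILES = {}

def define_profile(name, &block)
  PROFILES[name] = ProfileSketch.define(&block)
  name
end

define_profile(:my_custom) do
  text_content :normalize
  comments :ignore
  preprocessing :rendered
end

puts PROFILES[:my_custom].inspect
```

Returning the profile name from define_profile, as the documented method does, lets the result be passed straight to a profile: option.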
.equivalent?(obj1, obj2, opts = {}) ⇒ Boolean, Array
Auto-detect format and compare two objects
    # File 'lib/canon/comparison.rb', line 195
    def equivalent?(obj1, obj2, opts = {})
      # Check if semantic tree diff is requested
      # Support both :semantic and :semantic_tree for backward compatibility
      if %i[semantic semantic_tree].include?(opts[:diff_algorithm])
        return semantic_diff(obj1, obj2, opts)
      end

      # Otherwise use DOM-based comparison (default)
      dom_diff(obj1, obj2, opts)
    end
.load_profile(name) ⇒ Hash
Load a profile (custom or preset)
    # File 'lib/canon/comparison.rb', line 261
    def load_profile(name)
      # Check custom profiles first
      if @custom_profiles&.key?(name)
        return @custom_profiles[name].dup
      end

      # Fall back to presets - try Xml first (most common)
      begin
        MatchOptions::Xml.(name)
      rescue Error
        # Try other formats
        MatchOptions::Json.(name)
      end
    end
.parse_errors_for(node) ⇒ Array<String>
Extract parse-time errors from a parsed-tree or Nokogiri fragment. Delegates to NodeInspector for cross-backend type dispatch.
    # File 'lib/canon/comparison.rb', line 180
    def self.parse_errors_for(node)
      NodeInspector.parse_errors(node)
    end
.summarize(obj1, obj2, opts = {}) ⇒ String
Summarize the first difference between two documents.
Returns a human-readable string describing the first difference when documents differ, or “Equivalent” when they match. This is a lightweight alternative to equivalent? with verbose: true.
    # File 'lib/canon/comparison.rb', line 223
    def summarize(obj1, obj2, opts = {})
      result = equivalent?(obj1, obj2, opts.merge(verbose: true))

      if result.is_a?(ComparisonResult)
        result.summary
      elsif result == true
        "Equivalent"
      else
        "Not equivalent"
      end
    end