Module: Canon::Comparison

Defined in:
lib/canon/comparison.rb,
lib/canon/comparison/dimensions.rb,
lib/canon/comparison/xml_parser.rb,
lib/canon/comparison/html_parser.rb,
lib/canon/comparison/json_parser.rb,
lib/canon/comparison/match_options.rb,
lib/canon/comparison/node_inspector.rb,
lib/canon/comparison/xml_comparator.rb,
lib/canon/comparison/base_comparator.rb,
lib/canon/comparison/compare_profile.rb,
lib/canon/comparison/format_detector.rb,
lib/canon/comparison/html_comparator.rb,
lib/canon/comparison/json_comparator.rb,
lib/canon/comparison/yaml_comparator.rb,
lib/canon/comparison/comparison_result.rb,
lib/canon/comparison/markup_comparator.rb,
lib/canon/comparison/profile_definition.rb,
lib/canon/comparison/dimensions/registry.rb,
lib/canon/comparison/xml_node_comparison.rb,
lib/canon/comparison/html_compare_profile.rb,
lib/canon/comparison/ruby_object_comparator.rb,
lib/canon/comparison/whitespace_sensitivity.rb,
lib/canon/comparison/dimensions/base_dimension.rb,
lib/canon/comparison/match_options/xml_resolver.rb,
lib/canon/comparison/xml_comparator/node_parser.rb,
lib/canon/comparison/match_options/base_resolver.rb,
lib/canon/comparison/match_options/json_resolver.rb,
lib/canon/comparison/match_options/yaml_resolver.rb,
lib/canon/comparison/dimensions/comments_dimension.rb,
lib/canon/comparison/strategies/base_match_strategy.rb,
lib/canon/comparison/xml_comparator/attribute_filter.rb,
lib/canon/comparison/xml_comparator/child_comparison.rb,
lib/canon/comparison/xml_comparator/diff_node_builder.rb,
lib/canon/comparison/dimensions/text_content_dimension.rb,
lib/canon/comparison/strategies/match_strategy_factory.rb,
lib/canon/comparison/xml_comparator/attribute_comparator.rb,
lib/canon/comparison/xml_comparator/namespace_comparator.rb,
lib/canon/comparison/xml_comparator/node_type_comparator.rb,
lib/canon/comparison/dimensions/attribute_order_dimension.rb,
lib/canon/comparison/dimensions/attribute_values_dimension.rb,
lib/canon/comparison/dimensions/element_position_dimension.rb,
lib/canon/comparison/dimensions/attribute_presence_dimension.rb,
lib/canon/comparison/strategies/semantic_tree_match_strategy.rb,
lib/canon/comparison/dimensions/structural_whitespace_dimension.rb

Overview

Comparison module for XML, HTML, JSON, and YAML documents

This module provides a unified comparison API for multiple serialization formats. It auto-detects the format and delegates to specialized comparators while maintaining a CompareXML-compatible API.

Supported Formats

  • XML: Uses Moxml for parsing, supports namespaces

  • HTML: Uses Nokogiri, handles HTML4/HTML5 differences

  • JSON: Direct Ruby object comparison with deep equality

  • YAML: Parses to Ruby objects, compares semantically

Format Detection

The module automatically detects format from:

  • Object type (Moxml::Node, Nokogiri::HTML::Document, Hash, Array)

  • String content (DOCTYPE, opening tags, YAML/JSON syntax)
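The two detection axes above can be sketched roughly as follows. `sketch_detect_format` is a hypothetical helper written for illustration, not the library's FormatDetector, and the string heuristics (including the YAML fallback) are assumptions. Moxml and Nokogiri node types are left out so the sketch runs without either gem.

```ruby
# Hypothetical helper: NOT Canon's FormatDetector, only an illustration
# of type-first, content-second detection.
def sketch_detect_format(obj)
  # Already-parsed Ruby structures are treated as JSON-like data.
  return :json if obj.is_a?(Hash) || obj.is_a?(Array)
  return :unknown unless obj.is_a?(String)

  text = obj.lstrip
  # A DOCTYPE or a leading <html> tag marks the string as HTML.
  return :html if text.match?(/\A<!DOCTYPE\s+html/i) || text.match?(/\A<html[\s>]/i)
  # Any other leading tag (or XML declaration) is treated as XML.
  return :xml if text.start_with?("<")
  return :json if text.start_with?("{", "[")

  # Assumption: anything else is handed to the YAML path.
  :yaml
end

puts sketch_detect_format("<?xml version=\"1.0\"?><root/>")
puts sketch_detect_format("key: value")
```

Checking the object type before inspecting string content keeps already-parsed input from being re-scanned as text.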

Comparison Options

Common options across all formats:

  • profile: Comparison profile (Symbol for preset, Hash for custom)

    • Presets: :strict, :rendered, :html4, :html5, :spec_friendly, :content_only

    • Custom: { text_content: :normalize, comments: :ignore, … }

  • diff_algorithm: Algorithm to use (:dom or :semantic, default: :dom)

  • verbose: Return detailed diff array (default: false)

Usage Examples

# XML comparison with default profile
Canon::Comparison.equivalent?(xml1, xml2)

# XML comparison with preset profile
Canon::Comparison.equivalent?(xml1, xml2, profile: :strict)
Canon::Comparison.equivalent?(xml1, xml2, profile: :spec_friendly)

# HTML comparison with custom inline profile
Canon::Comparison.equivalent?(html1, html2,
  profile: { text_content: :normalize, comments: :ignore })

# Define and use a custom profile
Canon::Comparison.define_profile(:my_custom) do
  text_content :normalize
  comments :ignore
  preprocessing :rendered
end
Canon::Comparison.equivalent?(doc1, doc2, profile: :my_custom)

# JSON comparison with semantic tree diff
Canon::Comparison.equivalent?(json1, json2,
  diff_algorithm: :semantic, profile: :spec_friendly)

# With detailed output
diffs = Canon::Comparison.equivalent?(doc1, doc2, verbose: true)
diffs.each { |diff| puts diff.inspect }

XML Declaration Handling

XML declarations (`<?xml version="1.0" encoding="UTF-8"?>`) are stripped during preprocessing for semantic comparison. This means:

  • Documents with and without declarations are considered equivalent

  • Declaration encoding differences are ignored

  • Entity declarations within DTD are resolved before comparison

This behavior ensures documents are compared by their content, not their serialization format.
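As a minimal illustration of why documents with and without a declaration end up equivalent, the sketch below removes a leading declaration with a regular expression. `strip_xml_declaration` is a hypothetical helper, not part of Canon's API, and the library's actual preprocessing may work on the parsed tree rather than on the string.

```ruby
# Hypothetical helper, not part of Canon's API: removes a leading XML
# declaration so two serializations can be compared by content.
def strip_xml_declaration(xml)
  xml.sub(/\A\s*<\?xml\b.*?\?>\s*/m, "")
end

with_decl    = %(<?xml version="1.0" encoding="UTF-8"?>\n<root><a>1</a></root>)
without_decl = "<root><a>1</a></root>"

# Both inputs reduce to the same string once the declaration is gone.
puts strip_xml_declaration(with_decl) == strip_xml_declaration(without_decl)
```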

Return Values

  • When verbose: false (default) → Boolean (true if equivalent)

  • When verbose: true → Array of difference hashes with details

Difference Hash Format

Each difference contains, for XML/HTML:

  • node1, node2: The nodes being compared

  • diff1, diff2: Comparison result codes

or, for JSON/YAML:

  • path: String path to the difference (e.g., “user.address.city”)

  • value1, value2: The differing values

  • diff_code: Type of difference
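To make the JSON/YAML shape concrete, here is a small recursive diff over parsed data that emits hashes with the same keys. It is an illustration only: `sketch_diff` is a hypothetical function, the `diff_code` symbols are placeholders rather than Canon's integer constants, and arrays are compared as whole values to keep the sketch short.

```ruby
# Illustrative only: a tiny recursive diff over parsed JSON/YAML data that
# emits hashes shaped like the JSON/YAML entries described above.
def sketch_diff(a, b, path = "", out = [])
  if a.is_a?(Hash) && b.is_a?(Hash)
    (a.keys | b.keys).each do |key|
      child = path.empty? ? key.to_s : "#{path}.#{key}"
      if a.key?(key) && b.key?(key)
        # Key present on both sides: descend and extend the dotted path.
        sketch_diff(a[key], b[key], child, out)
      else
        out << { path: child, value1: a[key], value2: b[key],
                 diff_code: :missing_hash_key }
      end
    end
  elsif a != b
    out << { path: path, value1: a, value2: b, diff_code: :unequal_values }
  end
  out
end

doc1 = { "user" => { "address" => { "city" => "Oslo" }, "age" => 30 } }
doc2 = { "user" => { "address" => { "city" => "Bergen" }, "age" => 30 } }

diffs = sketch_diff(doc1, doc2)
diffs.each { |diff| puts "#{diff[:path]}: #{diff[:value1]} vs #{diff[:value2]}" }
```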

Defined Under Namespace

Modules: BaseComparator, Dimensions, MatchOptions, NodeInspector, RubyObjectComparator, Strategies, WhitespaceSensitivity, XmlComparatorHelpers, XmlNodeComparison

Classes: CompareProfile, ComparisonResult, DiffNodeBuilder, FormatDetector, HtmlComparator, HtmlCompareProfile, HtmlParser, JsonComparator, JsonParser, MarkupComparator, ProfileDefinition, ProfileError, ResolvedMatchOptions, XmlComparator, XmlParser, YamlComparator

Constant Summary

Comparison result constants

EQUIVALENT = 1
MISSING_ATTRIBUTE = 2
MISSING_NODE = 3
UNEQUAL_ATTRIBUTES = 4
UNEQUAL_COMMENTS = 5
UNEQUAL_DOCUMENTS = 6
UNEQUAL_ELEMENTS = 7
UNEQUAL_NODES_TYPES = 8
UNEQUAL_TEXT_CONTENTS = 9
MISSING_HASH_KEY = 10
UNEQUAL_HASH_VALUES = 11
UNEQUAL_HASH_KEY_ORDER = 12
UNEQUAL_ARRAY_LENGTHS = 13
UNEQUAL_ARRAY_ELEMENTS = 14
UNEQUAL_TYPES = 15
UNEQUAL_PRIMITIVES = 16
CODE_LABELS =

Human-readable labels for the integer comparison-result constants above. Used by the diff reason builders so user-facing reason text never leaks raw numeric codes (e.g. “7 vs 7” — see lutaml/canon#127). String diff codes (e.g. “position 3” emitted by ChildComparison) pass through code_label unchanged.

{
  EQUIVALENT => "equivalent",
  MISSING_ATTRIBUTE => "missing attribute",
  MISSING_NODE => "missing",
  UNEQUAL_ATTRIBUTES => "attributes differ",
  UNEQUAL_COMMENTS => "comments differ",
  UNEQUAL_DOCUMENTS => "documents differ",
  UNEQUAL_ELEMENTS => "elements differ",
  UNEQUAL_NODES_TYPES => "node types differ",
  UNEQUAL_TEXT_CONTENTS => "text content differs",
  MISSING_HASH_KEY => "missing hash key",
  UNEQUAL_HASH_VALUES => "hash values differ",
  UNEQUAL_HASH_KEY_ORDER => "hash key order differs",
  UNEQUAL_ARRAY_LENGTHS => "array lengths differ",
  UNEQUAL_ARRAY_ELEMENTS => "array elements differ",
  UNEQUAL_TYPES => "types differ",
  UNEQUAL_PRIMITIVES => "primitives differ",
}.freeze

Class Method Details

.available_profiles ⇒ Array<Symbol>

List all available profiles (custom + presets)

Returns:

  • (Array<Symbol>)

    Available profile names



# File 'lib/canon/comparison.rb', line 279

def available_profiles
  custom = @custom_profiles&.keys || []
  presets = MatchOptions::Xml::MATCH_PROFILES.keys
  (custom + presets).sort.uniq
end

.code_label(code) ⇒ String

Translate a comparison result code (Integer constant or String label like “position 3”) into a human-readable reason fragment. Unknown values pass through via to_s as a defensive fallback.

Parameters:

  • code (Integer, String)

    Comparison result code

Returns:

  • (String)

    Human-readable label



# File 'lib/canon/comparison.rb', line 155

def self.code_label(code)
  return code if code.is_a?(String)

  CODE_LABELS[code] || code.to_s
end

.code_pair_label(diff1, diff2) ⇒ String

Build a “diff1 [vs diff2]” reason fragment that never leaks raw integer constants. When both codes are equal, returns the single label (e.g. “elements differ”) rather than “elements differ vs elements differ”. See lutaml/canon#127.

Parameters:

  • diff1 (Integer, String)

    First diff code

  • diff2 (Integer, String)

    Second diff code

Returns:

  • (String)

    Reason fragment



# File 'lib/canon/comparison.rb', line 169

def self.code_pair_label(diff1, diff2)
  return code_label(diff1) if diff1 == diff2

  "#{code_label(diff1)} vs #{code_label(diff2)}"
end
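The behaviour of `code_label` and `code_pair_label` can be tried without the canon gem using the standalone re-creation below. `LabelSketch` is a hypothetical module name, and only two of the documented constants are copied over.

```ruby
# Standalone re-creation of the two helpers shown above, using a subset
# of the documented constants so it runs without the canon gem.
module LabelSketch
  UNEQUAL_ELEMENTS = 7
  UNEQUAL_TEXT_CONTENTS = 9

  CODE_LABELS = {
    UNEQUAL_ELEMENTS => "elements differ",
    UNEQUAL_TEXT_CONTENTS => "text content differs",
  }.freeze

  def self.code_label(code)
    # String codes such as "position 3" pass through untouched.
    return code if code.is_a?(String)

    # Unknown integers fall back to their string form.
    CODE_LABELS[code] || code.to_s
  end

  def self.code_pair_label(diff1, diff2)
    # Equal codes collapse to a single label.
    return code_label(diff1) if diff1 == diff2

    "#{code_label(diff1)} vs #{code_label(diff2)}"
  end
end

puts LabelSketch.code_pair_label(7, 7)            # => elements differ
puts LabelSketch.code_pair_label(7, 9)            # => elements differ vs text content differs
puts LabelSketch.code_pair_label(7, "position 3") # => elements differ vs position 3
puts LabelSketch.code_label(99)                   # => 99
```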

.define_profile(name) {|ProfileDefinition| ... } ⇒ Symbol

Define a custom comparison profile with DSL syntax

Examples:

Define a custom profile

Canon::Comparison.define_profile(:my_custom) do
  text_content :normalize
  comments :ignore
  preprocessing :rendered
end

Parameters:

  • name (Symbol)

    Profile name

Yields:

  • (ProfileDefinition)

Returns:

  • (Symbol)

    Profile name

Raises:



# File 'lib/canon/comparison.rb', line 248

def define_profile(name, &block)
  definition = ProfileDefinition.define(name, &block)

  @custom_profiles ||= {}
  @custom_profiles[name] = definition

  name
end
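For readers unfamiliar with block-based DSLs, the sketch below shows one common way calls such as `comments :ignore` can be collected into a settings hash. `ProfileDefinitionSketch` is a hypothetical stand-in: how Canon's ProfileDefinition evaluates the block, and which names and values it validates, is not shown in this page and is assumed here.

```ruby
# Hypothetical stand-in for ProfileDefinition: collects DSL calls such as
# `comments :ignore` into a settings hash. Canon's real class may evaluate
# and validate the block differently.
class ProfileDefinitionSketch
  # Assumption: only the three settings used in the example are modelled.
  SETTINGS = %i[text_content comments preprocessing].freeze

  attr_reader :name, :settings

  def self.define(name, &block)
    definition = new(name)
    # instance_eval makes the bare DSL calls resolve against `definition`.
    definition.instance_eval(&block) if block
    definition
  end

  def initialize(name)
    @name = name
    @settings = {}
  end

  # One DSL method per known setting; each stores its argument.
  SETTINGS.each do |setting|
    define_method(setting) { |value| @settings[setting] = value }
  end
end

profile = ProfileDefinitionSketch.define(:my_custom) do
  text_content :normalize
  comments :ignore
  preprocessing :rendered
end

profile.settings.each { |key, value| puts "#{key}: #{value}" }
```

Restricting the DSL to an explicit list of methods means a misspelled setting raises NoMethodError instead of being stored silently.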

.equivalent?(obj1, obj2, opts = {}) ⇒ Boolean, Array

Auto-detect format and compare two objects

Parameters:

  • obj1 (Object)

    First object to compare

  • obj2 (Object)

    Second object to compare

  • opts (Hash) (defaults to: {})

    Comparison options

    • :profile - Profile to use (Symbol for preset, Hash for custom)

    • :format - Format hint (:xml, :html, :html4, :html5, :json, :yaml, :string)

    • :diff_algorithm - Algorithm to use (:dom or :semantic, default: :dom; :semantic_tree is also accepted for backward compatibility)

    • :verbose - Return detailed diff array (default: false)

Returns:

  • (Boolean, Array)

    true if equivalent, or array of diffs if verbose



# File 'lib/canon/comparison.rb', line 195

def equivalent?(obj1, obj2, opts = {})
  # Check if semantic tree diff is requested
  # Support both :semantic and :semantic_tree for backward compatibility
  if %i[semantic semantic_tree].include?(opts[:diff_algorithm])
    return semantic_diff(obj1, obj2, opts)
  end

  # Otherwise use DOM-based comparison (default)
  dom_diff(obj1, obj2, opts)
end
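The dispatch rule in the method above can be isolated as follows. `pick_algorithm` is a hypothetical name used only for this sketch; the real method calls `semantic_diff` or `dom_diff` directly.

```ruby
# Minimal sketch of the option check used for dispatch. `pick_algorithm`
# is a hypothetical name, not part of Canon's API.
def pick_algorithm(opts = {})
  %i[semantic semantic_tree].include?(opts[:diff_algorithm]) ? :semantic : :dom
end

puts pick_algorithm                                # => dom
puts pick_algorithm(diff_algorithm: :semantic)      # => semantic
puts pick_algorithm(diff_algorithm: :semantic_tree) # => semantic
puts pick_algorithm(diff_algorithm: "semantic")     # => dom
```

Because the check compares against Symbols, a String value such as "semantic" falls through to the DOM path, as does any unrecognized value.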

.load_profile(name) ⇒ Hash

Load a profile (custom or preset)

Parameters:

  • name (Symbol)

    Profile name

Returns:

  • (Hash)

    Profile settings



# File 'lib/canon/comparison.rb', line 261

def load_profile(name)
  # Check custom profiles first
  if @custom_profiles&.key?(name)
    return @custom_profiles[name].dup
  end

  # Fall back to presets - try Xml first (most common)
  begin
    MatchOptions::Xml.get_profile_options(name)
  rescue Error
    # Try other formats
    MatchOptions::Json.get_profile_options(name)
  end
end
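The lookup order used above (custom profiles first, then one preset table, then another on failure) can be reproduced in a standalone sketch. Everything named here is hypothetical: `ProfileLookupSketch`, `SketchError`, the two preset tables, and their contents stand in for Canon's per-format resolvers and error class.

```ruby
# Standalone sketch of "custom first, then presets with fallback".
# All names and preset contents are hypothetical stand-ins.
class SketchError < StandardError; end

module ProfileLookupSketch
  PRESETS_A = { strict: { comments: :strict } }.freeze
  PRESETS_B = { spec_friendly: { comments: :ignore } }.freeze

  @custom = {}

  def self.register(name, settings)
    @custom[name] = settings
    name
  end

  def self.fetch_preset(table, name)
    table.fetch(name) { raise SketchError, "unknown profile: #{name}" }
  end

  def self.load(name)
    # Custom profiles shadow presets; dup (a shallow copy) keeps callers
    # from replacing entries in the stored hash.
    return @custom[name].dup if @custom.key?(name)

    begin
      fetch_preset(PRESETS_A, name)
    rescue SketchError
      # The second lookup is not rescued, so unknown names still raise.
      fetch_preset(PRESETS_B, name)
    end
  end
end

ProfileLookupSketch.register(:mine, { comments: :ignore })
puts ProfileLookupSketch.load(:mine).inspect
puts ProfileLookupSketch.load(:spec_friendly).inspect
```

With this shape, an unknown name surfaces the error from the last table tried, so the message describes that table rather than the first one.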

.parse_errors_for(node) ⇒ Array<String>

Extract parse-time errors from a parsed-tree or Nokogiri fragment. Delegates to NodeInspector for cross-backend type dispatch.

Parameters:

  • node (Object, nil)

    Parsed node

Returns:

  • (Array<String>)

    Parse errors as strings (empty by default)



# File 'lib/canon/comparison.rb', line 180

def self.parse_errors_for(node)
  NodeInspector.parse_errors(node)
end

.summarize(obj1, obj2, opts = {}) ⇒ String

Summarize the first difference between two documents.

Returns a human-readable string describing the first difference when documents differ, or “Equivalent” when they match. This is a lightweight alternative to equivalent? with verbose: true.

Examples:

Canon::Comparison.summarize("<p>Hello</p>", "<p>World</p>")
# => "Not equivalent: text content differs at /p[1] (Hello vs World)"

Canon::Comparison.summarize("<p>Hello</p>", "<p>Hello</p>")
# => "Equivalent"

Parameters:

  • obj1 (Object)

    First object to compare

  • obj2 (Object)

    Second object to compare

  • opts (Hash) (defaults to: {})

    Comparison options (same as equivalent?)

Returns:

  • (String)

    Summary string



# File 'lib/canon/comparison.rb', line 223

def summarize(obj1, obj2, opts = {})
  result = equivalent?(obj1, obj2, opts.merge(verbose: true))

  if result.is_a?(ComparisonResult)
    result.summary
  elsif result == true
    "Equivalent"
  else
    "Not equivalent"
  end
end