Class: Canon::TreeDiff::TreeDiffIntegrator

Inherits:
Object
Defined in:
lib/canon/tree_diff/tree_diff_integrator.rb

Overview

TreeDiffIntegrator provides integration between Canon’s DOM diff system and the new semantic tree diff system.

This class orchestrates:

  • Format-specific adapter selection

  • Tree conversion from parsed documents

  • Tree matching via UniversalMatcher

  • Operation detection

  • Results formatting
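The orchestration listed above can be illustrated with a toy, self-contained Ruby sketch. Everything in it is hypothetical: ToyNode, ToyHashAdapter, ToyPathMatcher, ToyDetector and ToyIntegrator are stand-ins invented for illustration, not Canon classes. Pairing nodes by identical path is a deliberately naive substitute for UniversalMatcher's hash, similarity and propagation phases. The sketch only shows how an adapter, a matcher and a detector plug together, and what shape of result hash comes out.

```ruby
# Toy sketch only: none of these classes exist in Canon.

# A tree node identified by its path from the root.
ToyNode = Struct.new(:path, :value, :children) do
  # Pre-order list of this node and all of its descendants.
  def all_nodes
    [self] + children.flat_map(&:all_nodes)
  end
end

# Stand-in for a format adapter: turns a nested Hash into a ToyNode tree.
class ToyHashAdapter
  def to_tree(doc, path = "")
    if doc.is_a?(Hash)
      ToyNode.new(path, nil, doc.map { |key, val| to_tree(val, "#{path}/#{key}") })
    else
      ToyNode.new(path, doc, [])
    end
  end
end

# Stand-in for UniversalMatcher: pairs nodes whose paths are identical.
class ToyPathMatcher
  attr_reader :statistics

  def match(tree1, tree2)
    by_path = tree2.all_nodes.to_h { |node| [node.path, node] }
    pairs = tree1.all_nodes.filter_map do |node|
      [node, by_path[node.path]] if by_path.key?(node.path)
    end
    @statistics = { matched_pairs: pairs.size }
    pairs
  end
end

# Stand-in for OperationDetector: unmatched nodes become deletes/inserts,
# matched nodes with different values become updates.
class ToyDetector
  def initialize(tree1, tree2, matching)
    @tree1 = tree1
    @tree2 = tree2
    @matching = matching
  end

  def detect
    matched = @matching.to_h { |node1, _| [node1.path, true] }
    deletes = @tree1.all_nodes.reject { |n| matched[n.path] }.map { |n| [:delete, n.path] }
    inserts = @tree2.all_nodes.reject { |n| matched[n.path] }.map { |n| [:insert, n.path] }
    updates = @matching.reject { |a, b| a.value == b.value }.map { |a, _| [:update, a.path] }
    deletes + inserts + updates
  end
end

# Orchestrator with the same overall shape as TreeDiffIntegrator#diff.
class ToyIntegrator
  def initialize
    @adapter = ToyHashAdapter.new
    @matcher = ToyPathMatcher.new
  end

  def diff(doc1, doc2)
    tree1 = @adapter.to_tree(doc1)
    tree2 = @adapter.to_tree(doc2)
    matching = @matcher.match(tree1, tree2)
    operations = ToyDetector.new(tree1, tree2, matching).detect
    {
      operations: operations,
      matching: matching,
      statistics: @matcher.statistics,
      trees: { tree1: tree1, tree2: tree2 },
    }
  end
end
```

For example, diffing {"a" => 1, "b" => {"c" => 2}} against {"a" => 1, "b" => {"c" => 3}, "d" => 4} with this toy yields an :insert for "/d" and an :update for "/b/c".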

Examples:

XML tree diff

integrator = TreeDiffIntegrator.new(format: :xml)
result = integrator.diff(doc1, doc2)
result[:operations] # => [Operation(...), ...]

Constructor Details

#initialize(format:, options: {}) ⇒ TreeDiffIntegrator

Initialize an integrator for a specific format

Parameters:

  • format (Symbol)

    Format type (:xml, :json, :html, :yaml)

  • options (Hash) (defaults to: {})

    Configuration options (match options from Canon::Comparison)

Options Hash (options:):

  • :similarity_threshold (Float)

    Threshold for similarity matching (default: 0.95)

  • :hash_matching (Boolean)

    Enable hash matching phase (default: true)

  • :similarity_matching (Boolean)

    Enable similarity matching phase (default: true)

  • :propagation (Boolean)

    Enable propagation phase (default: true)

  • :text_content (Symbol)

    How to compare text (:strict, :normalize)

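  • :attribute_order (Symbol)

    How to compare attribute order (:strict, :ignore) (default: :ignore)

  • :comments (Symbol)

    When :ignore, comment nodes are removed from both trees before matching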



# File 'lib/canon/tree_diff/tree_diff_integrator.rb', line 33

def initialize(format:, options: {})
  @format = format
  @options = options
  @match_options = options # Store full match options for downstream use

  # Initialize format-specific adapter WITH match options
  @adapter = create_adapter(format, options)

  # Initialize matcher with options
  matcher_options = {
    similarity_threshold: options[:similarity_threshold] || 0.95,
    hash_matching: options.fetch(:hash_matching, true),
    similarity_matching: options.fetch(:similarity_matching, true),
    propagation: options.fetch(:propagation, true),
    attribute_order: options[:attribute_order] || :ignore,
  }
  @matcher = Matchers::UniversalMatcher.new(matcher_options)
end
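The constructor resolves defaults in two different ways, and the difference matters for Boolean options. The sketch below copies that logic into a standalone method to show that Hash#fetch preserves an explicit false, whereas `||` would replace it with true. The names resolve_matcher_options and naive_hash_matching are hypothetical, chosen for illustration only.

```ruby
# Mirrors the default resolution in TreeDiffIntegrator#initialize.
# resolve_matcher_options is a hypothetical name used only for this sketch.
def resolve_matcher_options(options)
  {
    similarity_threshold: options[:similarity_threshold] || 0.95,
    # fetch only falls back when the key is absent, so false is kept.
    hash_matching: options.fetch(:hash_matching, true),
    similarity_matching: options.fetch(:similarity_matching, true),
    propagation: options.fetch(:propagation, true),
    attribute_order: options[:attribute_order] || :ignore,
  }
end

# Counter-example: with `||`, an explicit false is silently turned into true.
def naive_hash_matching(options)
  options[:hash_matching] || true
end
```

Because Ruby treats only nil and false as falsy, `||` is adequate for :similarity_threshold and :attribute_order, where false is not a meaningful value: omitting the key or passing nil falls back to the default, while any number, including 0, is kept.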

Instance Attribute Details

#adapter ⇒ Object (readonly)

Returns the format-specific adapter that converts documents into trees.



# File 'lib/canon/tree_diff/tree_diff_integrator.rb', line 21

def adapter
  @adapter
end

#format ⇒ Object (readonly)

Returns the format type given to the constructor (:xml, :json, :html or :yaml).



# File 'lib/canon/tree_diff/tree_diff_integrator.rb', line 21

def format
  @format
end

#match_options ⇒ Object (readonly)

Returns the full options hash given to the constructor, kept for downstream use such as operation detection.



# File 'lib/canon/tree_diff/tree_diff_integrator.rb', line 21

def match_options
  @match_options
end

#matcher ⇒ Object (readonly)

Returns the Matchers::UniversalMatcher instance configured from the constructor options.



# File 'lib/canon/tree_diff/tree_diff_integrator.rb', line 21

def matcher
  @matcher
end

Instance Method Details

#diff(doc1, doc2) ⇒ Hash

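Perform a tree diff on two documents. Each document is converted to a tree by the format adapter; if the :comments match option is :ignore, comment nodes are removed from both trees; both trees are checked against a node-count limit; the trees are then matched and the edit operations between them are detected.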

Parameters:

  • doc1 (Object)

    First document (format-specific)

  • doc2 (Object)

    Second document (format-specific)

Returns:

  • (Hash)

    Diff results with :operations, :matching, :statistics and :trees (the two converted trees under :tree1 and :tree2)



# File 'lib/canon/tree_diff/tree_diff_integrator.rb', line 57

def diff(doc1, doc2)
  tree1 = @adapter.to_tree(doc1)
  tree2 = @adapter.to_tree(doc2)

  # Filter comment nodes when comments are ignored to prevent
  # them from disrupting sibling alignment in the matcher
  if @match_options[:comments] == :ignore
    filter_comments_from_tree!(tree1)
    filter_comments_from_tree!(tree2)
  end

  check_node_count_limit(tree1)
  check_node_count_limit(tree2)

  # Match trees
  matching = @matcher.match(tree1, tree2)

  # Detect operations with match_options for proper normalization
  detector = Operations::OperationDetector.new(tree1, tree2, matching,
                                               @match_options)
  operations = detector.detect

  # Return comprehensive results
  {
    operations: operations,
    matching: matching,
    statistics: @matcher.statistics,
    trees: { tree1: tree1, tree2: tree2 },
  }
end
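The two pre-matching steps in #diff call private helpers, filter_comments_from_tree! and check_node_count_limit, whose source is not shown on this page. Below is a minimal sketch of what such helpers might do, assuming a node exposes a type and a mutable children array. SketchNode, filter_comments!, count_nodes, check_node_limit! and NodeLimitExceeded are all hypothetical; Canon's real node class, limit value and error class are not documented here.

```ruby
# Hypothetical node shape for this sketch; Canon's tree node class may differ.
SketchNode = Struct.new(:type, :label, :children)

# Hypothetical error class; the real one is not shown on this page.
class NodeLimitExceeded < StandardError; end

# Remove comment nodes in place at every depth, so they cannot shift
# the sibling positions the matcher sees.
def filter_comments!(node)
  node.children.reject! { |child| child.type == :comment }
  node.children.each { |child| filter_comments!(child) }
  node
end

# Count nodes recursively, including the root.
def count_nodes(node)
  1 + node.children.sum { |child| count_nodes(child) }
end

# Raise if the tree is larger than the (hypothetical) limit; otherwise
# return the node count.
def check_node_limit!(node, limit)
  count = count_nodes(node)
  raise NodeLimitExceeded, "tree has #{count} nodes (limit #{limit})" if count > limit
  count
end
```

Filtering before matching means both trees present the same sibling positions to the matcher even when only one of them contains comments. For a root whose children are a comment, an element "a" and an element "b", filtering leaves "a" at index 0 and "b" at index 1.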

#equivalent?(doc1, doc2) ⇒ Boolean

Check whether two documents are semantically equivalent. This runs a full #diff and inspects only the resulting operation list.

Parameters:

  • doc1 (Object)

    First document

  • doc2 (Object)

    Second document

Returns:

  • (Boolean)

    true if no operations are detected



# File 'lib/canon/tree_diff/tree_diff_integrator.rb', line 93

def equivalent?(doc1, doc2)
  result = diff(doc1, doc2)
  result[:operations].empty?
end