Class: Canon::TreeDiff::Operations::OperationDetector

Inherits:
Object
Defined in:
lib/canon/tree_diff/operations/operation_detector.rb

Overview

OperationDetector analyzes tree matching results to detect high-level semantic operations.

Based on research from XDiff, XyDiff, and JATS-diff, this detector identifies operations in three levels:

  • Level 1: Basic operations (INSERT, DELETE, UPDATE)
  • Level 2: Structural operations (MOVE)
  • Level 3: Semantic operations (MERGE, SPLIT, UPGRADE, DOWNGRADE)

Examples:

detector = OperationDetector.new(tree1, tree2, matching)
operations = detector.detect
operations.each { |op| puts op.inspect }

Constructor Details

#initialize(tree1, tree2, matching, match_options = {}) ⇒ OperationDetector

Initialize a new operation detector

Parameters:

  • tree1 (TreeNode)

    First tree root

  • tree2 (TreeNode)

    Second tree root

  • matching (Matching)

    Matching between trees

  • match_options (Hash) (defaults to: {})

    Match options for comparison



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 30

def initialize(tree1, tree2, matching, match_options = {})
  @tree1 = tree1
  @tree2 = tree2
  @matching = matching
  @match_options = match_options || {}
  @operations = []
end

Instance Attribute Details

#match_options ⇒ Object (readonly)

Returns the value of attribute match_options.



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 22

def match_options
  @match_options
end

#matching ⇒ Object (readonly)

Returns the value of attribute matching.



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 22

def matching
  @matching
end

#operations ⇒ Object (readonly)

Returns the value of attribute operations.



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 22

def operations
  @operations
end

#tree1 ⇒ Object (readonly)

Returns the value of attribute tree1.



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 22

def tree1
  @tree1
end

#tree2 ⇒ Object (readonly)

Returns the value of attribute tree2.



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 22

def tree2
  @tree2
end

Instance Method Details

#calculate_depth(node) ⇒ Integer

Calculate depth of a node in the tree

Parameters:

  • node (TreeNode)

    Node to calculate depth for

Returns:

  • (Integer)

    Depth (0 for root)



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 591

def calculate_depth(node)
  depth = 0
  current = node
  while current.parent
    depth += 1
    current = current.parent
  end
  depth
end
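
As a rough illustration of the parent walk above, the sketch below repeats the same loop on a hypothetical DemoNode struct. DemoNode is not the library's TreeNode; the only thing assumed about a node is that it exposes a parent reference that is nil at the root.

```ruby
# DemoNode is a hypothetical stand-in for TreeNode; only `parent` is assumed.
DemoNode = Struct.new(:label, :parent)

# Same loop as calculate_depth: count parent hops until the root is reached.
def demo_depth(node)
  depth = 0
  current = node
  while current.parent
    depth += 1
    current = current.parent
  end
  depth
end

root = DemoNode.new("article", nil)
sec  = DemoNode.new("section", root)
para = DemoNode.new("p", sec)

puts demo_depth(root) # the root has no parent
puts demo_depth(para) # two hops: p -> section -> article
```

The cost is proportional to the node's depth, so calling it for every node of a deep tree repeats work; caching depths during a traversal would be one alternative.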

#collect_all_nodes(node) ⇒ Array<TreeNode>

Collect all nodes in a tree (depth-first)

Parameters:

  • node (TreeNode)

    Root node

Returns:

  • (Array<TreeNode>)

    All nodes



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 299

def collect_all_nodes(node)
  nodes = [node]
  node.children.each do |child|
    nodes.concat(collect_all_nodes(child))
  end
  nodes
end
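
To make the visiting order concrete, here is a minimal sketch of the same recursion on a hypothetical DemoTree struct (an assumption for illustration; the real TreeNode is only assumed to expose children). Each node is emitted before its descendants, which is a pre-order depth-first traversal.

```ruby
# DemoTree is a hypothetical stand-in for TreeNode; only `children` is assumed.
DemoTree = Struct.new(:label, :children)

# Same recursion as collect_all_nodes: the node first, then each subtree in order.
def demo_collect(node)
  nodes = [node]
  node.children.each { |child| nodes.concat(demo_collect(child)) }
  nodes
end

tree = DemoTree.new("a", [
  DemoTree.new("b", [DemoTree.new("d", [])]),
  DemoTree.new("c", []),
])

labels = demo_collect(tree).map(&:label)
puts labels.inspect # "d" appears before "c" because b's subtree is finished first
```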

#detect ⇒ Array<Operation>

Detect all operations

Returns:

  • (Array<Operation>)

    Detected operations



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 41

def detect
  @operations = []

  # Level 1: Basic operations
  detect_inserts
  detect_deletes
  detect_updates

  # Level 2: Structural operations
  detect_moves

  # Level 3: Semantic operations
  # These require more sophisticated pattern analysis
  detect_merges
  detect_splits
  detect_upgrades
  detect_downgrades

  @operations
end
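
One way to work with the three levels after detection is to group results by level. The sketch below is only an illustration: the symbol names and the DEMO_OPERATION_LEVELS table are assumptions derived from the overview, and the actual Operation objects may expose their type differently.

```ruby
# Assumed mapping from operation type to detection level, taken from the
# overview; the symbol names are illustrative, not the library's API.
DEMO_OPERATION_LEVELS = {
  insert: 1, delete: 1, update: 1,
  move: 2,
  merge: 3, split: 3, upgrade: 3, downgrade: 3,
}.freeze

# Group a list of operation type symbols by their level.
# `fetch` raises KeyError for an unknown type instead of silently skipping it.
def demo_group_by_level(types)
  types.group_by { |type| DEMO_OPERATION_LEVELS.fetch(type) }
end

grouped = demo_group_by_level(%i[insert move update merge])
puts grouped.inspect
```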

#detect_changes(node1, node2) ⇒ Hash

Detect specific changes between two nodes

Parameters:

  • node1 (TreeNode)

    Original node

  • node2 (TreeNode)

    Modified node

Returns:

  • (Hash)

    Hash of changes



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 190

def detect_changes(node1, node2)
  changes = {}

  if node1.label != node2.label
    changes[:label] =
      { old: node1.label, new: node2.label }
  end

  # CRITICAL FIX: Use normalized text comparison based on match_options
  if !text_equivalent?(node1, node2)
    changes[:value] =
      { old: node1.value, new: node2.value }
  end

  # Detect attribute changes (values or order)
  attrs1 = node1.attributes
  attrs2 = node2.attributes

  # Check if attribute values differ (ignoring order)
  if attrs1.sort.to_h != attrs2.sort.to_h
    # Actual attribute value differences
    changes[:attributes] = {
      old: attrs1,
      new: attrs2,
    }
  end

  # Check if attribute order differs (independently)
  # This can coexist with attribute value differences
  # Only detect order differences when the same attributes exist in different order
  # AND when attribute_order mode is :strict
  attribute_order_mode = @match_options[:attribute_order] || :ignore
  if attribute_order_mode == :strict &&
      attrs1.keys.sort == attrs2.keys.sort &&
      attrs1.keys != attrs2.keys
    # Same attributes but in different order
    changes[:attribute_order] = {
      old: attrs1.keys,
      new: attrs2.keys,
    }
  end

  changes
end
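
The two attribute checks above are independent, which is easiest to see with plain hashes. The sketch below copies only the attribute portion of the method into a hypothetical helper, demo_attribute_changes; it assumes node.attributes behaves like an insertion-ordered Hash.

```ruby
# Hypothetical helper mirroring only the attribute checks of detect_changes.
# Plain hashes stand in for node.attributes (an assumption about its shape).
def demo_attribute_changes(attrs1, attrs2, match_options = {})
  changes = {}

  # Value check: order-independent comparison of key/value pairs.
  if attrs1.sort.to_h != attrs2.sort.to_h
    changes[:attributes] = { old: attrs1, new: attrs2 }
  end

  # Order check: same key set in a different sequence, only in :strict mode.
  mode = match_options[:attribute_order] || :ignore
  if mode == :strict &&
      attrs1.keys.sort == attrs2.keys.sort &&
      attrs1.keys != attrs2.keys
    changes[:attribute_order] = { old: attrs1.keys, new: attrs2.keys }
  end

  changes
end

a = { "id" => "x1", "class" => "note" }
b = { "class" => "note", "id" => "x1" }

puts demo_attribute_changes(a, b).inspect                                # reorder ignored by default
puts demo_attribute_changes(a, b, { attribute_order: :strict }).inspect  # reorder reported
```

With the default :ignore mode a pure reordering produces no change at all, so callers that care about serialization order need to pass attribute_order: :strict in the match options.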

#extract_text_content(node) ⇒ String

Extract all text content from a node and its descendants

Parameters:

  • node (TreeNode)

    Node to extract from

Returns:

  • (String)

    Combined text content



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 530

def extract_text_content(node)
  texts = []
  texts << node.value if node.value && !node.value.empty?

  node.children.each do |child|
    texts << extract_text_content(child)
  end

  texts.join(" ").strip
end
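
A small sketch of how the recursive join behaves, using a hypothetical DemoTextNode struct in place of TreeNode (only value and children are assumed):

```ruby
# DemoTextNode is a hypothetical stand-in for TreeNode.
DemoTextNode = Struct.new(:value, :children)

# Same logic as extract_text_content: own value first, then each child's text.
def demo_extract_text(node)
  texts = []
  texts << node.value if node.value && !node.value.empty?
  node.children.each { |child| texts << demo_extract_text(child) }
  texts.join(" ").strip
end

para = DemoTextNode.new("Hello", [
  DemoTextNode.new("brave", []),
  DemoTextNode.new(nil, [DemoTextNode.new("world", [])]),
])
puts demo_extract_text(para)

# A child with no text still contributes an empty string to the join,
# so the result can contain consecutive spaces.
gappy = DemoTextNode.new("a", [DemoTextNode.new(nil, []), DemoTextNode.new("b", [])])
puts demo_extract_text(gappy).inspect
```

Because empty descendants can leave doubled spaces, passing the result through a whitespace-collapsing step such as normalize_text is a reasonable follow-up before comparing texts.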

#nodes_identical?(node1, node2) ⇒ Boolean

Check whether two nodes have the same label, value, and attributes

Parameters:

  • node1 (TreeNode)

    First node

  • node2 (TreeNode)

    Second node

Returns:

  • (Boolean)


# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 179

def nodes_identical?(node1, node2)
  node1.label == node2.label &&
    node1.value == node2.value &&
    node1.attributes == node2.attributes
end

#normalize_text(text) ⇒ String

Normalize text for comparison

Collapses runs of whitespace into a single space and strips leading and trailing whitespace. Also decodes XML entity references so that entity-encoded text (e.g., &#x201C;) and the literal character it represents (e.g., “) compare as equivalent.

Parameters:

  • text (String, nil)

    Text to normalize

Returns:

  • (String)

    Normalized text



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 288

def normalize_text(text)
  return "" if text.nil? || text.empty?

  normalized = Core::XmlEntityDecoder.decode_xml_entities(text)
  normalized.gsub(/\s+/, " ").strip
end
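
The effect of the two steps can be sketched without the library. In the example below, demo_decode_numeric_entities is a hypothetical stand-in for Core::XmlEntityDecoder.decode_xml_entities; it handles only numeric character references, which is an assumption made for this sketch and may cover less than the real decoder.

```ruby
# Hypothetical stand-in for Core::XmlEntityDecoder.decode_xml_entities.
# Handles only numeric character references (&#xHHHH; and &#DDDD;).
def demo_decode_numeric_entities(text)
  text
    .gsub(/&#x([0-9a-fA-F]+);/) { [Regexp.last_match(1).hex].pack("U") }
    .gsub(/&#([0-9]+);/) { [Regexp.last_match(1).to_i].pack("U") }
end

# Same shape as normalize_text: decode, collapse whitespace, strip.
def demo_normalize_text(text)
  return "" if text.nil? || text.empty?

  demo_decode_numeric_entities(text).gsub(/\s+/, " ").strip
end

encoded = "  &#x201C;Hello,\n   world&#x201D; "
literal = "\u201CHello, world\u201D"

puts demo_normalize_text(encoded) == literal
```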

#text_similarity(text1, text2) ⇒ Float

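Calculate text similarity as the Jaccard index of the two texts' lowercased, whitespace-separated token sets. Returns 0.0 when either text has no tokens.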

Parameters:

  • text1 (String)

    First text

  • text2 (String)

    Second text

Returns:

  • (Float)

    Similarity score (0.0 to 1.0)



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 574

def text_similarity(text1, text2)
  tokens1 = text1.downcase.split(/\s+/)
  tokens2 = text2.downcase.split(/\s+/)

  return 0.0 if tokens1.empty? && tokens2.empty?
  return 0.0 if tokens1.empty? || tokens2.empty?

  intersection = (tokens1 & tokens2).size
  union = (tokens1 | tokens2).size

  intersection.to_f / union
end
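
A worked example may help here. The Jaccard index of two token sets A and B is |A ∩ B| / |A ∪ B|. The sketch below reimplements the method under a hypothetical name, demo_text_similarity, so it runs without the library.

```ruby
# Standalone copy of the Jaccard computation for illustration.
def demo_text_similarity(text1, text2)
  tokens1 = text1.downcase.split(/\s+/)
  tokens2 = text2.downcase.split(/\s+/)

  # Covers both "both empty" and "one empty".
  return 0.0 if tokens1.empty? || tokens2.empty?

  # Array#& and Array#| return de-duplicated results, so these act as set operations.
  intersection = (tokens1 & tokens2).size
  union = (tokens1 | tokens2).size

  intersection.to_f / union
end

# Shared tokens: the, quick, fox (3). All distinct tokens: the, quick, brown, fox, red (5).
score = demo_text_similarity("The quick brown fox", "the quick red fox")
puts score
```

Since the comparison works on sets of tokens, word order and repeated words do not affect the score.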