Class: Canon::TreeDiff::Operations::OperationDetector

Inherits:
Object
Defined in:
lib/canon/tree_diff/operations/operation_detector.rb

Overview

OperationDetector analyzes tree matching results to detect high-level semantic operations.

Based on research from XDiff, XyDiff, and JATS-diff, this detector identifies operations in three levels:

  • Level 1: Basic operations (INSERT, DELETE, UPDATE)

  • Level 2: Structural operations (MOVE)

  • Level 3: Semantic operations (MERGE, SPLIT, UPGRADE, DOWNGRADE)

Examples:

detector = OperationDetector.new(tree1, tree2, matching)
operations = detector.detect
operations.each { |op| puts op.inspect }

Constructor Details

#initialize(tree1, tree2, matching, match_options = {}) ⇒ OperationDetector

Initialize a new operation detector

Parameters:

  • tree1 (TreeNode)

    First tree root

  • tree2 (TreeNode)

    Second tree root

  • matching (Matching)

    Matching between trees

  • match_options (Hash) (defaults to: {})

    Match options for comparison



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 32

def initialize(tree1, tree2, matching, match_options = {})
  @tree1 = tree1
  @tree2 = tree2
  @matching = matching
  @match_options = match_options || {}
  @operations = []
end

Instance Attribute Details

#match_options ⇒ Object (readonly)

Returns the value of attribute match_options.



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 24

def match_options
  @match_options
end

#matching ⇒ Object (readonly)

Returns the value of attribute matching.



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 24

def matching
  @matching
end

#operations ⇒ Object (readonly)

Returns the value of attribute operations.



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 24

def operations
  @operations
end

#tree1 ⇒ Object (readonly)

Returns the value of attribute tree1.



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 24

def tree1
  @tree1
end

#tree2 ⇒ Object (readonly)

Returns the value of attribute tree2.



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 24

def tree2
  @tree2
end

Instance Method Details

#calculate_depth(node) ⇒ Integer

Calculate depth of a node in the tree

Parameters:

  • node (TreeNode)

    Node to calculate depth for

Returns:

  • (Integer)

    Depth (0 for root)



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 593

def calculate_depth(node)
  depth = 0
  current = node
  while current.parent
    depth += 1
    current = current.parent
  end
  depth
end
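
As a rough illustration of the parent-walking loop above, the sketch below runs the same method against a hypothetical `DepthNode` Struct. `DepthNode` is an assumption made for this example and is not Canon's `TreeNode`; only a `parent` reader is needed.

```ruby
# Hypothetical stand-in for TreeNode: only the parent link matters here.
DepthNode = Struct.new(:label, :parent)

# Same loop as the listing above: count parent hops until the root.
def calculate_depth(node)
  depth = 0
  current = node
  while current.parent
    depth += 1
    current = current.parent
  end
  depth
end

root = DepthNode.new("article", nil)
section = DepthNode.new("section", root)
para = DepthNode.new("p", section)

depths = [root, section, para].map { |n| calculate_depth(n) }
puts depths.inspect
```

The walk is iterative, so its cost grows with the node's depth and it does not consume call stack on deep trees.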

#collect_all_nodes(node) ⇒ Array<TreeNode>

Collect all nodes in a tree (depth-first)

Parameters:

  • node (TreeNode)

    Root node

Returns:

  • (Array<TreeNode>)

    All nodes



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 301

def collect_all_nodes(node)
  nodes = [node]
  node.children.each do |child|
    nodes.concat(collect_all_nodes(child))
  end
  nodes
end
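
The traversal order can be seen with a small sketch. `WalkNode` below is a hypothetical Struct standing in for `TreeNode` (an assumption for this example); the method body mirrors the listing above.

```ruby
# Hypothetical stand-in for TreeNode: a label plus an array of children.
WalkNode = Struct.new(:label, :children)

# Pre-order, depth-first: a node is emitted before its descendants.
def collect_all_nodes(node)
  nodes = [node]
  node.children.each do |child|
    nodes.concat(collect_all_nodes(child))
  end
  nodes
end

tree = WalkNode.new("root", [
  WalkNode.new("a", [WalkNode.new("a1", []), WalkNode.new("a2", [])]),
  WalkNode.new("b", []),
])

labels = collect_all_nodes(tree).map(&:label)
puts labels.inspect
```

Each parent appears before its children, and siblings keep their document order.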

#detect ⇒ Array<Operation>

Detect all operations

Returns:

  • (Array<Operation>)



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 43

def detect
  @operations = []

  # Level 1: Basic operations
  detect_inserts
  detect_deletes
  detect_updates

  # Level 2: Structural operations
  detect_moves

  # Level 3: Semantic operations
  # These require more sophisticated pattern analysis
  detect_merges
  detect_splits
  detect_upgrades
  detect_downgrades

  @operations
end

#detect_changes(node1, node2) ⇒ Hash

Detect specific changes between two nodes

Parameters:

  • node1 (TreeNode)

    Original node

  • node2 (TreeNode)

    Modified node

Returns:

  • (Hash)

    Hash of changes



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 192

def detect_changes(node1, node2)
  changes = {}

  if node1.label != node2.label
    changes[:label] =
      { old: node1.label, new: node2.label }
  end

  # CRITICAL FIX: Use normalized text comparison based on match_options
  if !text_equivalent?(node1, node2)
    changes[:value] =
      { old: node1.value, new: node2.value }
  end

  # Detect attribute changes (values or order)
  attrs1 = node1.attributes
  attrs2 = node2.attributes

  # Check if attribute values differ (ignoring order)
  if attrs1.sort.to_h != attrs2.sort.to_h
    # Actual attribute value differences
    changes[:attributes] = {
      old: attrs1,
      new: attrs2,
    }
  end

  # Check if attribute order differs (independently)
  # This can coexist with attribute value differences
  # Only detect order differences when the same attributes exist in different order
  # AND when attribute_order mode is :strict
  attribute_order_mode = @match_options[:attribute_order] || :ignore
  if attribute_order_mode == :strict &&
      attrs1.keys.sort == attrs2.keys.sort &&
      attrs1.keys != attrs2.keys
    # Same attributes but in different order
    changes[:attribute_order] = {
      old: attrs1.keys,
      new: attrs2.keys,
    }
  end

  changes
end
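
The two attribute checks in the listing above can be isolated in a minimal sketch. The helper name `attribute_changes` and its keyword argument are assumptions made for this example; they are not part of the class. Attributes are assumed to be plain Hashes with String keys.

```ruby
# Hypothetical helper that reproduces only the attribute portion of
# detect_changes, with the :attribute_order option passed as a keyword.
def attribute_changes(attrs1, attrs2, attribute_order: :ignore)
  changes = {}

  # Value differences, compared without regard to key order.
  if attrs1.sort.to_h != attrs2.sort.to_h
    changes[:attributes] = { old: attrs1, new: attrs2 }
  end

  # Order differences: same key set, different sequence, strict mode only.
  if attribute_order == :strict &&
      attrs1.keys.sort == attrs2.keys.sort &&
      attrs1.keys != attrs2.keys
    changes[:attribute_order] = { old: attrs1.keys, new: attrs2.keys }
  end

  changes
end

a = { "id" => "x", "class" => "note" }
b = { "class" => "note", "id" => "x" }
c = { "id" => "y", "class" => "note" }

ignored = attribute_changes(a, b)
strict = attribute_changes(a, b, attribute_order: :strict)
value_changed = attribute_changes(a, c, attribute_order: :strict)

puts ignored.inspect
puts strict.inspect
puts value_changed.inspect
```

With the default `:ignore` mode a pure reordering produces no change at all; in `:strict` mode it is reported under `:attribute_order`, separately from any `:attributes` value change.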

#extract_text_content(node) ⇒ String

Extract all text content from a node and its descendants

Parameters:

  • node (TreeNode)

    Node to extract from

Returns:

  • (String)

    Combined text content



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 532

def extract_text_content(node)
  texts = []
  texts << node.value if node.value && !node.value.empty?

  node.children.each do |child|
    texts << extract_text_content(child)
  end

  texts.join(" ").strip
end
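
A short sketch of how the recursion flattens nested text, using a hypothetical `TextNode` Struct in place of `TreeNode` (an assumption for this example):

```ruby
# Hypothetical stand-in for TreeNode: an optional text value plus children.
TextNode = Struct.new(:value, :children)

# Same logic as the listing above.
def extract_text_content(node)
  texts = []
  texts << node.value if node.value && !node.value.empty?

  node.children.each do |child|
    texts << extract_text_content(child)
  end

  texts.join(" ").strip
end

para = TextNode.new(nil, [
  TextNode.new("Hello", []),
  TextNode.new(nil, [TextNode.new("bold", [])]),
  TextNode.new("world", []),
])
text = extract_text_content(para)
puts text

# A child with no text contributes an empty string to the join,
# so the result can contain consecutive spaces.
gappy = TextNode.new(nil, [
  TextNode.new("a", []),
  TextNode.new(nil, []),
  TextNode.new("b", []),
])
gappy_text = extract_text_content(gappy)
puts gappy_text.inspect
```

Because empty descendants still take part in the join, callers that compare extracted text may want to collapse whitespace first.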

#nodes_identical?(node1, node2) ⇒ Boolean

Check if two nodes are identical, that is, whether they have the same label, value, and attributes. Children are not compared.

Parameters:

  • node1 (TreeNode)

    First node

  • node2 (TreeNode)

    Second node

Returns:

  • (Boolean)


# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 181

def nodes_identical?(node1, node2)
  node1.label == node2.label &&
    node1.value == node2.value &&
    node1.attributes == node2.attributes
end
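
The comparison is strict on text but insensitive to attribute order when attributes are Ruby Hashes, since `Hash#==` ignores insertion order. The sketch below shows both effects with a hypothetical `IdNode` Struct (an assumption for this example, not Canon's `TreeNode`).

```ruby
# Hypothetical stand-in for TreeNode with the three compared fields.
IdNode = Struct.new(:label, :value, :attributes)

# Same logic as the listing above.
def nodes_identical?(node1, node2)
  node1.label == node2.label &&
    node1.value == node2.value &&
    node1.attributes == node2.attributes
end

a = IdNode.new("p", "text", { "id" => "1", "class" => "x" })
b = IdNode.new("p", "text", { "class" => "x", "id" => "1" })
c = IdNode.new("p", "text ", { "id" => "1", "class" => "x" })

reordered_attrs = nodes_identical?(a, b)
trailing_space = nodes_identical?(a, c)

puts reordered_attrs
puts trailing_space
```

Reordered attributes still count as identical, while a single trailing space in the value does not, because no text normalization is applied here.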

#normalize_text(text) ⇒ String

Normalize text for comparison

Collapses each run of whitespace into a single space and strips leading and trailing whitespace. Also decodes XML entity references so that entity-encoded text (e.g., &#x201C;) and the literal character it represents (e.g., “) compare as equivalent.

Parameters:

  • text (String, nil)

    Text to normalize

Returns:

  • (String)

    Normalized text



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 290

def normalize_text(text)
  return "" if text.nil? || text.empty?

  normalized = Core::XmlEntityDecoder.decode_xml_entities(text)
  normalized.gsub(/\s+/, " ").strip
end
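
The effect of the two normalization steps can be sketched without the library. In the example below, `decode_numeric_entities` is a hypothetical stand-in for `Core::XmlEntityDecoder.decode_xml_entities`; it handles numeric character references only, and the real decoder may cover more.

```ruby
# Hypothetical stand-in for Core::XmlEntityDecoder.decode_xml_entities.
# Decodes only numeric references such as &#x201C; or &#8220;.
def decode_numeric_entities(text)
  text.gsub(/&#x([0-9a-fA-F]+);|&#(\d+);/) do
    hex = Regexp.last_match(1)
    code = hex ? hex.hex : Regexp.last_match(2).to_i
    [code].pack("U") # code point to UTF-8 string
  end
end

# Same shape as the listing above, with the stand-in decoder swapped in.
def normalize_text(text)
  return "" if text.nil? || text.empty?

  normalized = decode_numeric_entities(text)
  normalized.gsub(/\s+/, " ").strip
end

encoded = normalize_text("  &#x201C;Hello&#x201D;\n   world ")
literal = normalize_text("\u201CHello\u201D world")

puts encoded
puts encoded == literal
```

Both inputs normalize to the same string, so an entity-encoded quotation mark and its literal form no longer register as a text difference.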

#text_similarity(text1, text2) ⇒ Float

Calculate text similarity using the Jaccard index over lowercased, whitespace-separated tokens

Parameters:

  • text1 (String)

    First text

  • text2 (String)

    Second text

Returns:

  • (Float)

    Similarity score (0.0 to 1.0)



# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 576

def text_similarity(text1, text2)
  tokens1 = text1.downcase.split(/\s+/)
  tokens2 = text2.downcase.split(/\s+/)

  return 0.0 if tokens1.empty? && tokens2.empty?
  return 0.0 if tokens1.empty? || tokens2.empty?

  intersection = (tokens1 & tokens2).size
  union = (tokens1 | tokens2).size

  intersection.to_f / union
end
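
For token sets A and B, the Jaccard index is |A ∩ B| / |A ∪ B|. The sketch below works one example through a condensed copy of the method. It keeps a single guard clause, which covers the same cases as the two guards in the listing above, since "both empty" implies "either empty".

```ruby
# Condensed copy of the method above. Array#& and Array#| deduplicate,
# so repeated words count once.
def text_similarity(text1, text2)
  tokens1 = text1.downcase.split(/\s+/)
  tokens2 = text2.downcase.split(/\s+/)

  return 0.0 if tokens1.empty? || tokens2.empty?

  intersection = (tokens1 & tokens2).size
  union = (tokens1 | tokens2).size

  intersection.to_f / union
end

# Shared tokens: the, quick, fox (3). All distinct tokens: 5.
score = text_similarity("The quick brown fox", "the quick red fox")
both_empty = text_similarity("", "")
identical = text_similarity("same words", "SAME words")

puts score
puts both_empty
puts identical
```

Two empty strings score 0.0 rather than 1.0, so a pair of empty texts is never treated as similar by this measure.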