Class: Canon::TreeDiff::Operations::OperationDetector
- Inherits: Object
- Defined in: lib/canon/tree_diff/operations/operation_detector.rb
Overview
OperationDetector analyzes tree matching results to detect high-level semantic operations.
Based on research from XDiff, XyDiff, and JATS-diff, this detector identifies operations in three levels:
- Level 1: Basic operations (INSERT, DELETE, UPDATE)
- Level 2: Structural operations (MOVE)
- Level 3: Semantic operations (MERGE, SPLIT, UPGRADE, DOWNGRADE)
Instance Attribute Summary
- #match_options ⇒ Object (readonly)
  Returns the value of attribute match_options.
- #matching ⇒ Object (readonly)
  Returns the value of attribute matching.
- #operations ⇒ Object (readonly)
  Returns the value of attribute operations.
- #tree1 ⇒ Object (readonly)
  Returns the value of attribute tree1.
- #tree2 ⇒ Object (readonly)
  Returns the value of attribute tree2.
Instance Method Summary
- #calculate_depth(node) ⇒ Integer
  Calculate depth of a node in the tree.
- #collect_all_nodes(node) ⇒ Array<TreeNode>
  Collect all nodes in a tree (depth-first).
- #detect ⇒ Array<Operation>
  Detect all operations.
- #detect_changes(node1, node2) ⇒ Hash
  Detect specific changes between two nodes.
- #extract_text_content(node) ⇒ String
  Extract all text content from a node and its descendants.
- #initialize(tree1, tree2, matching, match_options = {}) ⇒ OperationDetector (constructor)
  Initialize a new operation detector.
- #nodes_identical?(node1, node2) ⇒ Boolean
  Check if two nodes are identical.
- #normalize_text(text) ⇒ String
  Normalize text for comparison.
- #text_similarity(text1, text2) ⇒ Float
  Calculate text similarity using Jaccard index.
Constructor Details
#initialize(tree1, tree2, matching, match_options = {}) ⇒ OperationDetector
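Initialize a new operation detector for two trees and the matching between them. A nil match_options falls back to an empty hash, and operations starts out empty.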
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 30
def initialize(tree1, tree2, matching, match_options = {})
  @tree1 = tree1
  @tree2 = tree2
  @matching = matching
  @match_options = match_options || {}
  @operations = []
end
Instance Attribute Details
#match_options ⇒ Object (readonly)
Returns the value of attribute match_options.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 22
def match_options
  @match_options
end
#matching ⇒ Object (readonly)
Returns the value of attribute matching.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 22
def matching
  @matching
end
#operations ⇒ Object (readonly)
Returns the value of attribute operations.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 22
def operations
  @operations
end
#tree1 ⇒ Object (readonly)
Returns the value of attribute tree1.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 22
def tree1
  @tree1
end
#tree2 ⇒ Object (readonly)
Returns the value of attribute tree2.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 22
def tree2
  @tree2
end
Instance Method Details
#calculate_depth(node) ⇒ Integer
Calculate the depth of a node in the tree: the number of parent links between the node and the root. The root has depth 0.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 591
def calculate_depth(node)
  depth = 0
  current = node
  while current.parent
    depth += 1
    current = current.parent
  end
  depth
end
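The parent walk can be tried in isolation. The sketch below copies the method body and runs it against a hypothetical DepthNode struct, which stands in for the library's TreeNode and assumes only a parent accessor.

```ruby
# Hypothetical stand-in for TreeNode; only `parent` is assumed here.
DepthNode = Struct.new(:label, :parent)

# Same loop as the documented method: count parent links up to the root.
def calculate_depth(node)
  depth = 0
  current = node
  while current.parent
    depth += 1
    current = current.parent
  end
  depth
end

root    = DepthNode.new("article", nil)
section = DepthNode.new("section", root)
para    = DepthNode.new("p", section)

puts calculate_depth(root)  # the root has no parent, so its depth is 0
puts calculate_depth(para)  # two links: p -> section -> article
```

The cost is proportional to the node's depth, so calling it for every node of a deep tree repeats work that a single top-down traversal could avoid.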
#collect_all_nodes(node) ⇒ Array<TreeNode>
Collect all nodes in a tree (depth-first, pre-order): the given node comes first, followed by each child's subtree in child order.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 299
def collect_all_nodes(node)
  nodes = [node]
  node.children.each do |child|
    nodes.concat(collect_all_nodes(child))
  end
  nodes
end
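To make the traversal order concrete, here is a minimal sketch that runs the same recursion over a hypothetical ListNode struct. The struct is an assumption standing in for TreeNode; only label and children are used.

```ruby
# Hypothetical stand-in for TreeNode; only `label` and `children` are assumed.
ListNode = Struct.new(:label, :children)

# Same recursion as the documented method: node first, then each subtree.
def collect_all_nodes(node)
  nodes = [node]
  node.children.each do |child|
    nodes.concat(collect_all_nodes(child))
  end
  nodes
end

leaf_a = ListNode.new("a", [])
leaf_b = ListNode.new("b", [])
branch = ListNode.new("branch", [leaf_a, leaf_b])
leaf_c = ListNode.new("c", [])
root   = ListNode.new("root", [branch, leaf_c])

order = collect_all_nodes(root).map(&:label)
puts order.inspect
```

The labels come out as root, branch, a, b, c: every node appears before its descendants, and siblings keep their original order.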
#detect ⇒ Array<Operation>
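Detect all operations. Resets operations, runs the Level 1, Level 2, and Level 3 detectors in that order, and returns the accumulated list.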
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 41
def detect
  @operations = []

  # Level 1: Basic operations
  detect_inserts
  detect_deletes
  detect_updates

  # Level 2: Structural operations
  detect_moves

  # Level 3: Semantic operations
  # These require more sophisticated pattern analysis
  detect_merges
  detect_splits
  detect_upgrades
  detect_downgrades

  @operations
end
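Since the detectors run level by level, a caller may want to bucket the result the same way. The sketch below is not part of the library: SketchOperation, OPERATION_LEVELS, and group_by_level are hypothetical names, and it assumes each operation exposes a symbolic type, which this page does not confirm.

```ruby
# Assumption: an operation exposes a symbolic `type`; the real Operation
# class may use a different accessor or different symbols.
SketchOperation = Struct.new(:type)

# Level assignment taken from the three levels listed in the Overview.
OPERATION_LEVELS = {
  insert: 1, delete: 1, update: 1,
  move: 2,
  merge: 3, split: 3, upgrade: 3, downgrade: 3
}.freeze

# Group operations by level, keeping their original order within a level.
def group_by_level(operations)
  operations.group_by { |op| OPERATION_LEVELS.fetch(op.type) }
end

ops = %i[insert move delete split].map { |t| SketchOperation.new(t) }
group_by_level(ops).each do |level, level_ops|
  puts "Level #{level}: #{level_ops.map(&:type).join(', ')}"
end
```

Using fetch rather than [] makes an unknown operation type raise a KeyError instead of silently landing in a nil bucket.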
#detect_changes(node1, node2) ⇒ Hash
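Detect specific changes between two nodes. The returned hash may contain :label, :value, :attributes, and :attribute_order entries, each holding the old and new values. An :attribute_order entry is reported only when match_options[:attribute_order] is :strict and both nodes have the same attribute names in a different order.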
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 190
def detect_changes(node1, node2)
  changes = {}

  if node1.label != node2.label
    changes[:label] = { old: node1.label, new: node2.label }
  end

  # CRITICAL FIX: Use normalized text comparison based on match_options
  if !text_equivalent?(node1, node2)
    changes[:value] = { old: node1.value, new: node2.value }
  end

  # Detect attribute changes (values or order)
  attrs1 = node1.attributes
  attrs2 = node2.attributes

  # Check if attribute values differ (ignoring order)
  if attrs1.sort.to_h != attrs2.sort.to_h
    # Actual attribute value differences
    changes[:attributes] = {
      old: attrs1,
      new: attrs2,
    }
  end

  # Check if attribute order differs (independently)
  # This can coexist with attribute value differences
  # Only detect order differences when the same attributes exist in different order
  # AND when attribute_order mode is :strict
  attribute_order_mode = @match_options[:attribute_order] || :ignore
  if attribute_order_mode == :strict &&
     attrs1.keys.sort == attrs2.keys.sort &&
     attrs1.keys != attrs2.keys
    # Same attributes but in different order
    changes[:attribute_order] = {
      old: attrs1.keys,
      new: attrs2.keys,
    }
  end

  changes
end
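The two attribute checks are independent, which is easiest to see with plain hashes. The sketch below lifts only the attribute logic into a hypothetical helper, attribute_changes, which is not a library method and assumes attributes are ordinary Ruby hashes with string keys.

```ruby
# Hypothetical helper that mirrors only the attribute portion of
# detect_changes; label and value comparison are left out.
def attribute_changes(attrs1, attrs2, match_options = {})
  changes = {}

  # Value check: order of keys is irrelevant here.
  if attrs1.sort.to_h != attrs2.sort.to_h
    changes[:attributes] = { old: attrs1, new: attrs2 }
  end

  # Order check: only in :strict mode, and only for identical key sets.
  mode = match_options[:attribute_order] || :ignore
  if mode == :strict &&
     attrs1.keys.sort == attrs2.keys.sort &&
     attrs1.keys != attrs2.keys
    changes[:attribute_order] = { old: attrs1.keys, new: attrs2.keys }
  end

  changes
end

original  = { "id" => "s1", "class" => "intro" }
reordered = { "class" => "intro", "id" => "s1" }
edited    = { "id" => "s2", "class" => "intro" }

p attribute_changes(original, reordered)                              # default mode ignores order
p attribute_changes(original, reordered, { attribute_order: :strict })
p attribute_changes(original, edited, { attribute_order: :strict })
```

With the default mode, reordering alone yields an empty hash. In :strict mode the reordered pair reports only :attribute_order, while the edited pair reports only :attributes because its keys are still in the original order.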
#extract_text_content(node) ⇒ String
Extract all text content from a node and its descendants. Values are gathered depth-first, joined with single spaces, and the result is stripped.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 530
def extract_text_content(node)
  texts = []
  texts << node.value if node.value && !node.value.empty?

  node.children.each do |child|
    texts << extract_text_content(child)
  end

  texts.join(" ").strip
end
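A small run shows how the joining behaves, including one detail that is easy to miss: a child without any text still contributes an empty string, so the joined result can contain a double space. The sketch uses a hypothetical TextNode struct in place of TreeNode and assumes only value and children.

```ruby
# Hypothetical stand-in for TreeNode; only `value` and `children` are assumed.
TextNode = Struct.new(:value, :children)

# Same body as the documented method.
def extract_text_content(node)
  texts = []
  texts << node.value if node.value && !node.value.empty?

  node.children.each do |child|
    texts << extract_text_content(child)
  end

  texts.join(" ").strip
end

hello = TextNode.new("Hello", [])
empty = TextNode.new(nil, [])      # e.g. an element with no text at all
world = TextNode.new("world", [])

puts extract_text_content(TextNode.new(nil, [hello, world])).inspect
puts extract_text_content(TextNode.new(nil, [hello, empty, world])).inspect
```

The first call prints "Hello world"; the second prints "Hello  world" with two spaces, because the empty child's "" is still joined in. Passing the result through normalize_text would collapse that gap.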
#nodes_identical?(node1, node2) ⇒ Boolean
Check if two nodes are identical: same label, value, and attributes, each compared with ==. No text normalization is applied here.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 179
def nodes_identical?(node1, node2)
  node1.label == node2.label &&
    node1.value == node2.value &&
    node1.attributes == node2.attributes
end
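Two consequences of the plain == comparison are worth seeing side by side. The sketch below uses a hypothetical PlainNode struct and assumes attributes are Ruby hashes; under that assumption attribute order does not matter, because Hash#== ignores insertion order, while any whitespace difference in the value does.

```ruby
# Hypothetical stand-in for TreeNode with the three compared fields.
PlainNode = Struct.new(:label, :value, :attributes)

# Same body as the documented method.
def nodes_identical?(node1, node2)
  node1.label == node2.label &&
    node1.value == node2.value &&
    node1.attributes == node2.attributes
end

base      = PlainNode.new("p", "Hello world", { "id" => "1", "lang" => "en" })
reordered = PlainNode.new("p", "Hello world", { "lang" => "en", "id" => "1" })
spaced    = PlainNode.new("p", "Hello  world", { "id" => "1", "lang" => "en" })

puts nodes_identical?(base, reordered)  # attribute order is ignored by Hash#==
puts nodes_identical?(base, spaced)     # the extra space makes the values differ
```

This prints true and then false, which is why detect_changes compares values through text_equivalent? rather than raw equality.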
#normalize_text(text) ⇒ String
Normalize text for comparison.
Collapses runs of whitespace into a single space and strips leading and trailing whitespace. Also decodes XML entity references so that entity-encoded text (e.g., &#x201C;) and the literal character it represents (e.g., “) compare as equivalent. Returns an empty string for nil or empty input.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 288
def normalize_text(text)
  return "" if text.nil? || text.empty?

  normalized = Core::XmlEntityDecoder.decode_xml_entities(text)
  normalized.gsub(/\s+/, " ").strip
end
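The effect of decoding before collapsing whitespace can be sketched without the library. Core::XmlEntityDecoder is not reproduced here; decode_numeric_entities below is a hypothetical stand-in that handles only numeric character references, so named entities such as &amp; are out of scope for this example.

```ruby
# Hypothetical stand-in for Core::XmlEntityDecoder.decode_xml_entities.
# It decodes only hex (&#x201C;) and decimal (&#8220;) character references.
def decode_numeric_entities(text)
  text.gsub(/&#x([0-9a-fA-F]+);/) { [Regexp.last_match(1).hex].pack("U") }
      .gsub(/&#(\d+);/) { [Regexp.last_match(1).to_i].pack("U") }
end

# Same shape as the documented method, with the stand-in decoder swapped in.
def normalize_text(text)
  return "" if text.nil? || text.empty?

  normalized = decode_numeric_entities(text)
  normalized.gsub(/\s+/, " ").strip
end

encoded = "  &#x201C;Hello&#x201D;\n   world "
literal = "\u201CHello\u201D world"

puts normalize_text(encoded) == normalize_text(literal)
```

Both inputs normalize to the same string, so the comparison prints true even though one side uses entity references and irregular whitespace.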
#text_similarity(text1, text2) ⇒ Float
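Calculate text similarity using the Jaccard index over lowercased, whitespace-separated tokens: the number of distinct shared tokens divided by the number of distinct tokens overall. Returns 0.0 when either text has no tokens, including when both are empty.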
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 574
def text_similarity(text1, text2)
  tokens1 = text1.downcase.split(/\s+/)
  tokens2 = text2.downcase.split(/\s+/)

  return 0.0 if tokens1.empty? && tokens2.empty?
  return 0.0 if tokens1.empty? || tokens2.empty?

  intersection = (tokens1 & tokens2).size
  union = (tokens1 | tokens2).size

  intersection.to_f / union
end
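A worked example helps pin down the arithmetic. The method body below is copied unchanged and needs nothing from the library; the sample sentences are made up for illustration.

```ruby
# Same body as the documented method.
def text_similarity(text1, text2)
  tokens1 = text1.downcase.split(/\s+/)
  tokens2 = text2.downcase.split(/\s+/)

  return 0.0 if tokens1.empty? && tokens2.empty?
  return 0.0 if tokens1.empty? || tokens2.empty?

  intersection = (tokens1 & tokens2).size
  union = (tokens1 | tokens2).size

  intersection.to_f / union
end

# Shared: the, quick, fox (3). Distinct overall: the, quick, brown, fox, red (5).
puts text_similarity("The quick brown fox", "the quick red fox")

# Array#& and Array#| drop duplicates, so repeated words count once.
puts text_similarity("a a b", "b a")

# Two empty texts score 0.0 rather than 1.0.
puts text_similarity("", "")
```

The three calls give 3/5 = 0.6, then 1.0, then 0.0. Callers that treat two empty texts as a perfect match need to handle that case before calling the method.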