Class: Canon::TreeDiff::Operations::OperationDetector
- Inherits: Object
  - Object
    - Canon::TreeDiff::Operations::OperationDetector
- Defined in: lib/canon/tree_diff/operations/operation_detector.rb
Overview
OperationDetector analyzes tree matching results to detect high-level semantic operations.
Based on research from XDiff, XyDiff, and JATS-diff, this detector identifies operations in three levels:
- Level 1: Basic operations (INSERT, DELETE, UPDATE)
- Level 2: Structural operations (MOVE)
- Level 3: Semantic operations (MERGE, SPLIT, UPGRADE, DOWNGRADE)
Instance Attribute Summary
- #match_options ⇒ Object (readonly)
  Returns the value of attribute match_options.
- #matching ⇒ Object (readonly)
  Returns the value of attribute matching.
- #operations ⇒ Object (readonly)
  Returns the value of attribute operations.
- #tree1 ⇒ Object (readonly)
  Returns the value of attribute tree1.
- #tree2 ⇒ Object (readonly)
  Returns the value of attribute tree2.
Instance Method Summary
- #calculate_depth(node) ⇒ Integer
  Calculate depth of a node in the tree.
- #collect_all_nodes(node) ⇒ Array<TreeNode>
  Collect all nodes in a tree (depth-first).
- #detect ⇒ Array<Operation>
  Detect all operations.
- #detect_changes(node1, node2) ⇒ Hash
  Detect specific changes between two nodes.
- #extract_text_content(node) ⇒ String
  Extract all text content from a node and its descendants.
- #initialize(tree1, tree2, matching, match_options = {}) ⇒ OperationDetector (constructor)
  Initialize a new operation detector.
- #nodes_identical?(node1, node2) ⇒ Boolean
  Check if two nodes are identical.
- #normalize_text(text) ⇒ String
  Normalize text for comparison.
- #text_similarity(text1, text2) ⇒ Float
  Calculate text similarity using Jaccard index.
Constructor Details
#initialize(tree1, tree2, matching, match_options = {}) ⇒ OperationDetector
Initialize a new operation detector
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 32

def initialize(tree1, tree2, matching, match_options = {})
  @tree1 = tree1
  @tree2 = tree2
  @matching = matching
  @match_options = match_options || {}
  @operations = []
end
Instance Attribute Details
#match_options ⇒ Object (readonly)
Returns the value of attribute match_options.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 24

def match_options
  @match_options
end
#matching ⇒ Object (readonly)
Returns the value of attribute matching.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 24

def matching
  @matching
end
#operations ⇒ Object (readonly)
Returns the value of attribute operations.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 24

def operations
  @operations
end
#tree1 ⇒ Object (readonly)
Returns the value of attribute tree1.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 24

def tree1
  @tree1
end
#tree2 ⇒ Object (readonly)
Returns the value of attribute tree2.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 24

def tree2
  @tree2
end
Instance Method Details
#calculate_depth(node) ⇒ Integer
Calculate depth of a node in the tree
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 593

def calculate_depth(node)
  depth = 0
  current = node
  while current.parent
    depth += 1
    current = current.parent
  end
  depth
end
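As a rough illustration of how the parent walk behaves, the sketch below runs the same loop against a hypothetical `DepthNode` struct. `DepthNode` is an assumption made for this example and is not Canon's tree node class; the only thing the loop needs is a `parent` reader that returns nil at the root.

```ruby
# Hypothetical stand-in for a tree node; only `parent` matters here.
DepthNode = Struct.new(:label, :parent)

# Same loop as the documented method: count parent links up to the root.
def calculate_depth(node)
  depth = 0
  current = node
  while current.parent
    depth += 1
    current = current.parent
  end
  depth
end

root = DepthNode.new("article", nil)
body = DepthNode.new("body", root)
para = DepthNode.new("p", body)

puts calculate_depth(root) # 0
puts calculate_depth(para) # 2
```

The root has depth 0, and the cost of each call is proportional to the depth of the node.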
#collect_all_nodes(node) ⇒ Array<TreeNode>
Collect all nodes in a tree (depth-first)
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 301

def collect_all_nodes(node)
  nodes = [node]
  node.children.each do |child|
    nodes.concat(collect_all_nodes(child))
  end
  nodes
end
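To make the traversal order concrete, here is a minimal sketch that runs the same recursion over a hypothetical `WalkNode` struct. `WalkNode` is an assumption for illustration only; the recursion itself only relies on a `children` array.

```ruby
# Hypothetical stand-in node: a label plus an array of children.
WalkNode = Struct.new(:label, :children)

# Same recursion as the documented method: the node first, then each subtree.
def collect_all_nodes(node)
  nodes = [node]
  node.children.each do |child|
    nodes.concat(collect_all_nodes(child))
  end
  nodes
end

tree = WalkNode.new("root", [
  WalkNode.new("a", [WalkNode.new("a1", []), WalkNode.new("a2", [])]),
  WalkNode.new("b", []),
])

p collect_all_nodes(tree).map(&:label) # ["root", "a", "a1", "a2", "b"]
```

Each node is emitted before its descendants, so the result is in pre-order.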
#detect ⇒ Array<Operation>
Detect all operations
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 43

def detect
  @operations = []

  # Level 1: Basic operations
  detect_inserts
  detect_deletes
  detect_updates

  # Level 2: Structural operations
  detect_moves

  # Level 3: Semantic operations
  # These require more sophisticated pattern analysis
  detect_merges
  detect_splits
  detect_upgrades
  detect_downgrades

  @operations
end
#detect_changes(node1, node2) ⇒ Hash
Detect specific changes between two nodes
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 192

def detect_changes(node1, node2)
  changes = {}

  if node1.label != node2.label
    changes[:label] = { old: node1.label, new: node2.label }
  end

  # CRITICAL FIX: Use normalized text comparison based on match_options
  if !text_equivalent?(node1, node2)
    changes[:value] = { old: node1.value, new: node2.value }
  end

  # Detect attribute changes (values or order)
  attrs1 = node1.attributes
  attrs2 = node2.attributes

  # Check if attribute values differ (ignoring order)
  if attrs1.sort.to_h != attrs2.sort.to_h
    # Actual attribute value differences
    changes[:attributes] = {
      old: attrs1,
      new: attrs2,
    }
  end

  # Check if attribute order differs (independently)
  # This can coexist with attribute value differences
  # Only detect order differences when the same attributes exist in different order
  # AND when attribute_order mode is :strict
  attribute_order_mode = @match_options[:attribute_order] || :ignore
  if attribute_order_mode == :strict &&
      attrs1.keys.sort == attrs2.keys.sort &&
      attrs1.keys != attrs2.keys
    # Same attributes but in different order
    changes[:attribute_order] = {
      old: attrs1.keys,
      new: attrs2.keys,
    }
  end

  changes
end
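The attribute handling in this method has two independent checks, which may be easier to follow in isolation. The sketch below copies only those two checks into a hypothetical helper, `attribute_changes`, that works on plain hashes. The helper name and its `order_mode` parameter (standing in for `match_options[:attribute_order]`) are assumptions for this example; the label and text checks, including `text_equivalent?`, are left out.

```ruby
# Hypothetical helper isolating the two attribute checks from detect_changes.
# `order_mode` stands in for match_options[:attribute_order].
def attribute_changes(attrs1, attrs2, order_mode = :ignore)
  changes = {}

  # Value check: sorting by key first makes the comparison order-insensitive.
  if attrs1.sort.to_h != attrs2.sort.to_h
    changes[:attributes] = { old: attrs1, new: attrs2 }
  end

  # Order check: only in :strict mode, and only when the key sets match.
  if order_mode == :strict &&
      attrs1.keys.sort == attrs2.keys.sort &&
      attrs1.keys != attrs2.keys
    changes[:attribute_order] = { old: attrs1.keys, new: attrs2.keys }
  end

  changes
end

a = { "id" => "x1", "class" => "note" }
b = { "class" => "note", "id" => "x1" } # same pairs, different order
c = { "class" => "warn", "id" => "x1" } # different value and different order

p attribute_changes(a, b)               # {}
p attribute_changes(a, b, :strict).keys # [:attribute_order]
p attribute_changes(a, c, :strict).keys # [:attributes, :attribute_order]
```

With the default `:ignore` mode a pure reordering produces no change at all, while in `:strict` mode a value change and an order change can be reported together.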
#extract_text_content(node) ⇒ String
Extract all text content from a node and its descendants
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 532

def extract_text_content(node)
  texts = []
  texts << node.value if node.value && !node.value.empty?

  node.children.each do |child|
    texts << extract_text_content(child)
  end

  texts.join(" ").strip
end
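The following sketch shows how the recursion flattens nested text, using a hypothetical `TextNode` struct (an assumption for this example, not Canon's node class) with just `value` and `children`.

```ruby
# Hypothetical stand-in node: optional text value plus children.
TextNode = Struct.new(:value, :children)

# Same logic as the documented method: own text first, then each subtree.
def extract_text_content(node)
  texts = []
  texts << node.value if node.value && !node.value.empty?

  node.children.each do |child|
    texts << extract_text_content(child)
  end

  texts.join(" ").strip
end

para = TextNode.new("Hello", [
  TextNode.new(nil, [TextNode.new("bold", [])]), # wrapper with no text of its own
  TextNode.new("world", []),
])

puts extract_text_content(para) # Hello bold world
```

One detail worth knowing: a child with no text and no children contributes an empty string, so the joined result can contain consecutive spaces (for example `"a  b"`). Passing the result through `normalize_text` would collapse them.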
#nodes_identical?(node1, node2) ⇒ Boolean
Check if two nodes are identical
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 181

def nodes_identical?(node1, node2)
  node1.label == node2.label &&
    node1.value == node2.value &&
    node1.attributes == node2.attributes
end
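A small sketch of what "identical" means here, assuming attributes are stored as a Ruby Hash and using a hypothetical `IdentNode` struct in place of the real node class. Ruby's `Hash#==` ignores insertion order, so reordered attributes still compare equal, whereas `value` is compared exactly, without the normalization that `detect_changes` applies.

```ruby
# Hypothetical stand-in node with the three fields the check reads.
IdentNode = Struct.new(:label, :value, :attributes)

# Same comparison as the documented method.
def nodes_identical?(node1, node2)
  node1.label == node2.label &&
    node1.value == node2.value &&
    node1.attributes == node2.attributes
end

a = IdentNode.new("p", "Hello", { "id" => "1", "class" => "x" })
b = IdentNode.new("p", "Hello", { "class" => "x", "id" => "1" })  # attributes reordered
c = IdentNode.new("p", "Hello ", { "id" => "1", "class" => "x" }) # trailing space in value

p nodes_identical?(a, b) # true
p nodes_identical?(a, c) # false
```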
#normalize_text(text) ⇒ String
Normalize text for comparison
Collapses each run of whitespace into a single space and strips leading and trailing whitespace. Also decodes XML entity references so that entity-encoded text (e.g., &#x201C;) and literal characters (e.g., “) that represent the same Unicode character compare as equivalent.
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 290

def normalize_text(text)
  return "" if text.nil? || text.empty?

  normalized = Core::XmlEntityDecoder.decode_xml_entities(text)
  normalized.gsub(/\s+/, " ").strip
end
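The effect of the two normalization steps can be sketched without the Canon internals. In the example below, `decode_numeric_entities` is a hypothetical, simplified stand-in for `Core::XmlEntityDecoder.decode_xml_entities`: it handles only numeric character references and makes no claim about how the real decoder works.

```ruby
# Hypothetical stand-in for Core::XmlEntityDecoder.decode_xml_entities.
# Decodes numeric character references only (decimal or hexadecimal).
def decode_numeric_entities(text)
  text.gsub(/&#(x[0-9a-fA-F]+|\d+);/) do
    ref = Regexp.last_match(1)
    code = ref.start_with?("x") ? ref[1..-1].to_i(16) : ref.to_i
    [code].pack("U")
  end
end

# Same shape as the documented method, using the stand-in decoder.
def normalize_text(text)
  return "" if text.nil? || text.empty?

  decode_numeric_entities(text).gsub(/\s+/, " ").strip
end

encoded = "  &#x201C;Hello&#8221;\n\t world "
literal = "\u201CHello\u201D world"

p normalize_text(encoded) == normalize_text(literal) # true
p normalize_text(nil)                                # ""
```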
#text_similarity(text1, text2) ⇒ Float
Calculate text similarity using Jaccard index
# File 'lib/canon/tree_diff/operations/operation_detector.rb', line 576

def text_similarity(text1, text2)
  tokens1 = text1.downcase.split(/\s+/)
  tokens2 = text2.downcase.split(/\s+/)

  return 0.0 if tokens1.empty? && tokens2.empty?
  return 0.0 if tokens1.empty? || tokens2.empty?

  intersection = (tokens1 & tokens2).size
  union = (tokens1 | tokens2).size

  intersection.to_f / union
end
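A worked example may help with the Jaccard computation. The sketch below repeats the method as a standalone function, with the two empty-input guards folded into one since the second already covers the first, and applies it to two short strings. Because `Array#&` and `Array#|` drop duplicates, repeated words count once.

```ruby
# Standalone copy of the documented logic: Jaccard index over
# lowercase, whitespace-separated tokens.
def text_similarity(text1, text2)
  tokens1 = text1.downcase.split(/\s+/)
  tokens2 = text2.downcase.split(/\s+/)
  return 0.0 if tokens1.empty? || tokens2.empty?

  intersection = (tokens1 & tokens2).size
  union = (tokens1 | tokens2).size
  intersection.to_f / union
end

# Shared tokens: the, quick, fox (3). Distinct tokens overall: 5.
puts text_similarity("The quick brown fox", "the quick red fox") # 0.6
puts text_similarity("same words", "words same")                 # 1.0
puts text_similarity("", "anything")                             # 0.0
```

Word order does not affect the score, and two empty strings score 0.0 rather than 1.0.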