Class: Canon::TreeDiff::Matchers::SimilarityMatcher
- Inherits: Object
  - Object
  - Canon::TreeDiff::Matchers::SimilarityMatcher
- Defined in: lib/canon/tree_diff/matchers/similarity_matcher.rb
Overview
SimilarityMatcher performs similarity-based matching of tree nodes that an earlier matching pass left unmatched.
Based on the JATS-diff (2022) approach:
- Use Jaccard index for content similarity
- Configurable similarity threshold (default 0.95)
- Group candidates by signature for efficiency
- Extend matches for unmatched nodes
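The Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below shows how a threshold of 0.95 could filter candidate pairs. It is illustrative only: `tokens`, `jaccard`, and `similar?` are hypothetical helpers, and splitting text into lowercase word tokens is an assumption, not necessarily how Canon builds its comparison sets.

```ruby
require "set"

# Hypothetical helper: turn text into a set of lowercase word tokens.
# (Assumption: the real matcher may tokenize node content differently.)
def tokens(text)
  text.downcase.scan(/\w+/).to_set
end

# Jaccard index: |A ∩ B| / |A ∪ B|.
# Two empty sets are treated as identical (similarity 1.0).
def jaccard(a, b)
  union = a | b
  return 1.0 if union.empty?

  (a & b).size.to_f / union.size
end

# A pair counts as similar when its Jaccard index reaches the threshold.
def similar?(text1, text2, threshold: 0.95)
  jaccard(tokens(text1), tokens(text2)) >= threshold
end
```

With the default of 0.95, only near-identical content passes; lowering the threshold admits looser matches at the cost of more false positives.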
Features:
- Handles text-centric documents
- Fuzzy matching for similar but not identical nodes
- Threshold-based filtering
- Efficient signature-based grouping
Instance Attribute Summary
- #matching ⇒ Object (readonly): Returns the value of attribute matching.
- #threshold ⇒ Object (readonly): Returns the value of attribute threshold.
- #tree1 ⇒ Object (readonly): Returns the value of attribute tree1.
- #tree2 ⇒ Object (readonly): Returns the value of attribute tree2.
Instance Method Summary
- #initialize(tree1, tree2, matching, threshold: 0.95) ⇒ SimilarityMatcher (constructor): Initialize matcher with two trees and existing matching.
- #match ⇒ Core::Matching: Perform similarity-based matching.
Constructor Details
#initialize(tree1, tree2, matching, threshold: 0.95) ⇒ SimilarityMatcher
Initialize matcher with two trees and existing matching
# File 'lib/canon/tree_diff/matchers/similarity_matcher.rb', line 32
def initialize(tree1, tree2, matching, threshold: 0.95)
  @tree1 = tree1
  @tree2 = tree2
  @matching = matching
  @threshold = threshold
end
Instance Attribute Details
#matching ⇒ Object (readonly)
Returns the value of attribute matching.
# File 'lib/canon/tree_diff/matchers/similarity_matcher.rb', line 24
def matching
  @matching
end
#threshold ⇒ Object (readonly)
Returns the value of attribute threshold.
# File 'lib/canon/tree_diff/matchers/similarity_matcher.rb', line 24
def threshold
  @threshold
end
#tree1 ⇒ Object (readonly)
Returns the value of attribute tree1.
# File 'lib/canon/tree_diff/matchers/similarity_matcher.rb', line 24
def tree1
  @tree1
end
#tree2 ⇒ Object (readonly)
Returns the value of attribute tree2.
# File 'lib/canon/tree_diff/matchers/similarity_matcher.rb', line 24
def tree2
  @tree2
end
Instance Method Details
#match ⇒ Core::Matching
Perform similarity-based matching
# File 'lib/canon/tree_diff/matchers/similarity_matcher.rb', line 42
def match
  # Get unmatched nodes from both trees
  all_nodes1 = collect_nodes(tree1)
  all_nodes2 = collect_nodes(tree2)

  unmatched1 = @matching.unmatched1(all_nodes1)
  unmatched2 = @matching.unmatched2(all_nodes2)

  # Group unmatched nodes by signature for efficiency
  groups1 = group_by_signature(unmatched1)
  groups2 = group_by_signature(unmatched2)

  # For each signature group, find similar matches
  groups2.each do |sig, nodes2|
    # Find corresponding group in tree1
    nodes1 = groups1[sig] || []
    next if nodes1.empty?

    # Match nodes within this signature group
    match_group(nodes1, nodes2)
  end

  @matching
end
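The bodies of `group_by_signature` and `match_group` are not shown on this page. The following self-contained sketch illustrates one way the same flow could work, under stated assumptions: `Node` is a hypothetical stand-in for the real tree node class, the signature is taken to be the node label alone, similarity is a word-level Jaccard index, and pairing is greedy. It returns pairs rather than updating a `Core::Matching` object, and none of it should be read as Canon's actual implementation.

```ruby
# Hypothetical stand-in for a tree node (the real node class is not shown here).
Node = Struct.new(:label, :text)

# Assumption: a node's signature is just its label.
def group_by_signature(nodes)
  nodes.group_by(&:label)
end

# Word-level Jaccard index between the text of two nodes.
def word_similarity(a, b)
  wa = a.text.downcase.split.uniq
  wb = b.text.downcase.split.uniq
  union = wa | wb
  return 1.0 if union.empty?

  (wa & wb).size.to_f / union.size
end

# Greedy pairing inside one signature group: each node from tree 1 takes
# the most similar remaining node from tree 2, if it reaches the threshold.
def match_group(nodes1, nodes2, threshold)
  remaining = nodes2.dup
  pairs = []
  nodes1.each do |n1|
    idx = remaining.each_index.max_by { |i| word_similarity(n1, remaining[i]) }
    next if idx.nil? || word_similarity(n1, remaining[idx]) < threshold

    pairs << [n1, remaining.delete_at(idx)]
  end
  pairs
end

# Same outer loop as #match: only nodes sharing a signature are compared.
def similarity_pairs(unmatched1, unmatched2, threshold: 0.95)
  groups1 = group_by_signature(unmatched1)
  groups2 = group_by_signature(unmatched2)
  groups2.flat_map do |sig, nodes2|
    nodes1 = groups1[sig] || []
    nodes1.empty? ? [] : match_group(nodes1, nodes2, threshold)
  end
end
```

Grouping first means similarity is only computed between nodes with the same signature, which avoids comparing every unmatched node in one tree against every unmatched node in the other.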