Class: Lutaml::Xsd::Errors::Suggesters::FuzzyMatcher
- Inherits: Object
  - Object
  - Lutaml::Xsd::Errors::Suggesters::FuzzyMatcher
- Defined in: lib/lutaml/xsd/errors/suggesters/fuzzy_matcher.rb
Overview
Fuzzy string-matching utility that suggests similar type names from a schema repository, scored by Levenshtein distance.
Instance Attribute Summary
- #min_similarity ⇒ Float (readonly): Minimum similarity threshold (0.0 to 1.0).
- #repository ⇒ SchemaRepository (readonly): The schema repository to search.
Instance Method Summary
- #find_similar_types(query, limit: 5) ⇒ Array<Suggestion>: Find types similar to the query string.
- #initialize(repository, min_similarity: 0.6) ⇒ FuzzyMatcher (constructor): Initialize fuzzy matcher.
- #levenshtein_distance(str1, str2) ⇒ Integer: Calculate Levenshtein distance between two strings.
- #similarity_score(str1, str2) ⇒ Float: Calculate similarity score (0.0 to 1.0) based on Levenshtein distance.
Constructor Details
#initialize(repository, min_similarity: 0.6) ⇒ FuzzyMatcher
Initializes the fuzzy matcher with the schema repository to search and a minimum similarity threshold (default 0.6).
# File 'lib/lutaml/xsd/errors/suggesters/fuzzy_matcher.rb', line 25

def initialize(repository, min_similarity: 0.6)
  @repository = repository
  @min_similarity = min_similarity
end
Instance Attribute Details
#min_similarity ⇒ Float (readonly)
Returns the minimum similarity threshold (0.0 to 1.0).
# File 'lib/lutaml/xsd/errors/suggesters/fuzzy_matcher.rb', line 19

def min_similarity
  @min_similarity
end
#repository ⇒ SchemaRepository (readonly)
Returns the schema repository to search.
# File 'lib/lutaml/xsd/errors/suggesters/fuzzy_matcher.rb', line 16

def repository
  @repository
end
Instance Method Details
#find_similar_types(query, limit: 5) ⇒ Array<Suggestion>
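Finds type names similar to the query string. Returns an empty array when no repository is set; otherwise returns up to limit suggestions whose score is at least min_similarity, ordered from most to least similar.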
# File 'lib/lutaml/xsd/errors/suggesters/fuzzy_matcher.rb', line 35

def find_similar_types(query, limit: 5)
  return [] unless repository

  candidates = collect_type_candidates
  scored = score_candidates(candidates, query)
  filtered = scored.select { |_, score| score >= min_similarity }
  sorted = filtered.sort_by { |_, score| -score }

  sorted.take(limit).map do |name, score|
    Suggestion.new(
      text: name,
      similarity: score,
      explanation: "Did you mean '#{name}'?",
    )
  end
end
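The filter, sort, and truncate steps of find_similar_types can be tried in isolation. The following is a minimal sketch, not the library's code: Suggestion is a hypothetical Struct stand-in for the real class, pick_suggestions is an illustrative helper, and the name/score pairs are made-up inputs standing in for whatever the private scoring helper returns.

```ruby
# Hypothetical stand-in for the library's Suggestion class (assumption).
Suggestion = Struct.new(:text, :similarity, :explanation, keyword_init: true)

# Illustrative helper mirroring the select / sort_by / take pipeline.
# `scored` is an array of [name, score] pairs; the values below are made up.
def pick_suggestions(scored, min_similarity: 0.6, limit: 5)
  scored
    .select { |_, score| score >= min_similarity } # drop weak matches
    .sort_by { |_, score| -score }                 # best match first
    .take(limit)                                   # cap the result size
    .map do |name, score|
      Suggestion.new(
        text: name,
        similarity: score,
        explanation: "Did you mean '#{name}'?"
      )
    end
end

scored = [["PersonType", 0.9], ["AddressType", 0.3], ["PersonTypes", 0.82]]
suggestions = pick_suggestions(scored, limit: 2)
puts suggestions.map(&:text).inspect
```

With these inputs, the 0.3 candidate falls below the default threshold of 0.6 and is dropped before sorting, so limit only ever truncates candidates that already passed the threshold.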
#levenshtein_distance(str1, str2) ⇒ Integer
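Calculates the Levenshtein distance between two strings: the minimum number of single-character insertions, deletions, and substitutions needed to turn str1 into str2. The comparison is case-sensitive.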
# File 'lib/lutaml/xsd/errors/suggesters/fuzzy_matcher.rb', line 57

def levenshtein_distance(str1, str2)
  return str2.length if str1.empty?
  return str1.length if str2.empty?

  matrix = Array.new(str1.length + 1) { Array.new(str2.length + 1) }

  (0..str1.length).each { |i| matrix[i][0] = i }
  (0..str2.length).each { |j| matrix[0][j] = j }

  (1..str1.length).each do |i|
    (1..str2.length).each do |j|
      cost = str1[i - 1] == str2[j - 1] ? 0 : 1
      matrix[i][j] = [
        matrix[i - 1][j] + 1, # deletion
        matrix[i][j - 1] + 1, # insertion
        matrix[i - 1][j - 1] + cost, # substitution
      ].min
    end
  end

  matrix[str1.length][str2.length]
end
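The method above fills a full (str1.length + 1) by (str2.length + 1) matrix, although each cell depends only on the current and previous rows. As a point of comparison, here is a sketch of the same recurrence using two rows. This is a swapped-in variant for illustration, not the library's implementation, and two_row_levenshtein is a hypothetical name.

```ruby
# Hypothetical standalone helper (not part of Lutaml::Xsd): the same
# dynamic-programming recurrence, keeping only two rows in memory.
def two_row_levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  # Row 0: distance from the empty prefix of `a` to each prefix of `b`.
  previous = (0..b.length).to_a

  a.each_char.with_index(1) do |char_a, i|
    current = [i] # distance from a[0, i] to the empty prefix of `b`
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution
      ].min
    end
    previous = current
  end

  previous.last
end

puts two_row_levenshtein("kitten", "sitting")
puts two_row_levenshtein("Type", "type")
```

The second call returns a non-zero distance because the comparison is case-sensitive, just as in levenshtein_distance.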
#similarity_score(str1, str2) ⇒ Float
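Calculates a similarity score (0.0 to 1.0) as 1.0 minus the Levenshtein distance divided by the length of the longer string. Unlike levenshtein_distance, the comparison is case-insensitive: both strings are downcased before the distance is computed. Identical strings score 1.0; if the strings differ and either one is empty, the score is 0.0.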
# File 'lib/lutaml/xsd/errors/suggesters/fuzzy_matcher.rb', line 85

def similarity_score(str1, str2)
  return 1.0 if str1 == str2
  return 0.0 if str1.empty? || str2.empty?

  distance = levenshtein_distance(str1.downcase, str2.downcase)
  max_length = [str1.length, str2.length].max
  1.0 - (distance.to_f / max_length)
end
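To see how the score interacts with the default min_similarity of 0.6, the arithmetic can be worked through for known edit distances. This is a sketch only: score_from_distance is a hypothetical helper that applies the final formula to a distance supplied by hand, and the example strings are illustrative.

```ruby
# Illustrative helper (not part of the library): turns a known edit
# distance and the two string lengths into the 0.0..1.0 score.
def score_from_distance(distance, length1, length2)
  1.0 - (distance.to_f / [length1, length2].max)
end

# "kitten" vs "sitting": distance 3, longer length 7.
kitten_score = score_from_distance(3, 6, 7)
# "complexTyp" vs "complexType": distance 1, longer length 11.
typo_score = score_from_distance(1, 10, 11)

default_threshold = 0.6
puts kitten_score >= default_threshold # false: 1 - 3/7 is about 0.571
puts typo_score >= default_threshold   # true: 1 - 1/11 is about 0.909
```

Because the distance is divided by the longer length, a single-character typo in a long type name scores well above the threshold, while the same number of edits in a short name costs proportionally more.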