Class: Lutaml::Xsd::Errors::Suggesters::FuzzyMatcher

Inherits:
Object
Defined in:
lib/lutaml/xsd/errors/suggesters/fuzzy_matcher.rb

Overview

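Fuzzy string-matching utility for finding items whose names are similar to a query. Candidates are ranked by a similarity score derived from the Levenshtein (edit) distance.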

Examples:

Finding similar types

matcher = FuzzyMatcher.new(repository)
similar = matcher.find_similar_types("CdeType", limit: 5)


Constructor Details

#initialize(repository, min_similarity: 0.6) ⇒ FuzzyMatcher

Initialize a fuzzy matcher.

Parameters:

  • repository (SchemaRepository)

    The schema repository

  • min_similarity (Float) (defaults to: 0.6)

    Minimum similarity threshold (default: 0.6)



# File 'lib/lutaml/xsd/errors/suggesters/fuzzy_matcher.rb', line 25

def initialize(repository, min_similarity: 0.6)
  @repository = repository
  @min_similarity = min_similarity
end
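
Because similarity_score divides the edit distance by the length of the longer string, min_similarity translates into a number of tolerated edits that grows with name length. The sketch below uses a hypothetical helper, max_tolerated_edits (not part of lutaml-xsd), that applies the same check find_similar_types uses (1.0 - distance / max_length >= min_similarity) to find the largest distance that still passes.

```ruby
# Hypothetical helper, not part of lutaml-xsd: the largest edit distance that
# still satisfies 1.0 - distance / max_length >= min_similarity, i.e. the
# check find_similar_types applies to every candidate.
def max_tolerated_edits(max_length, min_similarity = 0.6)
  (0..max_length).select { |d| 1.0 - d.to_f / max_length >= min_similarity }.max
end

[4, 8, 12].each do |len|
  puts "#{len}-char name: up to #{max_tolerated_edits(len)} edit(s) at 0.6, " \
       "#{max_tolerated_edits(len, 0.8)} at 0.8"
end
```

With the default of 0.6, an 8-character name tolerates up to 3 edits; constructing the matcher with min_similarity: 0.8 reduces that to 1.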

Instance Attribute Details

#min_similarity ⇒ Float (readonly)

Returns the minimum similarity threshold (0.0 to 1.0).

Returns:

  • (Float)

    Minimum similarity threshold (0.0 to 1.0)



# File 'lib/lutaml/xsd/errors/suggesters/fuzzy_matcher.rb', line 19

def min_similarity
  @min_similarity
end

#repository ⇒ SchemaRepository (readonly)

Returns the schema repository to search.

Returns:

  • (SchemaRepository)

    The schema repository to search

# File 'lib/lutaml/xsd/errors/suggesters/fuzzy_matcher.rb', line 16

def repository
  @repository
end

Instance Method Details

#find_similar_types(query, limit: 5) ⇒ Array<Suggestion>

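Find types whose names are similar to the query string. Each candidate type name is scored against the query, candidates scoring below min_similarity are discarded, and at most limit of the remaining names are returned, best match first. Returns an empty array when no repository is set.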

Parameters:

  • query (String)

    The query string

  • limit (Integer) (defaults to: 5)

    Maximum number of results (default: 5)

Returns:

  • (Array<Suggestion>)

    Similar types as suggestions



# File 'lib/lutaml/xsd/errors/suggesters/fuzzy_matcher.rb', line 35

def find_similar_types(query, limit: 5)
  return [] unless repository

  candidates = collect_type_candidates
  scored = score_candidates(candidates, query)
  filtered = scored.select { |_, score| score >= min_similarity }
  sorted = filtered.sort_by { |_, score| -score }

  sorted.take(limit).map do |name, score|
    Suggestion.new(
      text: name,
      similarity: score,
      explanation: "Did you mean '#{name}'?",
    )
  end
end
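
The score, filter, sort and take pipeline above can be tried outside the library with the standalone sketch below. It assumes the candidates are a plain Array of type-name Strings (the real class gathers them from the repository through collect_type_candidates, which is not documented on this page) and returns Hashes instead of Suggestion objects. rank_similar, levenshtein and similarity are illustrative names; the last two mirror levenshtein_distance and similarity_score as documented here.

```ruby
# Illustrative port of levenshtein_distance (full dynamic-programming matrix).
def levenshtein(a, b)
  d = Array.new(a.length + 1) { |i| [i] + [0] * b.length } # column 0: i deletions
  (0..b.length).each { |j| d[0][j] = j }                    # row 0: j insertions
  (1..a.length).each do |i|
    (1..b.length).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      d[i][j] = [d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost].min
    end
  end
  d[a.length][b.length]
end

# Illustrative port of similarity_score: case-insensitive, normalized to 0.0..1.0.
def similarity(a, b)
  return 1.0 if a == b
  return 0.0 if a.empty? || b.empty?

  1.0 - levenshtein(a.downcase, b.downcase).to_f / [a.length, b.length].max
end

# Hypothetical stand-in for find_similar_types over an explicit candidate list.
def rank_similar(candidates, query, limit: 5, min_similarity: 0.6)
  candidates
    .map { |name| [name, similarity(name, query)] } # score every candidate
    .select { |_, score| score >= min_similarity }   # drop weak matches
    .sort_by { |_, score| -score }                   # best match first
    .take(limit)
    .map { |name, score| { text: name, similarity: score, explanation: "Did you mean '#{name}'?" } }
end

rank_similar(%w[CodeType CodeTypes DateType StringType], "CdeType").each do |s|
  puts format("%-10s %.3f", s[:text], s[:similarity])
end
```

Here CodeType (0.875), CodeTypes (about 0.778) and DateType (0.625) pass the default threshold, while StringType (0.4) is dropped. DateType clears 0.6 even though it is an unrelated name, which is worth keeping in mind when choosing min_similarity.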

#levenshtein_distance(str1, str2) ⇒ Integer

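Calculate the Levenshtein (edit) distance between two strings: the minimum number of single-character insertions, deletions, and substitutions needed to turn str1 into str2. The comparison is case-sensitive; similarity_score downcases both strings before calling this method.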

Parameters:

  • str1 (String)

    First string

  • str2 (String)

    Second string

Returns:

  • (Integer)

    Edit distance



# File 'lib/lutaml/xsd/errors/suggesters/fuzzy_matcher.rb', line 57

def levenshtein_distance(str1, str2)
  return str2.length if str1.empty?
  return str1.length if str2.empty?

  matrix = Array.new(str1.length + 1) { Array.new(str2.length + 1) }

  (0..str1.length).each { |i| matrix[i][0] = i }
  (0..str2.length).each { |j| matrix[0][j] = j }

  (1..str1.length).each do |i|
    (1..str2.length).each do |j|
      cost = str1[i - 1] == str2[j - 1] ? 0 : 1
      matrix[i][j] = [
        matrix[i - 1][j] + 1,      # deletion
        matrix[i][j - 1] + 1,      # insertion
        matrix[i - 1][j - 1] + cost, # substitution
      ].min
    end
  end

  matrix[str1.length][str2.length]
end
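
The full matrix above uses memory proportional to the product of the two string lengths. Because each row depends only on the row directly above it, the same recurrence can be computed while keeping just two rows. The sketch below is that common space-saving variant (levenshtein_two_rows is an illustrative name, not a lutaml-xsd method) and should return the same distances; the last call also shows that the raw distance is case-sensitive.

```ruby
# Space-saving variant of the same dynamic programme: only the previous row
# is kept, so memory is O(str2.length) instead of O(str1.length * str2.length).
def levenshtein_two_rows(str1, str2)
  previous = (0..str2.length).to_a # row 0: distance from "" to each prefix of str2
  str1.each_char.with_index(1) do |c1, i|
    current = [i] # column 0: delete all i characters of str1's prefix
    str2.each_char.with_index(1) do |c2, j|
      cost = c1 == c2 ? 0 : 1
      current << [
        previous[j] + 1,       # deletion
        current[j - 1] + 1,    # insertion
        previous[j - 1] + cost # substitution (or match when cost is 0)
      ].min
    end
    previous = current
  end
  previous.last
end

puts levenshtein_two_rows("kitten", "sitting") # => 3
puts levenshtein_two_rows("CdeType", "CodeType") # => 1
puts levenshtein_two_rows("Type", "type")       # => 1
```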

#similarity_score(str1, str2) ⇒ Float

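Calculate a similarity score between 0.0 and 1.0 from the Levenshtein distance of the downcased strings, normalized by the length of the longer string (1.0 - distance / max_length). Identical strings score 1.0; if exactly one string is empty, the score is 0.0.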

Parameters:

  • str1 (String)

    First string

  • str2 (String)

    Second string

Returns:

  • (Float)

    Similarity score



# File 'lib/lutaml/xsd/errors/suggesters/fuzzy_matcher.rb', line 85

def similarity_score(str1, str2)
  return 1.0 if str1 == str2
  return 0.0 if str1.empty? || str2.empty?

  distance = levenshtein_distance(str1.downcase, str2.downcase)
  max_length = [str1.length, str2.length].max
  1.0 - (distance.to_f / max_length)
end
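
Written as a formula, with d the Levenshtein distance and lower() the downcasing step, the score and a worked example for the query CdeType against the candidate CodeType are:

```latex
s(a, b) = 1 - \frac{d(\mathrm{lower}(a), \mathrm{lower}(b))}{\max(|a|, |b|)}

s(\texttt{CdeType}, \texttt{CodeType})
  = 1 - \frac{d(\texttt{cdetype}, \texttt{codetype})}{\max(7, 8)}
  = 1 - \frac{1}{8} = 0.875 \ge 0.6
```

A single inserted "o" separates the two names, so CodeType clears the default min_similarity of 0.6.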