Class: Kotoshu::Suggestions::Strategies::EditDistanceStrategy

Inherits:
BaseStrategy
  • Object
Defined in:
lib/kotoshu/suggestions/strategies/edit_distance_strategy.rb

Overview

Edit distance suggestion strategy with enhanced ranking. Generates suggestions by finding words with small edit distance, ranked by word frequency, keyboard proximity, and common typo patterns.

Multi-language support:

  • Automatically selects keyboard layout based on language_code

  • Loads frequency data from YAML files (Phase 1) or GitHub (Phase 2)

  • Supports language-specific typo patterns

This design is more object-oriented than Spylls, which uses standalone functions for edit distance operations.

Follows the Open-Closed Principle: extend it by adding YAML files, not by modifying this class.

Instance Attribute Summary

Attributes inherited from BaseStrategy

#config, #name

Instance Method Summary

Methods inherited from BaseStrategy

#calculate_ngram_similarity, #create_suggestion, #create_suggestion_set, #enabled?, #generate_ngrams, #get_config, #has_config?, #max_results, #priority, #to_s

Constructor Details

#initialize(name: :edit_distance, language_code: 'en', keyboard_layout: nil, frequency_tiers: nil, **config) ⇒ EditDistanceStrategy

Returns a new instance of EditDistanceStrategy.

Parameters:

  • name (String, Symbol) (defaults to: :edit_distance)

    Name of the strategy

  • language_code (String) (defaults to: 'en')

    Language code used to select the keyboard layout and frequency data

  • keyboard_layout (Keyboard::Layout) (defaults to: nil)

    Custom keyboard layout (optional)

  • frequency_tiers (Hash) (defaults to: nil)

    Custom frequency tiers (optional)

  • config (Hash)

    Additional configuration options

Options Hash (**config):

  • :max_distance (Integer)

    Maximum edit distance (default: 2)

  • :max_results (Integer)

    Maximum results to return (default: 10)



# File 'lib/kotoshu/suggestions/strategies/edit_distance_strategy.rb', line 36

def initialize(name: :edit_distance, language_code: 'en', keyboard_layout: nil,
               frequency_tiers: nil, **config)
  super(name: name, **config)
  @language_code = language_code

  # Use OOP registry for keyboard layout lookup
  @keyboard_layout = resolve_keyboard_layout(keyboard_layout)

  # Use custom frequency tiers if provided, otherwise load from Kelly data
  if frequency_tiers
    @frequency_tiers = frequency_tiers
    @common_words = Set.new
  else
    # Load frequency data for the language from Kelly JSON
    # This sets @frequency_tiers internally
    load_frequency_data(language_code)
  end
end
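
The constructor separates its named keywords from the remaining `**config` options. The following is a minimal sketch of that split, not the library's code: `SketchBase` and `SketchStrategy` are hypothetical stand-ins, and the body of `get_config` is an assumption (only its `(key, default)` call shape appears in the source above).

```ruby
# Hypothetical stand-in for BaseStrategy; the real implementation is not shown here.
class SketchBase
  attr_reader :name, :config

  def initialize(name:, **config)
    @name = name
    @config = config
  end

  # Assumed behaviour: return the configured value, or the default if absent.
  def get_config(key, default = nil)
    @config.fetch(key, default)
  end
end

# Hypothetical stand-in mirroring the keyword signature documented above.
class SketchStrategy < SketchBase
  attr_reader :language_code

  def initialize(name: :edit_distance, language_code: 'en', keyboard_layout: nil,
                 frequency_tiers: nil, **config)
    # Named keywords are consumed here; everything else is forwarded as config.
    super(name: name, **config)
    @language_code = language_code
  end
end

SKETCH_STRATEGY = SketchStrategy.new(language_code: 'de', max_distance: 1)
```

In this sketch, `language_code` never reaches the config hash, while `max_distance` does and can be read back with `get_config(:max_distance, 2)`.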

Instance Attribute Details

#keyboard_layout ⇒ Object (readonly)

Returns the value of attribute keyboard_layout.



# File 'lib/kotoshu/suggestions/strategies/edit_distance_strategy.rb', line 27

def keyboard_layout
  @keyboard_layout
end

#language_code ⇒ Object (readonly)

Returns the value of attribute language_code.



# File 'lib/kotoshu/suggestions/strategies/edit_distance_strategy.rb', line 27

def language_code
  @language_code
end

Instance Method Details

#adjacent_key_typo?(char1, char2) ⇒ Boolean

Check if a substitution is a keyboard-adjacent typo

Parameters:

  • char1 (String)

    First character

  • char2 (String)

    Second character

Returns:

  • (Boolean)

    True if keys are adjacent



# File 'lib/kotoshu/suggestions/strategies/edit_distance_strategy.rb', line 74

def adjacent_key_typo?(char1, char2)
  @keyboard_layout.adjacent_keys(char1).include?(char2)
end

#adjacent_keys(key) ⇒ Array<String>

Get adjacent keys for a given key

Parameters:

  • key (String)

    The key to find adjacent keys for

Returns:

  • (Array<String>)

    List of adjacent key characters



# File 'lib/kotoshu/suggestions/strategies/edit_distance_strategy.rb', line 82

def adjacent_keys(key)
  @keyboard_layout.adjacent_keys(key)
end
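
Both adjacency helpers delegate to the layout object. As a rough illustration of what such a lookup might look like, here is a self-contained sketch. `SketchLayout` and the partial QWERTY map are hypothetical; the real `Keyboard::Layout` class and its data are not shown in this page.

```ruby
# Hypothetical, partial QWERTY adjacency map (illustration only).
SKETCH_ADJACENCY = {
  'q' => %w[w a],
  'w' => %w[q e a s],
  'e' => %w[w r s d],
  'a' => %w[q w s z],
  's' => %w[w e a d z x],
  'd' => %w[e r s f x c]
}.freeze

# Hypothetical stand-in for Keyboard::Layout.
class SketchLayout
  def initialize(adjacency)
    @adjacency = adjacency
  end

  # Returns the neighbouring keys, or an empty list for unknown keys.
  def adjacent_keys(key)
    @adjacency.fetch(key.downcase, [])
  end
end

SKETCH_LAYOUT = SketchLayout.new(SKETCH_ADJACENCY)

# Same check as adjacent_key_typo?, with the layout passed in explicitly.
def sketch_adjacent_key_typo?(layout, char1, char2)
  layout.adjacent_keys(char1).include?(char2)
end
```

Returning an empty list for unknown keys is a choice made for this sketch so that the predicate never raises; the library may behave differently.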

#frequency_bonus(word) ⇒ Integer

Get frequency bonus for a word

Parameters:

  • word (String)

    The word to check

Returns:

  • (Integer)

    Frequency bonus: 200, 100, 50, or 0



# File 'lib/kotoshu/suggestions/strategies/edit_distance_strategy.rb', line 90

def frequency_bonus(word)
  return 0 unless @frequency_tiers

  word_downcase = word.downcase

  # Top 50: 200 bonus
  return 200 if @frequency_tiers[:top_50]&.include?(word_downcase)

  # Top 200: 100 bonus
  return 100 if @frequency_tiers[:top_200]&.include?(word_downcase)

  # Top 1000: 50 bonus
  return 50 if @frequency_tiers[:top_1000]&.include?(word_downcase)

  # Not in common words: no bonus
  0
end
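
The tier lookup above can be tried in isolation. The sketch below reproduces the same bonus values (200, 100, 50, 0) as a standalone function; the tier contents in `SKETCH_TIERS` are invented sample data, and `sketch_frequency_bonus` is a hypothetical name.

```ruby
require 'set'

# Invented sample tiers; real tiers come from the library's frequency data.
SKETCH_TIERS = {
  top_50: Set['the', 'and'],
  top_200: Set['because'],
  top_1000: Set['keyboard']
}.freeze

# Bonus per tier, checked from most to least frequent (insertion order).
SKETCH_TIER_BONUSES = { top_50: 200, top_200: 100, top_1000: 50 }.freeze

def sketch_frequency_bonus(word, tiers)
  return 0 unless tiers

  key = word.downcase
  SKETCH_TIER_BONUSES.each do |tier, bonus|
    # Safe navigation tolerates a missing tier, as in the method above.
    return bonus if tiers[tier]&.include?(key)
  end
  0
end
```

Because the lookup downcases its input, `sketch_frequency_bonus('The', SKETCH_TIERS)` lands in the top tier just like `'the'`.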

#generate(context) ⇒ SuggestionSet

Generate suggestions based on enhanced edit distance scoring.

Scoring factors:

  • Edit distance (primary factor)

  • Word frequency (common words rank higher)

  • Keyboard proximity (adjacent key typos rank higher)

  • Common typo patterns (missing double letters, etc.)
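
The primary factor is the edit distance between the misspelled word and each dictionary word. This page does not show how `edit_distance` is implemented, so the following is only a sketch of plain Levenshtein distance (insertions, deletions, substitutions) under that assumption; the library may use a different variant, for example one that also counts transpositions.

```ruby
# Plain Levenshtein distance using two rolling rows.
# Hypothetical helper; not the library's edit_distance.
def sketch_levenshtein(a, b)
  # prev[j] = distance between the empty prefix of a and b[0, j]
  prev = (0..b.length).to_a

  a.each_char.with_index(1) do |char_a, i|
    curr = [i] # distance between a[0, i] and the empty prefix of b
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      curr << [
        prev[j] + 1,        # deletion from a
        curr[j - 1] + 1,    # insertion into a
        prev[j - 1] + cost  # substitution (or match)
      ].min
    end
    prev = curr
  end

  prev.last
end
```

With the default `max_distance` of 2, a candidate such as "sitting" would be rejected for the input "kitten", since this function places them three edits apart.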

Parameters:

  • context (Context)

    The suggestion context

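Returns:

  • (SuggestionSet)

    Ranked suggestions, or an empty set when no dictionary word is within the maximum edit distance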



# File 'lib/kotoshu/suggestions/strategies/edit_distance_strategy.rb', line 118

def generate(context)
  word = context.word
  max_dist = get_config(:max_distance, 2)
  min_confidence = get_config(:min_confidence, 0.75)  # Higher threshold for quality
  min_similarity = get_config(:min_jaro_similarity, 0.70)  # Minimum Jaro-Winkler similarity (0.0-1.0)
  min_results = get_config(:min_results, 3)  # Always return at least 3 suggestions if available

  # Get all dictionary words
  all_words = dictionary_words(context)

  # Calculate enhanced scores for all candidates
  candidates = []
  all_words.each do |dict_word|
    next if dict_word == word

    dist = edit_distance(word, dict_word)
    next if dist > max_dist || dist <= 0

    # Calculate enhanced score (lower is better)
    score = calculate_enhanced_score(word, dict_word, dist)
    candidates << [dict_word, dist, score]
  end

  # Sort by enhanced score (lower is better)
  sorted_candidates = candidates.sort_by { |_, _, score| score }

  # Calculate confidence scores with threshold filtering
  if sorted_candidates.empty?
    return SuggestionSet.empty
  end

  max_score = sorted_candidates.map { |_, _, s| s.to_f }.max
  min_score = sorted_candidates.map { |_, _, s| s.to_f }.min
  score_range = (max_score - min_score).abs

  # Create suggestions with confidence-based filtering
  suggestions = []
  sorted_candidates.each do |dict_word, dist, score|
    # Normalize score to confidence (0.0 to 1.0)
    # Lower score = higher confidence
    if score_range > 0
      normalized = (score.to_f - min_score) / score_range  # 0 to 1
      confidence = 1.0 - normalized  # Invert: lower score = higher confidence
    else
      confidence = 1.0
    end

    # Calculate Jaro-Winkler similarity for additional filtering
    jaro_similarity = calculate_ngram_similarity(word, dict_word)

    # Skip low-confidence or low-similarity suggestions (unless we need more for min_results)
    if confidence < min_confidence || jaro_similarity < min_similarity
      next if suggestions.size >= min_results
    end

    suggestions << Suggestion.new(
      word: dict_word,
      distance: dist,
      confidence: confidence,
      source: @name,
      original_length: word.length,
      ngram_score: jaro_similarity,  # Now stores Jaro-Winkler similarity (0.0-1.0)
      enhanced_score: score
    )

    # Stop when we have enough high-quality suggestions
    break if suggestions.size >= max_results
  end

  SuggestionSet.new(suggestions, max_size: max_results)
end
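
The confidence step in `generate` is a min-max normalization of the enhanced scores, inverted so that the lowest score gets confidence 1.0. The sketch below isolates that step and the `min_results` floor. It is a simplification: it omits the similarity check and the `max_results` cap, and the helper names and sample scores are hypothetical.

```ruby
# Maps [word, score] pairs (lower score = better) to [word, confidence] pairs.
def sketch_confidences(candidates)
  return [] if candidates.empty?

  scores = candidates.map { |_, score| score.to_f }
  min, max = scores.minmax
  range = max - min

  candidates.map do |word, score|
    # When all scores are equal there is no spread, so every candidate gets 1.0.
    confidence = range.positive? ? 1.0 - ((score - min) / range) : 1.0
    [word, confidence]
  end
end

# Keeps confident candidates, but admits weaker ones until min_results is reached.
def sketch_filter(candidates, min_confidence: 0.75, min_results: 3)
  kept = []
  sketch_confidences(candidates).each do |word, confidence|
    next if confidence < min_confidence && kept.size >= min_results

    kept << word
  end
  kept
end

# Invented sample data, already sorted by score as in generate.
SKETCH_CANDIDATES = [
  ['hello', 10], ['help', 12], ['hell', 30], ['held', 50], ['helm', 50]
].freeze
```

With these sample scores, "hell" falls below the 0.75 threshold but is still kept because only two suggestions have been collected at that point; "held" and "helm" are then skipped.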

#handles?(context) ⇒ Boolean

Check if this strategy should handle the context.

Parameters:

  • context (Context)

    The suggestion context

Returns:

  • (Boolean)

    True if the word needs correction



# File 'lib/kotoshu/suggestions/strategies/edit_distance_strategy.rb', line 194

def handles?(context)
  return false unless enabled?

  # Only handle if the word is not in the dictionary
  !dictionary_lookup(context, context.word)
end

#keyboard ⇒ Keyboard::Layout

Returns the keyboard layout currently in use.

Returns:

  • (Keyboard::Layout)

    The active keyboard layout



# File 'lib/kotoshu/suggestions/strategies/edit_distance_strategy.rb', line 58

def keyboard
  @keyboard_layout
end

#keyboard_name ⇒ String

Public method to get keyboard name

Returns:

  • (String)

    Keyboard layout name



# File 'lib/kotoshu/suggestions/strategies/edit_distance_strategy.rb', line 65

def keyboard_name
  @keyboard_layout.name
end