Class: Kotoshu::Suggestions::Strategies::NgramStrategy

Inherits:
BaseStrategy
  • Object
Defined in:
lib/kotoshu/suggestions/strategies/ngram_strategy.rb

Overview

N-gram suggestion strategy.

Generates suggestions by finding dictionary words with high n-gram similarity to the input word. N-grams are contiguous sequences of n characters; for example, the 3-grams of "hello" are "hel", "ell", and "llo".
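A minimal Ruby sketch of the idea, for illustration only. char_ngrams and jaccard_ngram_similarity are hypothetical names, and the Jaccard index (shared n-grams divided by all distinct n-grams) is an assumed metric. The strategy's own ngram_similarity and the inherited #calculate_ngram_similarity are not shown on this page and may compute similarity differently.

```ruby
require "set"

# Hypothetical helper: the contiguous character n-grams of a word.
def char_ngrams(word, n)
  return [] if word.length < n

  (0..word.length - n).map { |i| word[i, n] }
end

# Assumed similarity measure: Jaccard index over the two sets of n-grams.
# The library's own similarity functions may use a different formula.
def jaccard_ngram_similarity(a, b, n)
  ga = char_ngrams(a, n).to_set
  gb = char_ngrams(b, n).to_set
  return 0.0 if ga.empty? && gb.empty?

  (ga & gb).size.to_f / (ga | gb).size
end

p char_ngrams("hello", 3)                       # => ["hel", "ell", "llo"]
p jaccard_ngram_similarity("hello", "helo", 3)  # => 0.25
```

Here "hello" and "helo" share only "hel" out of four distinct 3-grams, so the sketch scores the pair 0.25.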

Examples:

Creating an n-gram strategy

strategy = NgramStrategy.new(n: 3)
result = strategy.generate(context)

Instance Attribute Summary

Attributes inherited from BaseStrategy

#config, #name

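Instance Method Summary

  • #generate(context) ⇒ SuggestionSet

    Generate suggestions based on n-gram similarity.

  • #handles?(context) ⇒ Boolean

    Check if this strategy should handle the context.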

Methods inherited from BaseStrategy

#calculate_ngram_similarity, #create_suggestion, #create_suggestion_set, #enabled?, #generate_ngrams, #get_config, #has_config?, #max_results, #priority, #to_s

Constructor Details

#initialize(name: :ngram, **config) ⇒ NgramStrategy

Create a new n-gram strategy.

Parameters:

  • name (String, Symbol) (defaults to: :ngram)

    Name of the strategy

  • config (Hash)

    Configuration options

Options Hash (**config):

  • n (Integer)

    N-gram size (default: 3)

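  • min_similarity (Float)

    Minimum n-gram similarity threshold, from 0 to 1 (default: 0.3)

  • min_typo_similarity (Float)

    Minimum similarity, from 0 to 1, as computed by the inherited #calculate_ngram_similarity, that a candidate must also reach (default: 0.70)

  • max_results (Integer)

    Maximum number of results to return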



# File 'lib/kotoshu/suggestions/strategies/ngram_strategy.rb', line 22

def initialize(name: :ngram, **config)
  super(name: name, **config)
end
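The defaults that #generate falls back on are visible in its get_config calls. The sketch below mimics that lookup with Hash#fetch. resolve_ngram_config and NGRAM_DEFAULTS are hypothetical names, and the sketch assumes get_config(key, default) returns the configured value when the key is present and the default otherwise.

```ruby
# Defaults as they appear in the get_config calls of NgramStrategy#generate.
NGRAM_DEFAULTS = { n: 3, min_similarity: 0.3, min_typo_similarity: 0.70 }.freeze

# Hypothetical stand-in for BaseStrategy#get_config: assumes a configured
# value wins and the given default is used only when the key is absent.
def resolve_ngram_config(config)
  NGRAM_DEFAULTS.to_h { |key, default| [key, config.fetch(key, default)] }
end

p resolve_ngram_config({ n: 2, min_similarity: 0.5 })
```

With the real class, the same options would be passed to the constructor, e.g. NgramStrategy.new(n: 2, min_similarity: 0.5); per the constructor source, they are forwarded unchanged to BaseStrategy.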

Instance Method Details

#generate(context) ⇒ SuggestionSet

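Generate suggestions based on n-gram similarity. Each dictionary word at least n characters long (other than the input word itself) is compared with the input word; candidates below min_similarity or min_typo_similarity are discarded, and the rest are ranked by an integer distance derived from the n-gram similarity. If the input word is shorter than n, an empty set is returned.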
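The flow can be traced in a self-contained sketch. It is a simplification under stated assumptions, not the library's implementation: the dictionary is a plain Array, similarity is the Jaccard index over character n-gram sets (an assumed metric), the min_typo_similarity filter is omitted, and ngram_candidates is a hypothetical name. The distance conversion and the handling of repeated words follow the source listing for #generate; the alphabetical tie-break is an addition of this sketch.

```ruby
require "set"

# Simplified sketch of the NgramStrategy#generate flow (hypothetical helper).
def ngram_candidates(word, dictionary, n: 3, min_similarity: 0.3)
  return [] if word.length < n

  grams = ->(w) { (0..w.length - n).map { |i| w[i, n] }.to_set }
  word_grams = grams.call(word)
  distances = {}

  dictionary.each do |dict_word|
    next if dict_word == word || dict_word.length < n

    other = grams.call(dict_word)
    # Assumed metric: Jaccard index of the two n-gram sets.
    similarity = (word_grams & other).size.to_f / (word_grams | other).size
    next if similarity < min_similarity

    # Same similarity-to-distance conversion as #generate.
    dist = ((1 - similarity) * 10).to_i
    next if dist.zero?

    # Keep the smallest distance if the dictionary repeats a word.
    distances[dict_word] = [distances[dict_word], dist].compact.min
  end

  # Ascending distance; ties broken alphabetically so the output is stable.
  distances.sort_by { |w, d| [d, w] }
end

p ngram_candidates("speling", %w[spelling spieling spell apple])
# => [["spelling", 4], ["spell", 6], ["spieling", 6]]
```

For "speling", "spelling" shares 4 of 7 distinct 3-grams (distance 4) and ranks ahead of "spell" and "spieling" (distance 6 each); "apple" shares none and is filtered out by min_similarity.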

Parameters:

  • context (Context)

    The suggestion context

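Returns:

  • (SuggestionSet)

    Suggestions for the candidate words, which #generate passes to create_suggestion_set in order of ascending distance (highest n-gram similarity first); empty if the word is shorter than n
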


# File 'lib/kotoshu/suggestions/strategies/ngram_strategy.rb', line 30

def generate(context)
  word = context.word
  n = get_config(:n, 3)
  min_sim = get_config(:min_similarity, 0.3)
  min_typo_similarity = get_config(:min_typo_similarity, 0.70)  # Filter by typo correction similarity

  return create_suggestion_set([]) if word.length < n

  all_words = dictionary_words(context)

  # Get n-grams for input word
  word_ngrams = extract_ngrams(word, n)

  # Calculate n-gram similarity for each dictionary word
  results = {}
  all_words.each do |dict_word|
    next if dict_word == word
    next if dict_word.length < n

    similarity = ngram_similarity(word_ngrams, dict_word, n)
    next if similarity < min_sim

    # Also check typo correction similarity for filtering
    typo_sim = calculate_ngram_similarity(word, dict_word)
    next if typo_sim < min_typo_similarity

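    # Convert similarity to distance (higher similarity = lower distance).
    # to_i truncates, so distances fall into ten integer levels, and a
    # similarity of roughly 0.9 or above yields 0, which is skipped below.
    dist = ((1 - similarity) * 10).to_i
    next if dist.zero?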

    results[dict_word] ||= dist
    results[dict_word] = dist if dist < results[dict_word]
  end

  # Convert to suggestions, sorted by ascending distance (most similar first)
  sorted_words = results.sort_by { |_, dist| dist }.map(&:first)
  create_suggestion_set(sorted_words, distances: results, original_word: word)
end
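To make the distance step concrete, the conversion from #generate is pulled out below into a standalone method (similarity_to_distance is a hypothetical name). The sample values are exact binary fractions so the truncation is easy to follow. As the source reads, distances collapse into ten integer levels, and a very close candidate (here 0.9375) gets distance 0, which #generate then drops via next if dist.zero?.

```ruby
# The similarity-to-distance conversion from #generate, isolated.
def similarity_to_distance(similarity)
  ((1 - similarity) * 10).to_i
end

[0.25, 0.5, 0.75, 0.875, 0.9375].each do |sim|
  puts "#{sim} -> #{similarity_to_distance(sim)}"
end
# 0.25 -> 7
# 0.5 -> 5
# 0.75 -> 2
# 0.875 -> 1
# 0.9375 -> 0
```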

#handles?(context) ⇒ Boolean

Check if this strategy should handle the context.

Parameters:

  • context (Context)

    The suggestion context

Returns:

  • (Boolean)

    True if the strategy is enabled and the word is not found in the dictionary



# File 'lib/kotoshu/suggestions/strategies/ngram_strategy.rb', line 73

def handles?(context)
  return false unless enabled?

  !dictionary_lookup(context, context.word)
end
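#handles? reduces to an enabled check plus a dictionary miss. A sketch of that gate, assuming dictionary_lookup behaves like a membership test (modelled here with a Set; ngram_handles? is a hypothetical name):

```ruby
require "set"

# Hypothetical stand-in for #handles?: assumes dictionary_lookup amounts to
# a membership test on a Set of known words.
def ngram_handles?(word, dictionary, enabled: true)
  return false unless enabled

  !dictionary.include?(word)
end

known = Set["hello", "world"]
p ngram_handles?("helo", known)                  # => true
p ngram_handles?("hello", known)                 # => false
p ngram_handles?("helo", known, enabled: false)  # => false
```

As written, #handles? does not check word length, so a misspelling shorter than n is accepted here even though #generate then returns an empty set.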