Class: Kotoshu::Suggestions::Strategies::SemanticStrategy

Inherits:
BaseStrategy
  • Object
Defined in:
lib/kotoshu/suggestions/strategies/semantic_strategy.rb

Overview

Semantic strategy using FastText ONNX embeddings.

Provides embedding-based spell correction for:

  • Typos: Re-ranks edit-distance candidates by semantic similarity

  • Real-word errors: Detects when valid words are used incorrectly in context

This strategy works alongside other strategies (EditDistance, Phonetic, etc.) to provide comprehensive spell checking with semantic awareness.

Examples:

Basic usage

strategy = SemanticStrategy.new(language_code: 'en')
suggestions = strategy.generate(context)

With preloaded embeddings (faster)

strategy = SemanticStrategy.new(
  language_code: 'en',
  preload_embeddings: true
)
suggestions = strategy.generate(context)

Instance Attribute Summary

Attributes inherited from BaseStrategy

#config, #name

Instance Method Summary

Methods inherited from BaseStrategy

#calculate_ngram_similarity, #create_suggestion, #create_suggestion_set, #enabled?, #generate_ngrams, #get_config, #has_config?, #max_results, #priority

Constructor Details

#initialize(language_code:, cache: nil, preload_embeddings: false, max_context_window: 5, min_semantic_similarity: 0.5, semantic_boost_weight: 0.3, **config) ⇒ SemanticStrategy

Create a new semantic strategy.

Parameters:

  • language_code (String)

    ISO 639-1 language code

  • cache (Cache::ModelCache, nil) (defaults to: nil)

    Optional cache instance

  • preload_embeddings (Boolean) (defaults to: false)

    Whether to preload embeddings

  • max_context_window (Integer) (defaults to: 5)

    Number of words to consider for context

  • min_semantic_similarity (Float) (defaults to: 0.5)

    Minimum similarity for semantic suggestions

  • semantic_boost_weight (Float) (defaults to: 0.3)

    Weight for semantic similarity in re-ranking

  • config (Hash)

    Additional configuration



# File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 52

def initialize(language_code:, cache: nil, preload_embeddings: false,
               max_context_window: 5, min_semantic_similarity: 0.5,
               semantic_boost_weight: 0.3, **config)
  super(name: :semantic, **config)
  @language_code = language_code
  @max_context_window = max_context_window
  @min_semantic_similarity = min_semantic_similarity
  @semantic_boost_weight = semantic_boost_weight

  # Initialize embedding components
  initialize_embeddings(cache, preload_embeddings)
end
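
The constructor stores `min_semantic_similarity` and `semantic_boost_weight`, but this page does not show how they are applied. As a rough illustration only, the sketch below assumes a linear blend of a base (for example edit-distance) score with the semantic similarity, after dropping candidates below the similarity threshold. `Candidate`, `rerank`, and the blend formula are hypothetical and are not taken from Kotoshu.

```ruby
# Hypothetical candidate record: a word, its base score from another
# strategy, and its semantic similarity to the surrounding context.
Candidate = Struct.new(:word, :base_score, :similarity)

# Assumed re-ranking rule (not Kotoshu's documented formula):
#   1. discard candidates whose similarity is below the threshold
#   2. blend base score and similarity using the boost weight
#   3. sort by the blended score, highest first
def rerank(candidates, min_semantic_similarity: 0.5, semantic_boost_weight: 0.3)
  candidates
    .select { |c| c.similarity >= min_semantic_similarity }
    .map do |c|
      blended = (1.0 - semantic_boost_weight) * c.base_score +
                semantic_boost_weight * c.similarity
      [c.word, blended]
    end
    .sort_by { |(_word, score)| -score }
end

candidates = [
  Candidate.new("their", 0.80, 0.90),
  Candidate.new("there", 0.85, 0.60),
  Candidate.new("three", 0.90, 0.20) # below the 0.5 threshold, dropped
]
rerank(candidates).each { |word, score| puts "#{word}: #{score.round(3)}" }
```

Under these assumptions a candidate with a slightly lower base score can overtake another when it fits the context better, which is the effect the "re-ranks edit-distance candidates by semantic similarity" description suggests.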

Instance Attribute Details

#language_code ⇒ String (readonly)

Returns the language code (ISO 639-1).

Returns:

  • (String)

    Language code (ISO 639-1)



# File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 32

def language_code
  @language_code
end

#model ⇒ Embeddings::OnnxRuntimeModel (readonly)

Returns the ONNX model.

Returns:

  • (Embeddings::OnnxRuntimeModel)

    The ONNX model



# File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 38

def model
  @model
end

#search ⇒ Embeddings::SimilaritySearch (readonly)

Returns the similarity search.

Returns:

  • (Embeddings::SimilaritySearch)

    The similarity search



# File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 41

def search
  @search
end

#vocabulary ⇒ Embeddings::Vocabulary (readonly)

Returns the vocabulary.

Returns:

  • (Embeddings::Vocabulary)

    The vocabulary



# File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 35

def vocabulary
  @vocabulary
end

Instance Method Details

#embedding_for(word) ⇒ Array<Float>?

Get embedding for a word.

Parameters:

  • word (String)

    The word to look up

Returns:

  • (Array<Float>, nil)

    Embedding vector or nil if not found



# File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 110

def embedding_for(word)
  return nil unless @search

  @search.send(:get_embedding, word)
end

#find_similar_words(word, k: 10) ⇒ Array<Hash>

Find semantically similar words.

Parameters:

  • word (String)

    The query word

  • k (Integer) (defaults to: 10)

    Number of neighbors

Returns:

  • (Array<Hash>)

    Array of similarity hashes



# File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 132

def find_similar_words(word, k: 10)
  return [] unless @search

  @search.find_nearest(word, k: k, exclude_self: false)
end
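
To make the nearest-neighbour lookup concrete, here is a brute-force sketch over a toy in-memory table. It does not reproduce `Embeddings::SimilaritySearch`; `TOY_EMBEDDINGS`, `dot`, `nearest`, and the `:word` / `:similarity` hash keys are all assumptions made for illustration, since this page only describes the result as "similarity hashes".

```ruby
# Toy unit-length vectors (hypothetical data). For unit vectors the dot
# product equals the cosine similarity.
TOY_EMBEDDINGS = {
  "cat"    => [1.0, 0.0],
  "kitten" => [0.8, 0.6],
  "car"    => [0.0, 1.0]
}.freeze

def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Brute-force k-nearest-neighbour lookup: score every entry against the
# query vector and keep the k highest-scoring ones, best first.
def nearest(word, embeddings, k: 10, exclude_self: false)
  query = embeddings[word]
  return [] unless query # unknown word: nothing to compare against

  embeddings
    .reject { |other, _vec| exclude_self && other == word }
    .map { |other, vec| { word: other, similarity: dot(query, vec) } }
    .max_by(k) { |entry| entry[:similarity] }
end

p nearest("cat", TOY_EMBEDDINGS, k: 2)
p nearest("cat", TOY_EMBEDDINGS, k: 2, exclude_self: true)
```

If `find_nearest` behaves like this sketch, passing `exclude_self: false` means the query word itself is the top hit whenever it is in the vocabulary, so a caller wanting `k` alternatives may need to request one extra result and drop the query word.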

#generate(context) ⇒ SuggestionSet

Generate suggestions using semantic similarity.

Handles two cases:

  1. Word not in vocabulary (typo): Re-ranks edit-distance candidates

  2. Word in vocabulary (real-word error): Finds semantically similar alternatives

Parameters:

  • context (Context)

    The suggestion context

Returns:

  • (SuggestionSet)

    The generated suggestions; empty when embeddings are not loaded



# File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 73

def generate(context)
  word = context.word
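  # `self.` is required here: a bare `max_results` on the right-hand side
  # would refer to the local variable being assigned, which is still nil.
  max_results = context.max_results || self.max_results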

  # Ensure embeddings are loaded
  return SuggestionSet.empty unless @search

  # Case 1: Word not in vocabulary (typo)
  unless @vocabulary.include?(word)
    return generate_for_typo(context)
  end

  # Case 2: Real-word error detection
  # Find semantically similar words that might be correct in context
  generate_for_real_word_error(context)
end

#handles?(context) ⇒ Boolean

Check if this strategy should handle the context.

Semantic strategy handles:

  • Words not in vocabulary (for typo re-ranking)

  • Words in vocabulary (for real-word error detection)

Parameters:

  • context (Context)

    The suggestion context

Returns:

  • (Boolean)

    True if the strategy should handle this context



# File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 98

def handles?(context)
  return false unless enabled?
  return false unless @search && @vocabulary

  # Handle all words - we filter in generate()
  true
end
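
Because `handles?` returns false when the strategy is disabled or its embeddings are unavailable, a caller can use it to skip the strategy and fall back on the others. The sketch below shows one way such a dispatcher might look. `ToyStrategy` and `run_pipeline` are hypothetical stand-ins, and the assumption that a lower `priority` runs first is not confirmed by this page.

```ruby
# Hypothetical stand-in for a strategy, exposing only what the
# dispatcher below needs: a name, a priority, and handles?/generate.
ToyStrategy = Struct.new(:name, :priority, :ready) do
  def handles?(_context)
    ready
  end

  def generate(context)
    ["#{name}:#{context[:word]}"]
  end
end

# Keep the strategies that accept the context, order them by priority
# (assumed: lower number first), and concatenate their suggestions.
def run_pipeline(strategies, context)
  strategies
    .select { |strategy| strategy.handles?(context) }
    .sort_by(&:priority)
    .flat_map { |strategy| strategy.generate(context) }
end

strategies = [
  ToyStrategy.new(:semantic, 20, false), # e.g. embeddings not loaded
  ToyStrategy.new(:phonetic, 10, true),
  ToyStrategy.new(:edit_distance, 5, true)
]
p run_pipeline(strategies, { word: "teh" })
```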

#semantic_similarity(word1, word2) ⇒ Float?

Compute semantic similarity between two words.

Parameters:

  • word1 (String)

    First word

  • word2 (String)

    Second word

Returns:

  • (Float, nil)

    Cosine similarity or nil if either word not found



# File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 121

def semantic_similarity(word1, word2)
  return nil unless @search

  @search.similarity(word1, word2)
end
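
The return value is documented as a cosine similarity. For reference, a minimal standalone sketch of that computation on plain arrays follows. `cosine_similarity` is a hypothetical helper for illustration, not the method that `@search.similarity` delegates to, and returning nil for zero-length vectors is an assumption.

```ruby
# Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
# Returns nil when either vector is missing, mirroring the documented
# "nil if either word not found" contract.
def cosine_similarity(a, b)
  return nil if a.nil? || b.nil?
  raise ArgumentError, "dimension mismatch" unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  # Assumption: similarity is undefined for a zero vector, so return nil.
  return nil if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

puts cosine_similarity([1.0, 2.0], [2.0, 4.0]) # parallel vectors
puts cosine_similarity([1.0, 0.0], [0.0, 1.0]) # orthogonal vectors
p cosine_similarity(nil, [1.0, 0.0])           # missing embedding
```

Parallel vectors score close to 1 and orthogonal vectors score 0, so the default `min_semantic_similarity` of 0.5 sits between those two cases.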

#to_s ⇒ String Also known as: inspect

String representation.

Returns:

  • (String)

    String representation



# File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 141

def to_s
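  # `!@search.nil?` always yields true or false; `@search && true` would
  # interpolate as an empty string when @search is nil.
  "SemanticStrategy(language: #{@language_code}, vocab_size: #{@vocabulary&.size || 0}, loaded: #{!@search.nil?})"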
end