Class: Kotoshu::Suggestions::Strategies::SemanticStrategy
- Inherits: BaseStrategy
  - Object
  - BaseStrategy
  - Kotoshu::Suggestions::Strategies::SemanticStrategy
- Defined in: lib/kotoshu/suggestions/strategies/semantic_strategy.rb
Overview
Semantic strategy using FastText ONNX embeddings.
Provides embedding-based spell correction for:
- Typos: Re-ranks edit-distance candidates by semantic similarity
- Real-word errors: Detects when valid words are used incorrectly in context
This strategy works alongside other strategies (EditDistance, Phonetic, etc.) to provide comprehensive spell checking with semantic awareness.
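The typo re-ranking idea can be sketched as follows. This is a minimal illustration, not the library's scoring code: the `rerank` helper, the additive blend, and the toy scores are all assumptions. Only the default values 0.3 and 0.5 come from the constructor's `semantic_boost_weight` and `min_semantic_similarity` parameters.

```ruby
# Hypothetical blend: add a weighted semantic bonus to each edit-distance
# score, but only when the semantic score clears a minimum threshold.
def rerank(candidates, semantic_scores, weight: 0.3, min_similarity: 0.5)
  scored = candidates.map do |word, edit_score|
    semantic = semantic_scores.fetch(word, 0.0)
    boost = semantic >= min_similarity ? weight * semantic : 0.0
    [word, edit_score + boost]
  end
  # Highest combined score first
  scored.sort_by { |_, score| -score }.map(&:first)
end

# Two candidates that edit distance alone cannot separate.
candidates = { "cop" => 0.8, "cup" => 0.8 }
# Toy similarities to a nearby context word such as "coffee".
semantic_scores = { "cop" => 0.21, "cup" => 0.98 }

ranked = rerank(candidates, semantic_scores)
```

Here "cup" moves ahead of "cop" because only its semantic score passes the threshold and earns a bonus.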
Instance Attribute Summary
- #language_code ⇒ String (readonly): Language code (ISO 639-1).
- #model ⇒ Embeddings::OnnxRuntimeModel (readonly): The ONNX model.
- #search ⇒ Embeddings::SimilaritySearch (readonly): The similarity search.
- #vocabulary ⇒ Embeddings::Vocabulary (readonly): The vocabulary.
Attributes inherited from BaseStrategy
Instance Method Summary
- #embedding_for(word) ⇒ Array<Float>?: Get embedding for a word.
- #find_similar_words(word, k: 10) ⇒ Array<Hash>: Find semantically similar words.
- #generate(context) ⇒ SuggestionSet: Generate suggestions using semantic similarity.
- #handles?(context) ⇒ Boolean: Check if this strategy should handle the context.
- #initialize(language_code:, cache: nil, preload_embeddings: false, max_context_window: 5, min_semantic_similarity: 0.5, semantic_boost_weight: 0.3, **config) ⇒ SemanticStrategy (constructor): Create a new semantic strategy.
- #semantic_similarity(word1, word2) ⇒ Float?: Compute semantic similarity between two words.
- #to_s ⇒ String (also: #inspect): String representation.
Methods inherited from BaseStrategy
#calculate_ngram_similarity, #create_suggestion, #create_suggestion_set, #enabled?, #generate_ngrams, #get_config, #has_config?, #max_results, #priority
Constructor Details
#initialize(language_code:, cache: nil, preload_embeddings: false, max_context_window: 5, min_semantic_similarity: 0.5, semantic_boost_weight: 0.3, **config) ⇒ SemanticStrategy
Create a new semantic strategy.
    # File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 52
    def initialize(language_code:, cache: nil, preload_embeddings: false,
                   max_context_window: 5, min_semantic_similarity: 0.5,
                   semantic_boost_weight: 0.3, **config)
      super(name: :semantic, **config)
      @language_code = language_code
      @max_context_window = max_context_window
      @min_semantic_similarity = min_semantic_similarity
      @semantic_boost_weight = semantic_boost_weight
      # Initialize embedding components (cache, )
    end
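The keyword handling in the constructor can be illustrated with stand-in classes. This sketch assumes nothing about the real BaseStrategy beyond the `super(name: :semantic, **config)` call shown in the source. `BaseStrategySketch`, `SemanticStrategySketch`, and the `max_results:` option are hypothetical names used only for illustration.

```ruby
# Stand-in for BaseStrategy: keeps the name and any leftover options.
class BaseStrategySketch
  attr_reader :name, :config

  def initialize(name:, **config)
    @name = name
    @config = config
  end
end

# Stand-in for SemanticStrategy: named keywords are consumed here,
# everything else is forwarded to the parent through **config.
class SemanticStrategySketch < BaseStrategySketch
  attr_reader :language_code, :min_semantic_similarity

  def initialize(language_code:, min_semantic_similarity: 0.5, **config)
    super(name: :semantic, **config)
    @language_code = language_code
    @min_semantic_similarity = min_semantic_similarity
  end
end

strategy = SemanticStrategySketch.new(
  language_code: "en",
  min_semantic_similarity: 0.7,
  max_results: 5 # not a named keyword, so it lands in config
)
```

Omitting `language_code:` raises ArgumentError, since it is the only required keyword.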
Instance Attribute Details
#language_code ⇒ String (readonly)
Returns the language code (ISO 639-1).

    # File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 32
    def language_code
      @language_code
    end
#model ⇒ Embeddings::OnnxRuntimeModel (readonly)
Returns the ONNX model.

    # File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 38
    def model
      @model
    end
#search ⇒ Embeddings::SimilaritySearch (readonly)
Returns the similarity search.

    # File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 41
    def search
      @search
    end
#vocabulary ⇒ Embeddings::Vocabulary (readonly)
Returns the vocabulary.

    # File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 35
    def vocabulary
      @vocabulary
    end
Instance Method Details
#embedding_for(word) ⇒ Array<Float>?
Get embedding for a word.
    # File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 110
    def embedding_for(word)
      return nil unless @search

      @search.send(:get_embedding, word)
    end
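The use of `send` suggests that `get_embedding` is not part of the search object's public interface, although the source shown here does not confirm this. The sketch below uses a hypothetical `SearchSketch` class to show the difference between `send`, which ignores method visibility, and `public_send`, which respects it.

```ruby
# Hypothetical search object with a private lookup method.
class SearchSketch
  def initialize(table)
    @table = table
  end

  private

  def get_embedding(word)
    @table[word]
  end
end

search = SearchSketch.new({ "cat" => [0.1, 0.2] })

# send bypasses `private`, so the lookup succeeds.
vector = search.send(:get_embedding, "cat")

# public_send respects visibility and raises NoMethodError.
blocked = begin
  search.public_send(:get_embedding, "cat")
  false
rescue NoMethodError
  true
end
```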
#find_similar_words(word, k: 10) ⇒ Array<Hash>
Find semantically similar words.
    # File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 132
    def find_similar_words(word, k: 10)
      return [] unless @search

      @search.find_nearest(word, k: k, exclude_self: false)
    end
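A nearest-neighbour lookup of this kind can be sketched with a brute-force scan. This is an illustration under stated assumptions, not the `find_nearest` implementation: the `nearest` helper, the `:word` and `:similarity` hash keys, and the toy vectors are hypothetical, and the vectors are assumed to be unit length so that a dot product equals cosine similarity.

```ruby
# Dot product of two equal-length vectors.
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Returns up to k entries, most similar first.
def nearest(embeddings, word, k: 10, exclude_self: false)
  query = embeddings[word]
  return [] unless query

  embeddings
    .reject { |other, _vec| exclude_self && other == word }
    .map { |other, vec| { word: other, similarity: dot(query, vec) } }
    .max_by(k) { |entry| entry[:similarity] }
end

# Toy unit-length vectors.
embeddings = {
  "cat" => [1.0, 0.0],
  "dog" => [0.6, 0.8],
  "car" => [0.0, 1.0]
}

with_self = nearest(embeddings, "cat", k: 2)
without_self = nearest(embeddings, "cat", k: 2, exclude_self: true)
```

With `exclude_self: false`, as in the call above, the query word itself appears in the results when it is in the vocabulary, so callers that want only alternatives need to drop it.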
#generate(context) ⇒ SuggestionSet
Generate suggestions using semantic similarity.
Handles two cases:
- Word not in vocabulary (typo): Re-ranks edit-distance candidates
- Word in vocabulary (real-word error): Finds semantically similar alternatives
    # File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 73
    def generate(context)
      word = context.word
      # self. is needed here: a bare max_results on the right-hand side would
      # read the still-nil local being assigned, not the inherited method.
      max_results = context.max_results || self.max_results

      # Ensure embeddings are loaded
      return SuggestionSet.empty unless @search

      # Case 1: Word not in vocabulary (typo)
      unless @vocabulary.include?(word)
        return generate_for_typo(context)
      end

      # Case 2: Real-word error detection
      # Find semantically similar words that might be correct in context
      generate_for_real_word_error(context)
    end
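The branching in #generate reduces to a three-way decision, sketched below with plain values in place of the strategy's collaborators. The `dispatch` helper and its symbols are hypothetical, and a `Set` stands in for Embeddings::Vocabulary.

```ruby
require "set"

# Mirrors the order of the guards in #generate:
# no embeddings -> empty result, unknown word -> typo path,
# known word -> real-word-error path.
def dispatch(word, vocabulary:, search_loaded:)
  return :empty unless search_loaded
  return :typo unless vocabulary.include?(word)

  :real_word_error
end

vocabulary = Set["their", "there"]

typo_case = dispatch("thier", vocabulary: vocabulary, search_loaded: true)
real_word_case = dispatch("their", vocabulary: vocabulary, search_loaded: true)
unloaded_case = dispatch("their", vocabulary: vocabulary, search_loaded: false)
```

Because the vocabulary check is the only thing separating the two paths, a misspelling that happens to be a valid word always takes the real-word-error path.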
#handles?(context) ⇒ Boolean
Check if this strategy should handle the context.
Semantic strategy handles:
- Words not in vocabulary (for typo re-ranking)
- Words in vocabulary (for real-word error detection)
    # File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 98
    def handles?(context)
      return false unless enabled?
      return false unless @search && @vocabulary

      # Handle all words - we filter in generate()
      true
    end
#semantic_similarity(word1, word2) ⇒ Float?
Compute semantic similarity between two words.
    # File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 121
    def semantic_similarity(word1, word2)
      return nil unless @search

      @search.similarity(word1, word2)
    end
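Embedding similarity is commonly measured with cosine similarity, though the source does not state which metric `@search.similarity` uses. The sketch below assumes cosine similarity. It also assumes that a word missing from the table yields nil, which mirrors the `Float?` return type but is not confirmed for unknown words. Both helper names are hypothetical.

```ruby
# Cosine similarity: dot product divided by the product of the norms.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norms = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norms.zero? ? 0.0 : dot / norms
end

# Returns nil when either word has no embedding.
def similarity_or_nil(embeddings, word1, word2)
  v1 = embeddings[word1]
  v2 = embeddings[word2]
  return nil unless v1 && v2

  cosine_similarity(v1, v2)
end

embeddings = { "a" => [1.0, 0.0], "b" => [0.0, 1.0], "c" => [2.0, 0.0] }

orthogonal = similarity_or_nil(embeddings, "a", "b") # perpendicular vectors
parallel = similarity_or_nil(embeddings, "a", "c")   # same direction, different length
missing = similarity_or_nil(embeddings, "a", "zzz")  # no embedding for "zzz"
```

Callers should treat a nil result as "unknown" and not as zero similarity, since the two mean different things when ranking candidates.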
#to_s ⇒ String Also known as: inspect
String representation.
    # File 'lib/kotoshu/suggestions/strategies/semantic_strategy.rb', line 141
    def to_s
      "SemanticStrategy(language: #{@language_code}, vocab_size: #{@vocabulary&.size || 0}, loaded: #{@search && true})"
    end
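The two interpolated expressions behave differently when the embeddings are not loaded. The sketch below reproduces only the string template from #to_s in a hypothetical `describe` helper, with plain arguments in place of the instance variables.

```ruby
# Same template as #to_s, with arguments instead of instance variables.
def describe(language_code, vocabulary, search)
  "SemanticStrategy(language: #{language_code}, " \
    "vocab_size: #{vocabulary&.size || 0}, loaded: #{search && true})"
end

# Nothing loaded: &. returns nil, so || 0 supplies the size;
# nil && true is nil, which interpolates as an empty string.
unloaded = describe("en", nil, nil)

# Loaded: any truthy search object makes the expression true.
loaded = describe("en", %w[cat dog], Object.new)
```

An unloaded strategy therefore prints `loaded: ` with nothing after it and not `loaded: false`. An expression such as `!!search` would print an explicit boolean in both cases.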