Class: Kotoshu::Dictionary::CSpell
Defined in: lib/kotoshu/dictionary/cspell.rb
Overview
CSpell dictionary backend.
This dictionary reads CSpell-formatted dictionary files (plain text .txt or compressed .trie files). CSpell is the spell-checking engine behind the Code Spell Checker extension for VS Code.
File format:
- .txt: Plain text with one word per line; # comments are supported
- .trie: Compressed trie format (DAFSA, Deterministic Acyclic Finite State Automaton)
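As a rough illustration of the .txt format described above, the following sketch turns dictionary text into a word list. The helper name `parse_cspell_text` is hypothetical and not part of Kotoshu. It assumes that `#` starts a comment running to the end of the line and that blank lines are ignored; the format description does not spell out either detail.

```ruby
# Hypothetical helper for illustration; not part of Kotoshu.
# Assumption: '#' begins a comment that runs to the end of the line.
def parse_cspell_text(text)
  text.each_line.filter_map do |line|
    word = line.sub(/#.*/, "").strip # drop the comment, then surrounding whitespace
    word unless word.empty?          # skip blank and comment-only lines
  end
end

sample = <<~DICT
  # project terms
  kotoshu
  trie   # inline comment

  dafsa
DICT

p parse_cspell_text(sample) # => ["kotoshu", "trie", "dafsa"]
```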
Instance Attribute Summary
- #case_sensitive ⇒ Boolean (readonly)
  Whether lookups are case-sensitive.
- #path ⇒ String (readonly)
  The path to the dictionary file.
- #trie ⇒ Core::Trie::Trie (readonly)
  The trie data structure.
Attributes inherited from Base
#language_code, #locale, #metadata
Class Method Summary
- .from_words(words, language_code:, locale: nil, case_sensitive: false) ⇒ CSpell
  Create a dictionary from an array of words.
Instance Method Summary
- #add_word(word, flags: []) ⇒ Boolean
  Add a word to the dictionary.
- #has_prefix?(prefix) ⇒ Boolean
  Check if the dictionary has words with a prefix.
- #initialize(path, language_code:, locale: nil, case_sensitive: false, metadata: {}) ⇒ CSpell (constructor)
  Create a new CSpell dictionary.
- #lookup(word) ⇒ Boolean
  Check if a word exists in the dictionary.
- #remove_word(_word) ⇒ Boolean
  Remove a word from the dictionary.
- #suggest(word, max_suggestions: 10) ⇒ Array<String>
  Generate spelling suggestions.
- #words ⇒ Array<String>
  Get all words in the dictionary.
- #words_with_prefix(prefix) ⇒ Array<String>
  Get words with a prefix.
Methods inherited from Base
#each_word, #empty?, load, #lookup?, register_type, registry, #size, #to_s, #type, #words_matching
Constructor Details
#initialize(path, language_code:, locale: nil, case_sensitive: false, metadata: {}) ⇒ CSpell
Create a new CSpell dictionary.
    # File 'lib/kotoshu/dictionary/cspell.rb', line 39
    def initialize(path, language_code:, locale: nil, case_sensitive: false, metadata: {})
      super(language_code, locale: locale, metadata: metadata)
      @path = File.expand_path(path)
      @case_sensitive = case_sensitive

      raise DictionaryNotFoundError, @path unless File.exist?(@path)

      # Load based on file extension
      @trie = if @path.end_with?(".trie")
                load_trie_file(@path)
              else
                load_text_file(@path)
              end

      # Register this dictionary type
      self.class.register_type(:cspell) unless Dictionary.registry.key?(:cspell)
    end
Instance Attribute Details
#case_sensitive ⇒ Boolean (readonly)
Returns whether lookups are case-sensitive.

    # File 'lib/kotoshu/dictionary/cspell.rb', line 27
    def case_sensitive
      @case_sensitive
    end
#path ⇒ String (readonly)
Returns the path to the dictionary file.

    # File 'lib/kotoshu/dictionary/cspell.rb', line 24
    def path
      @path
    end
#trie ⇒ Core::Trie::Trie (readonly)
Returns the trie data structure.

    # File 'lib/kotoshu/dictionary/cspell.rb', line 30
    def trie
      @trie
    end
Class Method Details
.from_words(words, language_code:, locale: nil, case_sensitive: false) ⇒ CSpell
Create a dictionary from an array of words.
    # File 'lib/kotoshu/dictionary/cspell.rb', line 172
    def self.from_words(words, language_code:, locale: nil, case_sensitive: false)
      dict = allocate

      # Build trie from words
      normalized_words = words.map { |w| case_sensitive ? w : w.downcase }.uniq
      trie = Core::Trie::Builder.from_array(normalized_words)

      dict.instance_variable_set(:@language_code, language_code.dup.freeze)
      dict.instance_variable_set(:@locale, locale&.dup&.freeze)
      dict.instance_variable_set(:@path, nil)
      dict.instance_variable_set(:@case_sensitive, case_sensitive)
      dict.instance_variable_set(:@trie, trie)
      dict.instance_variable_set(:@metadata, {}.freeze)

      # Register this dictionary type (unless already registered)
      register_type(:cspell) unless Dictionary.registry.key?(:cspell)

      dict
    end
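The normalization step in .from_words can be tried in isolation. The sketch below copies the `map`/`uniq` expression from the source into a standalone helper; the name `normalize_words` is hypothetical and exists only for this illustration.

```ruby
# Hypothetical standalone helper mirroring the normalization in .from_words.
def normalize_words(words, case_sensitive: false)
  # Downcase unless case-sensitive, then drop duplicates (first occurrence wins).
  words.map { |w| case_sensitive ? w : w.downcase }.uniq
end

p normalize_words(%w[Ruby ruby RUBY Gem])
# => ["ruby", "gem"]
p normalize_words(%w[Ruby ruby RUBY Gem], case_sensitive: true)
# => ["Ruby", "ruby", "RUBY", "Gem"]
```

With the default `case_sensitive: false`, case variants collapse into a single lowercase entry before the trie is built, so the original casing is not retained.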
Instance Method Details
#add_word(word, flags: []) ⇒ Boolean
Add a word to the dictionary.
    # File 'lib/kotoshu/dictionary/cspell.rb', line 123
    def add_word(word, flags: [])
      return false if word.nil? || word.empty?

      lookup_word = @case_sensitive ? word : word.downcase
      return false if @trie.lookup(lookup_word)

      @trie.insert(lookup_word)
      true
    end
#has_prefix?(prefix) ⇒ Boolean
Check if the dictionary has words with a prefix.
    # File 'lib/kotoshu/dictionary/cspell.rb', line 73
    def has_prefix?(prefix)
      return false if prefix.nil? || prefix.empty?

      lookup_prefix = @case_sensitive ? prefix : prefix.downcase
      @trie.has_prefix?(lookup_prefix)
    end
#lookup(word) ⇒ Boolean
Check if a word exists in the dictionary.
    # File 'lib/kotoshu/dictionary/cspell.rb', line 62
    def lookup(word)
      return false if word.nil? || word.empty?

      lookup_word = @case_sensitive ? word : word.downcase
      @trie.lookup(lookup_word)
    end
#remove_word(_word) ⇒ Boolean
Remove a word from the dictionary.
CSpell dictionaries are typically immutable after loading, so this method removes nothing and always returns false.

    # File 'lib/kotoshu/dictionary/cspell.rb', line 138
    def remove_word(_word)
      # Trie doesn't support removal easily
      # Would need to rebuild the trie
      false
    end
#suggest(word, max_suggestions: 10) ⇒ Array<String>
Generate spelling suggestions.
Tries prefix-based suggestions from the trie first. If those do not fill max_suggestions, falls back to edit distance over dictionary words whose length is within 2 of the input, keeping words at distance 1 or 2.

    # File 'lib/kotoshu/dictionary/cspell.rb', line 87
    def suggest(word, max_suggestions: 10)
      return [] if word.nil? || word.empty?

      lookup_word = @case_sensitive ? word : word.downcase

      # First try prefix-based suggestions
      prefix_suggestions = @trie.suggestions(lookup_word, max_results: max_suggestions)

      # If we have enough prefix suggestions, return them
      return prefix_suggestions if prefix_suggestions.length >= max_suggestions

      # Otherwise, use edit distance for more suggestions
      all_words = @trie.all_words
      candidates = all_words.select do |w|
        w.length >= lookup_word.length - 2 && w.length <= lookup_word.length + 2
      end

      # Calculate edit distances
      results = candidates.map do |dict_word|
        dist = edit_distance(lookup_word, dict_word)
        [dict_word, dist]
      end.select { |_, dist| dist.positive? && dist <= 2 }
         .sort_by { |_, dist| dist }
         .first(max_suggestions - prefix_suggestions.length)
         .map(&:first)

      # Combine both sets
      (prefix_suggestions + results).uniq.first(max_suggestions)
    end
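The edit-distance fallback in #suggest can be sketched on a plain word list. The private `edit_distance` helper is not shown on this page, so the sketch assumes it computes Levenshtein distance; both `levenshtein` and `fallback_suggestions` are hypothetical names introduced here, not Kotoshu methods.

```ruby
# Assumption: Levenshtein distance stands in for the unseen edit_distance helper.
def levenshtein(a, b)
  previous = (0..b.length).to_a # distances from the empty prefix of a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1,           # deletion
                  current[j - 1] + 1,        # insertion
                  previous[j - 1] + cost].min # substitution or match
    end
    previous = current
  end
  previous.last
end

# Mirrors the fallback stage: length filter, distance 1..2, closest first.
def fallback_suggestions(word, dictionary_words, limit: 10)
  dictionary_words
    .select { |w| (w.length - word.length).abs <= 2 }
    .map { |w| [w, levenshtein(word, w)] }
    .select { |_, dist| dist.positive? && dist <= 2 }
    .sort_by { |_, dist| dist }
    .first(limit)
    .map(&:first)
end

p fallback_suggestions("helo", %w[hello heal world helo])
# => ["hello", "heal"]
```

The `dist.positive?` check drops the input word itself, and the length filter avoids computing distances for words that cannot be within two edits. Since `sort_by` is not guaranteed to be stable, words at the same distance may come back in any order.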
#words ⇒ Array<String>
Get all words in the dictionary.
    # File 'lib/kotoshu/dictionary/cspell.rb', line 147
    def words
      @trie.all_words
    end
#words_with_prefix(prefix) ⇒ Array<String>
Get words with a prefix.
    # File 'lib/kotoshu/dictionary/cspell.rb', line 155
    def words_with_prefix(prefix)
      return [] if prefix.nil? || prefix.empty?

      lookup_prefix = @case_sensitive ? prefix : prefix.downcase
      @trie.words_with_prefix(lookup_prefix)
    end
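To make the trie-backed operations above (lookup, has_prefix?, words_with_prefix) more concrete, here is a minimal hash-based trie. `SketchTrie` is a hypothetical stand-in written for this page. It is not Kotoshu's Core::Trie::Trie, whose internals are not shown here, and it is not a compressed DAFSA.

```ruby
# Hypothetical stand-in for illustration only; not Kotoshu's Core::Trie::Trie.
class SketchTrie
  Node = Struct.new(:children, :terminal)

  def initialize
    @root = Node.new({}, false)
  end

  # Walk (and create) one node per character, then mark the end of a word.
  def insert(word)
    node = word.each_char.reduce(@root) do |current, char|
      current.children[char] ||= Node.new({}, false)
    end
    node.terminal = true
  end

  # True only if the path exists AND ends on a word boundary.
  def lookup(word)
    node = find(word)
    !node.nil? && node.terminal
  end

  # True if any inserted word starts with the prefix.
  def has_prefix?(prefix)
    !find(prefix).nil?
  end

  # All inserted words starting with the prefix, in alphabetical order.
  def words_with_prefix(prefix)
    start = find(prefix)
    start.nil? ? [] : collect(start, prefix)
  end

  private

  # Follow the characters of string from the root; nil if the path breaks.
  def find(string)
    node = @root
    string.each_char do |char|
      node = node.children[char]
      return nil if node.nil?
    end
    node
  end

  # Depth-first walk collecting every word below node.
  def collect(node, prefix)
    found = node.terminal ? [prefix] : []
    node.children.sort_by { |char, _| char }.each do |char, child|
      found.concat(collect(child, prefix + char))
    end
    found
  end
end

trie = SketchTrie.new
%w[cat car card dog].each { |w| trie.insert(w) }
p trie.lookup("car")            # => true
p trie.lookup("ca")             # => false
p trie.has_prefix?("ca")        # => true
p trie.words_with_prefix("car") # => ["car", "card"]
```

The difference between `lookup("ca")` and `has_prefix?("ca")` is the terminal flag: a prefix only needs the path to exist, while a word needs the path to end on a marked node.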