Class: Kotoshu::Dictionary::CSpell

Inherits:
Base
  • Object
Defined in:
lib/kotoshu/dictionary/cspell.rb

Overview

CSpell dictionary backend.

This dictionary reads CSpell-formatted dictionary files (plain text .txt or compressed .trie files). CSpell is the spell checker behind the Code Spell Checker extension for VS Code.

File format:

  • .txt: Plain text with one word per line, # comments supported

  • .trie: Compressed trie format (DAFSA - Deterministic Acyclic Finite State Automaton)
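
The .txt format described above can be illustrated with a small parser. This is a minimal sketch, not the library's private loader (which is not shown on this page): parse_word_list is a hypothetical helper, and it assumes that "#" starts a comment running to the end of the line, including inline comments.

```ruby
# Hypothetical helper: parse CSpell-style plain-text word list content.
# Assumption: "#" starts a comment that runs to the end of the line.
def parse_word_list(text)
  text.each_line
      .map { |line| line.sub(/#.*/, "").strip } # drop comments and surrounding whitespace
      .reject(&:empty?)                          # skip blank and comment-only lines
end

sample = <<~WORDS
  # greetings
  hello
  world   # inline comment

  kotoshu
WORDS

p parse_word_list(sample) # => ["hello", "world", "kotoshu"]
```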

Examples:

Creating from a text file

dict = CSpell.new("words.txt", language_code: "en-US")
dict.lookup?("hello")  # => true

Creating from a trie file

dict = CSpell.new("words.trie", language_code: "en")

Instance Attribute Summary

Attributes inherited from Base

#language_code, #locale, #metadata

Class Method Summary

Instance Method Summary

Methods inherited from Base

#each_word, #empty?, load, #lookup?, register_type, registry, #size, #to_s, #type, #words_matching

Constructor Details

#initialize(path, language_code:, locale: nil, case_sensitive: false, metadata: {}) ⇒ CSpell

Create a new CSpell dictionary.

Parameters:

  • path (String)

    Path to the dictionary file. A path ending in .trie is loaded as a compressed trie; any other path is read as plain text

  • language_code (String)

    The language code

  • locale (String, nil) (defaults to: nil)

    The locale (optional)

  • case_sensitive (Boolean) (defaults to: false)

    Whether lookups are case-sensitive

  • metadata (Hash) (defaults to: {})

    Additional metadata (optional)

Raises:

  • (DictionaryNotFoundError)

    If no file exists at the expanded path

# File 'lib/kotoshu/dictionary/cspell.rb', line 39

def initialize(path, language_code:, locale: nil, case_sensitive: false, metadata: {})
  super(language_code, locale: locale, metadata: metadata)

  @path = File.expand_path(path)
  @case_sensitive = case_sensitive

  raise DictionaryNotFoundError, @path unless File.exist?(@path)

  # Load based on file extension
  @trie = if @path.end_with?(".trie")
            load_trie_file(@path)
          else
            load_text_file(@path)
          end

  # Register this dictionary type
  self.class.register_type(:cspell) unless Dictionary.registry.key?(:cspell)
end

Instance Attribute Details

#case_sensitive ⇒ Boolean (readonly)

Returns whether lookups are case-sensitive.

Returns:

  • (Boolean)

    Whether lookups are case-sensitive



# File 'lib/kotoshu/dictionary/cspell.rb', line 27

def case_sensitive
  @case_sensitive
end

#path ⇒ String (readonly)

Returns the path to the dictionary file.

Returns:

  • (String, nil)

    The expanded path to the dictionary file, or nil when the dictionary was built with .from_words



# File 'lib/kotoshu/dictionary/cspell.rb', line 24

def path
  @path
end

#trie ⇒ Core::Trie::Trie (readonly)

Returns the trie data structure.

Returns:

  • (Core::Trie::Trie)

    The trie data structure

# File 'lib/kotoshu/dictionary/cspell.rb', line 30

def trie
  @trie
end

Class Method Details

.from_words(words, language_code:, locale: nil, case_sensitive: false) ⇒ CSpell

Create a dictionary from an array of words.

Examples:

dict = CSpell.from_words(%w[hello world test], language_code: "en")

Parameters:

  • words (Array<String>)

    The words

  • language_code (String)

    The language code

  • locale (String, nil) (defaults to: nil)

    The locale (optional)

  • case_sensitive (Boolean) (defaults to: false)

    Whether lookups are case-sensitive

Returns:

  • (CSpell)

    A dictionary containing the given words

# File 'lib/kotoshu/dictionary/cspell.rb', line 172

def self.from_words(words, language_code:, locale: nil, case_sensitive: false)
  dict = allocate

  # Build trie from words
  normalized_words = words.map { |w| case_sensitive ? w : w.downcase }.uniq
  trie = Core::Trie::Builder.from_array(normalized_words)

  dict.instance_variable_set(:@language_code, language_code.dup.freeze)
  dict.instance_variable_set(:@locale, locale&.dup&.freeze)
  dict.instance_variable_set(:@path, nil)
  dict.instance_variable_set(:@case_sensitive, case_sensitive)
  dict.instance_variable_set(:@trie, trie)
  dict.instance_variable_set(:@metadata, {}.freeze)

  # Register this dictionary type (unless already registered)
  register_type(:cspell) unless Dictionary.registry.key?(:cspell)

  dict
end
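
The normalization step in .from_words decides which entries end up in the trie. The sketch below isolates that one line; normalize_words is a hypothetical helper introduced here for illustration and is not part of the library.

```ruby
# Hypothetical helper mirroring the normalization line in .from_words.
def normalize_words(words, case_sensitive: false)
  # Downcase unless case-sensitive, then drop duplicates created by downcasing
  words.map { |w| case_sensitive ? w : w.downcase }.uniq
end

p normalize_words(%w[Hello hello WORLD])                       # => ["hello", "world"]
p normalize_words(%w[Hello hello WORLD], case_sensitive: true) # => ["Hello", "hello", "WORLD"]
```

With the default case_sensitive: false, "Hello" and "hello" collapse into one entry, so the resulting dictionary can be smaller than the input array.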

Instance Method Details

#add_word(word, flags: []) ⇒ Boolean

Add a word to the dictionary.

Parameters:

  • word (String)

    The word to add

  • flags (Array<String>) (defaults to: [])

    Flags (ignored for CSpell)

Returns:

  • (Boolean)

    True if the word was added; false if it is nil, empty, or already present



# File 'lib/kotoshu/dictionary/cspell.rb', line 123

def add_word(word, flags: [])
  return false if word.nil? || word.empty?

  lookup_word = @case_sensitive ? word : word.downcase
  return false if @trie.lookup(lookup_word)

  @trie.insert(lookup_word)
  true
end

#has_prefix?(prefix) ⇒ Boolean

Check if the dictionary has words with a prefix.

Parameters:

  • prefix (String)

    The prefix

Returns:

  • (Boolean)

    True if words exist with the prefix



# File 'lib/kotoshu/dictionary/cspell.rb', line 73

def has_prefix?(prefix)
  return false if prefix.nil? || prefix.empty?

  lookup_prefix = @case_sensitive ? prefix : prefix.downcase
  @trie.has_prefix?(lookup_prefix)
end

#lookup(word) ⇒ Boolean

Check if a word exists in the dictionary.

Parameters:

  • word (String)

    The word to look up

Returns:

  • (Boolean)

    True if the word exists



# File 'lib/kotoshu/dictionary/cspell.rb', line 62

def lookup(word)
  return false if word.nil? || word.empty?

  lookup_word = @case_sensitive ? word : word.downcase
  @trie.lookup(lookup_word)
end
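
Both #has_prefix? and #lookup normalize case and then delegate to the trie. The difference between an exact match and a prefix match can be sketched with a minimal nested-hash trie. TinyTrie is a hypothetical stand-in written for this example; it is not Core::Trie::Trie, whose implementation is not shown on this page.

```ruby
# Minimal nested-hash trie; an illustrative stand-in, not Core::Trie::Trie.
class TinyTrie
  END_MARK = :end # marks that a complete word terminates at this node

  def initialize
    @root = {}
  end

  def insert(word)
    node = word.each_char.reduce(@root) { |n, ch| n[ch] ||= {} }
    node[END_MARK] = true
  end

  # True only when the exact word was inserted.
  def lookup(word)
    node = walk(word)
    !node.nil? && node.key?(END_MARK)
  end

  # True when at least one inserted word starts with the prefix.
  def has_prefix?(prefix)
    !walk(prefix).nil?
  end

  private

  # Follow the characters from the root; nil if the path does not exist.
  def walk(str)
    str.each_char.reduce(@root) do |node, ch|
      return nil unless node.key?(ch)

      node[ch]
    end
  end
end

trie = TinyTrie.new
%w[hello help world].each { |w| trie.insert(w) }

p trie.lookup("hello")          # => true
p trie.lookup("hel")            # => false (a prefix, not a stored word)
p trie.has_prefix?("hel")       # => true
p trie.has_prefix?("xyz")       # => false
p trie.lookup("HELLO".downcase) # => true (case-insensitive mode downcases first)
```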

#remove_word(_word) ⇒ Boolean

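Note:

CSpell dictionaries are typically immutable after loading

Remove a word from the dictionary. This backend does not support removal: the method ignores its argument and always returns false.

Parameters:

  • _word (String)

    The word to remove (ignored)

Returns:

  • (Boolean)

    Always false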



# File 'lib/kotoshu/dictionary/cspell.rb', line 138

def remove_word(_word)
  # Trie doesn't support removal easily
  # Would need to rebuild the trie
  false
end

#suggest(word, max_suggestions: 10) ⇒ Array<String>

Generate spelling suggestions.

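Prefix-based suggestions from the trie are tried first. If they do not fill max_suggestions, the remainder comes from dictionary words at edit distance 1 or 2 from the input, closest first.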

Parameters:

  • word (String)

    The misspelled word

  • max_suggestions (Integer) (defaults to: 10)

    Maximum suggestions

Returns:

  • (Array<String>)

    List of suggested words



# File 'lib/kotoshu/dictionary/cspell.rb', line 87

def suggest(word, max_suggestions: 10)
  return [] if word.nil? || word.empty?

  lookup_word = @case_sensitive ? word : word.downcase

  # First try prefix-based suggestions
  prefix_suggestions = @trie.suggestions(lookup_word, max_results: max_suggestions)

  # If we have enough prefix suggestions, return them
  return prefix_suggestions if prefix_suggestions.length >= max_suggestions

  # Otherwise, use edit distance for more suggestions. A word can only be
  # within edit distance 2 if its length differs by at most 2, so filter first.
  candidates = @trie.all_words.select do |w|
    (w.length - lookup_word.length).abs <= 2
  end

  # Keep candidates at edit distance 1 or 2, closest first
  results = candidates
            .map { |dict_word| [dict_word, edit_distance(lookup_word, dict_word)] }
            .select { |_, dist| dist.between?(1, 2) }
            .sort_by { |_, dist| dist }
            .first(max_suggestions - prefix_suggestions.length)
            .map(&:first)

  # Combine both sets
  (prefix_suggestions + results).uniq.first(max_suggestions)
end
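
The edit_distance helper called above is private and its source is not listed on this page. As a rough illustration of what such a helper might compute, here is a sketch of the classic Levenshtein distance using two rows of the dynamic-programming table. It is an assumption that the library uses plain Levenshtein; the real helper may differ, for example by counting transpositions as a single edit.

```ruby
# Hypothetical stand-in for the private edit_distance helper (plain Levenshtein).
def edit_distance(a, b)
  # prev[j] is the distance between the processed prefix of a and b[0, j]
  prev = (0..b.length).to_a

  a.each_char.with_index(1) do |ca, i|
    curr = [i] # distance from a[0, i] to the empty string
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [
        prev[j] + 1,       # delete ca
        curr[j - 1] + 1,   # insert cb
        prev[j - 1] + cost # substitute, or match when cost is 0
      ].min
    end
    prev = curr
  end

  prev.last
end

p edit_distance("helo", "hello")  # => 1
p edit_distance("wrold", "world") # => 2
p edit_distance("test", "test")   # => 0

# The same filter #suggest applies: keep words at distance 1 or 2
words = %w[hello help hero world]
p words.select { |w| edit_distance("helo", w).between?(1, 2) } # => ["hello", "help", "hero"]
```

Under plain Levenshtein a swapped pair of letters ("wrold") costs two edits, which is why the distance threshold of 2 in #suggest still catches common transposition typos.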

#words ⇒ Array<String>

Get all words in the dictionary.

Returns:

  • (Array<String>)

    All words



# File 'lib/kotoshu/dictionary/cspell.rb', line 147

def words
  @trie.all_words
end

#words_with_prefix(prefix) ⇒ Array<String>

Get words with a prefix.

Parameters:

  • prefix (String)

    The prefix

Returns:

  • (Array<String>)

    Words with the prefix



# File 'lib/kotoshu/dictionary/cspell.rb', line 155

def words_with_prefix(prefix)
  return [] if prefix.nil? || prefix.empty?

  lookup_prefix = @case_sensitive ? prefix : prefix.downcase
  @trie.words_with_prefix(lookup_prefix)
end
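
Enumerating the words under a prefix amounts to walking to the prefix node and collecting every complete word beneath it. The sketch below shows one way to do that over a nested-hash trie. build_trie and the top-level words_with_prefix are hypothetical helpers for illustration, not the Core::Trie::Trie implementation, and the results are sorted here only to make the output deterministic; the order returned by the real method is not documented on this page.

```ruby
# Build a nested-hash trie (illustrative stand-in for Core::Trie::Trie).
def build_trie(words)
  words.each_with_object({}) do |word, root|
    node = word.each_char.reduce(root) { |n, ch| n[ch] ||= {} }
    node[:end] = true # a complete word ends at this node
  end
end

# Collect every word stored beneath the node reached by following prefix.
def words_with_prefix(root, prefix)
  node = prefix.each_char.reduce(root) do |n, ch|
    return [] unless n.key?(ch) # no word starts with this prefix

    n[ch]
  end

  results = []
  stack = [[node, prefix]] # iterative depth-first traversal
  until stack.empty?
    current, word = stack.pop
    results << word if current.key?(:end)
    current.each { |ch, child| stack << [child, word + ch] unless ch == :end }
  end
  results.sort
end

trie = build_trie(%w[hello help helmet world])
p words_with_prefix(trie, "hel") # => ["hello", "helmet", "help"]
p words_with_prefix(trie, "wor") # => ["world"]
p words_with_prefix(trie, "xyz") # => []
```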