Class: Kotoshu::Dictionary::Hunspell
Defined in: lib/kotoshu/dictionary/hunspell.rb
Overview
Hunspell dictionary backend.
This dictionary reads Hunspell-formatted dictionary files (.dic and .aff). Hunspell is the spell checker used by LibreOffice, Firefox, Chrome, and many other applications.
File format:
- .dic: Dictionary file with the word count on the first line, followed by words with optional flags
- .aff: Affix file with prefix/suffix rules and configuration
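To make the .dic layout concrete, here is a minimal sketch of parsing one, independent of Kotoshu's own Readers::DicReader. The `parse_dic` helper and the sample entries are hypothetical. The sketch assumes the single-character ("short") flag format and ignores any morphological fields that a real .dic line may carry.

```ruby
# Hypothetical helper, not part of Kotoshu: parse .dic content into a
# declared entry count and a { word => [flags] } hash.
# Assumes single-character flags and no morphological fields.
def parse_dic(text)
  lines = text.lines.map(&:strip).reject(&:empty?)
  declared_count = Integer(lines.shift) # first line holds the word count
  entries = lines.to_h do |line|
    word, flags = line.split('/', 2)    # flags follow an optional slash
    [word, flags.to_s.chars]
  end
  [declared_count, entries]
end

sample_dic = <<~DIC
  3
  hello
  work/DG
  try/B
DIC

count, entries = parse_dic(sample_dic)
puts count         # 3
p entries['work']  # ["D", "G"]
p entries['hello'] # []
```

The flags only gain meaning in combination with the .aff file, which maps each flag to the prefix or suffix rules it enables.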
Instance Attribute Summary
- #aff_config ⇒ Hash (readonly)
  Configuration options from affix file.
- #aff_data ⇒ Hash (readonly)
  Raw aff data from AffReader (cached for Lookuper).
- #aff_path ⇒ String (readonly)
  Path to the .aff file.
- #affix_rules ⇒ Hash (readonly)
  Affix rules (flag => array of rules).
- #dic_path ⇒ String (readonly)
  Path to the .dic file.
- #dic_words ⇒ Array (readonly)
  Raw words from DicReader (cached for Lookuper).
Attributes inherited from Base
#language_code, #locale, #metadata
Class Method Summary
- .available_github_languages(cache: nil) ⇒ Array<String>
  Get list of available languages on GitHub.
- .available_on_github?(language_code, cache: nil) ⇒ Boolean
  Check if a language is available on GitHub.
- .from_github(language_code, cache: nil, force_download: false) ⇒ Hunspell
  Load Hunspell dictionary from GitHub cache, downloading if necessary.
- .language_info(language_code, cache: nil) ⇒ Hash
  Get information about a language from GitHub.
Instance Method Summary
- #add_word(word, flags: []) ⇒ Boolean
  Add a word to the dictionary.
- #initialize(dic_path:, aff_path:, language_code:, locale: nil, metadata: {}) ⇒ Hunspell (constructor)
  Create a new Hunspell dictionary.
- #lookup(word) ⇒ Boolean
  Check if a word exists in the dictionary.
- #lookuper ⇒ Algorithms::Lookup::Lookuper
  The lookup algorithm instance.
- #remove_word(word) ⇒ Boolean
  Remove a word from the dictionary.
- #suggest(word, max_suggestions: 10) ⇒ Array<String>
  Generate spelling suggestions.
- #word_variants(word) ⇒ Array<String>
  Get word variants using affix rules.
- #words ⇒ Array<String>
  Get all words in the dictionary.
Methods inherited from Base
#each_word, #empty?, load, #lookup?, register_type, registry, #size, #to_s, #type, #words_matching, #words_with_prefix
Constructor Details
#initialize(dic_path:, aff_path:, language_code:, locale: nil, metadata: {}) ⇒ Hunspell
Create a new Hunspell dictionary.
    # File 'lib/kotoshu/dictionary/hunspell.rb', line 144

    def initialize(dic_path:, aff_path:, language_code:, locale: nil, metadata: {})
      super(language_code, locale: locale, metadata: )

      @dic_path = resolve_path(dic_path)
      @aff_path = resolve_path(aff_path)

      raise DictionaryNotFoundError, @aff_path unless File.exist?(@aff_path)
      raise DictionaryNotFoundError, @dic_path unless File.exist?(@dic_path)

      # Read aff file using AffReader and cache the data
      aff_reader = Readers::AffReader.new(@aff_path)
      @aff_data = aff_reader.read
      @aff_config = @aff_data # For backward compatibility

      # Read dic file using DicReader with the same encoding as the aff file
      dic_reader = Readers::DicReader.new(@dic_path,
                                          encoding: aff_reader.encoding,
                                          flag_format: @aff_data['FLAG'] || 'short',
                                          flag_synonyms: @aff_data['AF'] || {})
      @dic_words = dic_reader.read

      # Build legacy structures for backward compatibility
      @word_index = build_word_index(@dic_words)
      @affix_rules = parse_affix_rules(@aff_config)

      # Lazy initialization of Lookuper (only created when needed)
      @lookuper = nil

      # Register this dictionary type
      self.class.register_type(:hunspell) unless Dictionary.registry.key?(:hunspell)
    end
Instance Attribute Details
#aff_config ⇒ Hash (readonly)
Returns the configuration options from the affix file.

    # File 'lib/kotoshu/dictionary/hunspell.rb', line 45

    def aff_config
      @aff_config
    end
#aff_data ⇒ Hash (readonly)
Returns the raw aff data from AffReader (cached for Lookuper).

    # File 'lib/kotoshu/dictionary/hunspell.rb', line 48

    def aff_data
      @aff_data
    end
#aff_path ⇒ String (readonly)
Returns the path to the .aff file.

    # File 'lib/kotoshu/dictionary/hunspell.rb', line 39

    def aff_path
      @aff_path
    end
#affix_rules ⇒ Hash (readonly)
Returns the affix rules (flag => array of rules).

    # File 'lib/kotoshu/dictionary/hunspell.rb', line 42

    def affix_rules
      @affix_rules
    end
#dic_path ⇒ String (readonly)
Returns the path to the .dic file.

    # File 'lib/kotoshu/dictionary/hunspell.rb', line 36

    def dic_path
      @dic_path
    end
#dic_words ⇒ Array (readonly)
Returns the raw words from DicReader (cached for Lookuper).

    # File 'lib/kotoshu/dictionary/hunspell.rb', line 51

    def dic_words
      @dic_words
    end
Class Method Details
.available_github_languages(cache: nil) ⇒ Array<String>
Get list of available languages on GitHub.
    # File 'lib/kotoshu/dictionary/hunspell.rb', line 117

    def available_github_languages(cache: nil)
      require_relative '../cache/language_cache'

      cache ||= Cache::LanguageCache.new
      cache.available_languages
    end
.available_on_github?(language_code, cache: nil) ⇒ Boolean
Check if a language is available on GitHub.
    # File 'lib/kotoshu/dictionary/hunspell.rb', line 106

    def available_on_github?(language_code, cache: nil)
      require_relative '../cache/language_cache'

      cache ||= Cache::LanguageCache.new
      cache.available_languages.include?(language_code)
    end
.from_github(language_code, cache: nil, force_download: false) ⇒ Hunspell
Load Hunspell dictionary from GitHub cache, downloading if necessary.
This class method provides automatic dictionary management by:
- Checking the local cache for existing dictionaries
- Downloading from GitHub if not cached or expired
- Managing cache metadata and TTL

    # File 'lib/kotoshu/dictionary/hunspell.rb', line 82

    def from_github(language_code, cache: nil, force_download: false)
      require_relative '../cache/language_cache'

      cache ||= Cache::LanguageCache.new
      cached = cache.get_dictionary(language_code, force_download: force_download)

      new(
        dic_path: cached[:dic_path],
        aff_path: cached[:aff_path],
        language_code: language_code,
        metadata: {
          source: 'github',
          github_url: cached[:metadata]['url'],
          checksum: cached[:metadata]['checksum'],
          downloaded_at: cached[:metadata]['downloaded_at']
        }
      )
    end
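The cache-or-download decision described for from_github can be sketched as follows. This is an illustration only: the internals of Cache::LanguageCache are not shown on this page, so `CacheEntry`, `download_needed?`, and the `ttl_seconds` parameter are hypothetical names, and the real cache may decide differently.

```ruby
# Hypothetical sketch, not Kotoshu's Cache::LanguageCache: decide whether a
# cached dictionary can be reused or has to be downloaded again.
CacheEntry = Struct.new(:downloaded_at, keyword_init: true)

def download_needed?(entry, ttl_seconds:, now: Time.now, force_download: false)
  return true if force_download # caller explicitly asked for a refresh
  return true if entry.nil?     # nothing cached yet

  (now - entry.downloaded_at) > ttl_seconds # cached copy has expired
end

now   = Time.at(1_000_000)
fresh = CacheEntry.new(downloaded_at: now - 60)
stale = CacheEntry.new(downloaded_at: now - 7200)

p download_needed?(fresh, ttl_seconds: 3600, now: now)                       # false
p download_needed?(stale, ttl_seconds: 3600, now: now)                       # true
p download_needed?(nil, ttl_seconds: 3600, now: now)                         # true
p download_needed?(fresh, ttl_seconds: 3600, now: now, force_download: true) # true
```

Passing `now` explicitly keeps the check deterministic, which makes the expiry logic easy to test without touching the clock.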
.language_info(language_code, cache: nil) ⇒ Hash
Get information about a language from GitHub.
    # File 'lib/kotoshu/dictionary/hunspell.rb', line 129

    def language_info(language_code, cache: nil)
      require_relative '../cache/language_cache'

      cache ||= Cache::LanguageCache.new
      cache.get_language_info(language_code)
    end
Instance Method Details
#add_word(word, flags: []) ⇒ Boolean
Add a word to the dictionary.
    # File 'lib/kotoshu/dictionary/hunspell.rb', line 320

    def add_word(word, flags: [])
      return false if word.nil? || word.empty?

      word_key = word.downcase
      @word_index[word_key] = flags

      true
    end
#lookup(word) ⇒ Boolean
Check if a word exists in the dictionary.
Uses the Lookup::Lookuper algorithm for full affix and compound support.
    # File 'lib/kotoshu/dictionary/hunspell.rb', line 282

    def lookup(word)
      return false if word.nil? || word.empty?

      # Use the Lookuper for full Hunspell algorithm support
      lookuper.call(word)
    end
#lookuper ⇒ Algorithms::Lookup::Lookuper
Returns the lookup algorithm instance, built on first access and reused afterwards.

    # File 'lib/kotoshu/dictionary/hunspell.rb', line 54

    def lookuper
      @lookuper ||= Readers::LookupBuilder.from_data(@aff_data, @dic_words).build
    end
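The `||=` in #lookuper is plain Ruby memoization: the constructor sets @lookuper to nil, and the object is built only on first access. A minimal sketch of the same pattern, using a hypothetical `LazyHolder` class and a stand-in object in place of the real Lookuper:

```ruby
# Hypothetical class illustrating the `||=` memoization used by #lookuper.
class LazyHolder
  attr_reader :build_count

  def initialize
    @build_count = 0
    @expensive = nil # nothing is built at construction time
  end

  def expensive
    @expensive ||= build_expensive
  end

  private

  def build_expensive
    @build_count += 1
    Object.new # stand-in for an expensive structure such as a Lookuper
  end
end

holder = LazyHolder.new
first  = holder.expensive
second = holder.expensive
p first.equal?(second) # true
p holder.build_count   # 1
```

One consequence of this pattern is that the memoized object reflects the data it was built from, so later changes to that data are not picked up unless the object is rebuilt.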
#remove_word(word) ⇒ Boolean
Remove a word from the dictionary.
    # File 'lib/kotoshu/dictionary/hunspell.rb', line 333

    def remove_word(word)
      return false if word.nil? || word.empty?

      word_key = word.downcase
      !@word_index.delete(word_key).nil?
    end
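Both #add_word and #remove_word key the index by the downcased word, so they behave case-insensitively. The sketch below reproduces the two documented method bodies in a hypothetical `TinyIndex` class, with a plain Hash standing in for @word_index, to show the return values:

```ruby
# Hypothetical stand-in for the Hunspell class: only the word index and the
# documented add_word / remove_word / words bodies are reproduced here.
class TinyIndex
  def initialize
    @word_index = {}
  end

  def add_word(word, flags: [])
    return false if word.nil? || word.empty?

    @word_index[word.downcase] = flags
    true
  end

  def remove_word(word)
    return false if word.nil? || word.empty?

    !@word_index.delete(word.downcase).nil?
  end

  def words
    @word_index.keys.dup
  end
end

index = TinyIndex.new
index.add_word('Kotoshu', flags: ['S'])
p index.words                  # ["kotoshu"]
p index.remove_word('KOTOSHU') # true
p index.remove_word('kotoshu') # false
p index.add_word('')           # false
```

Judging only from the source shown on this page, #lookup delegates to the lookuper built from @aff_data and @dic_words, while these two methods change @word_index. It is therefore worth verifying whether runtime additions and removals are visible to #lookup.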
#suggest(word, max_suggestions: 10) ⇒ Array<String>
Generate spelling suggestions.
    # File 'lib/kotoshu/dictionary/hunspell.rb', line 294

    def suggest(word, max_suggestions: 10)
      return [] if word.nil? || word.empty?

      all_words = @word_index.keys + generate_affix_variants
      lookup_word = word.downcase

      # Find words with same prefix
      prefix_len = [lookup_word.length - 1, 2].max
      prefix = lookup_word[0...prefix_len]
      candidates = all_words.select { |w| w.downcase.start_with?(prefix) }

      # Calculate edit distances
      candidates.map do |dict_word|
        dist = edit_distance(lookup_word, dict_word.downcase)
        [dict_word, dist]
      end.select { |_, dist| dist.positive? && dist <= 2 }
         .sort_by { |_, dist| dist }
         .first(max_suggestions)
         .map(&:first)
    end
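The pipeline in #suggest (prefix filter, then edit distance, then ranking) can be tried outside the class with the sketch below. It makes two assumptions: the library's `edit_distance` helper is not shown on this page, so a standard Levenshtein distance is substituted, and a plain word list replaces `@word_index.keys + generate_affix_variants`.

```ruby
# Hypothetical Levenshtein distance; the library's own edit_distance helper
# is not shown on this page and may differ.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Mirrors the documented #suggest steps over a plain word list.
def suggest(word, words, max_suggestions: 10)
  return [] if word.nil? || word.empty?

  lookup_word = word.downcase
  prefix_len = [lookup_word.length - 1, 2].max
  prefix = lookup_word[0...prefix_len]

  words.select { |w| w.downcase.start_with?(prefix) }
       .map { |w| [w, edit_distance(lookup_word, w.downcase)] }
       .select { |_, dist| dist.positive? && dist <= 2 }
       .sort_by { |_, dist| dist }
       .first(max_suggestions)
       .map(&:first)
end

words = %w[hells hello helping world]
p suggest('helo', words)  # ["hello", "hells"]
p suggest('hello', words) # ["hells"]
```

Because candidates must share all but the last character of the input as a prefix, a typo near the start of a word yields no suggestions with this approach. Exact matches are excluded by the `dist.positive?` check, and candidates at the same distance come back in no guaranteed order, since `sort_by` is not a stable sort.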
#word_variants(word) ⇒ Array<String>
Get word variants using affix rules.
    # File 'lib/kotoshu/dictionary/hunspell.rb', line 351

    def word_variants(word)
      return [] if word.nil? || word.empty?

      variants = []

      # Get flags for this word (if any)
      word_key = word.downcase
      flags = @word_index[word_key] || []

      # Generate prefix variants
      @affix_rules[:prefix].each do |flag, rules|
        next unless flags.include?(flag)

        rules.each do |rule|
          variant = rule.apply(word)
          variants << variant if variant
        end
      end

      # Generate suffix variants
      @affix_rules[:suffix].each do |flag, rules|
        next unless flags.include?(flag)

        rules.each do |rule|
          variant = rule.apply(word)
          variants << variant if variant
        end
      end

      variants
    end
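The following sketch shows how flags connect words to affix rules in #word_variants. Kotoshu's rule objects are not shown on this page, so `SuffixRule` is a hypothetical type whose `apply` returns the affixed word or nil, and the sample index and rules are made up for illustration.

```ruby
# Hypothetical rule type; Kotoshu's real rule objects are not shown here.
# apply returns the affixed word, or nil when the condition does not match.
SuffixRule = Struct.new(:strip, :add, :condition, keyword_init: true) do
  def apply(word)
    return nil unless word.match?(condition)

    stem = strip.empty? ? word : word.delete_suffix(strip)
    stem + add
  end
end

word_index  = { 'work' => ['D'], 'try' => ['S'] }
affix_rules = {
  prefix: {},
  suffix: {
    'D' => [SuffixRule.new(strip: '', add: 'ed', condition: /[^y]\z/)],
    'S' => [SuffixRule.new(strip: 'y', add: 'ies', condition: /[^aeiou]y\z/)]
  }
}

# Same flow as the documented method: look up the word's flags, then apply
# every prefix and suffix rule whose flag the word carries.
def variants_for(word, word_index, affix_rules)
  flags = word_index[word.downcase] || []
  %i[prefix suffix].flat_map do |kind|
    affix_rules[kind].flat_map do |flag, rules|
      next [] unless flags.include?(flag)

      rules.filter_map { |rule| rule.apply(word) }
    end
  end
end

p variants_for('work', word_index, affix_rules) # ["worked"]
p variants_for('try', word_index, affix_rules)  # ["tries"]
p variants_for('cat', word_index, affix_rules)  # []
```

A word that is absent from the index, or present without flags, produces no variants, which matches the `|| []` fallback in the documented method.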
#words ⇒ Array<String>
Get all words in the dictionary.
    # File 'lib/kotoshu/dictionary/hunspell.rb', line 343

    def words
      @word_index.keys.dup
    end