Class: Kotoshu::Dictionary::Hunspell

Inherits:
Base
  • Object
Defined in:
lib/kotoshu/dictionary/hunspell.rb

Overview

Hunspell dictionary backend.

This dictionary reads Hunspell-formatted dictionary files (.dic and .aff). Hunspell is the spell checker used by LibreOffice, Firefox, Chrome, and many other applications.

File format:

  • .dic: Dictionary file; the first line holds the (approximate) word count, and each following line holds one word with optional flags after a slash

  • .aff: Affix file with prefix/suffix rules and configuration
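
As a rough illustration of the .dic layout described above, the following sketch parses dictionary text into a word-to-flags mapping. `parse_dic` is a hypothetical helper and not part of Kotoshu; it assumes the single-character flag format and ignores morphological fields, escaped slashes, and encoding concerns.

```ruby
# Hypothetical helper (not part of Kotoshu): parse .dic text into
# [declared_count, { word => [flags] }].
# Assumes the single-character ("short") flag format.
def parse_dic(text)
  lines = text.lines.map(&:chomp).reject(&:empty?)

  # The first line carries the declared word count.
  declared_count = Integer(lines.first.split.first)

  # Each remaining line is "word" or "word/FLAGS".
  entries = lines.drop(1).each_with_object({}) do |line, index|
    word, flags = line.split("/", 2)
    index[word] = flags ? flags.chars : []
  end

  [declared_count, entries]
end

sample = <<~DIC
  3
  hello
  work/DGS
  über
DIC

count, entries = parse_dic(sample)
puts count
puts entries.inspect
```

Here `work/DGS` yields three separate flags because each character is treated as one flag; other flag formats declared in the .aff file would need a different split.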

Examples:

Creating a Hunspell dictionary

dict = Hunspell.new(
  dic_path: "en_US.dic",
  aff_path: "en_US.aff",
  language_code: "en-US"
)
dict.lookup?("hello")  # => true

Creating from GitHub cache

dict = Hunspell.from_github("de")
dict.lookup?("über")  # => true

Instance Attribute Summary

Attributes inherited from Base

#language_code, #locale, #metadata

Class Method Summary

Instance Method Summary

Methods inherited from Base

#each_word, #empty?, load, #lookup?, register_type, registry, #size, #to_s, #type, #words_matching, #words_with_prefix

Constructor Details

#initialize(dic_path:, aff_path:, language_code:, locale: nil, metadata: {}) ⇒ Hunspell

Create a new Hunspell dictionary.

Parameters:

  • dic_path (String)

    Path or URL to the .dic file

  • aff_path (String)

    Path or URL to the .aff file

  • language_code (String)

    The language code (e.g., "en-US")

  • locale (String, nil) (defaults to: nil)

    The locale (optional)

  • metadata (Hash) (defaults to: {})

    Additional metadata (optional)

Raises:

  • (DictionaryNotFoundError)

    If the resolved .aff or .dic path does not exist

# File 'lib/kotoshu/dictionary/hunspell.rb', line 144

def initialize(dic_path:, aff_path:, language_code:, locale: nil, metadata: {})
  super(language_code, locale: locale, metadata: metadata)

  @dic_path = resolve_path(dic_path)
  @aff_path = resolve_path(aff_path)

  raise DictionaryNotFoundError, @aff_path unless File.exist?(@aff_path)
  raise DictionaryNotFoundError, @dic_path unless File.exist?(@dic_path)

  # Read aff file using AffReader and cache the data
  aff_reader = Readers::AffReader.new(@aff_path)
  @aff_data = aff_reader.read
  @aff_config = @aff_data  # For backward compatibility

  # Read dic file using DicReader with the same encoding as the aff file
  dic_reader = Readers::DicReader.new(@dic_path,
                                       encoding: aff_reader.encoding,
                                       flag_format: @aff_data['FLAG'] || 'short',
                                       flag_synonyms: @aff_data['AF'] || {})
  @dic_words = dic_reader.read

  # Build legacy structures for backward compatibility
  @word_index = build_word_index(@dic_words)
  @affix_rules = parse_affix_rules(@aff_config)

  # Lazy initialization of Lookuper (only created when needed)
  @lookuper = nil

  # Register this dictionary type
  self.class.register_type(:hunspell) unless Dictionary.registry.key?(:hunspell)
end

Instance Attribute Details

#aff_config ⇒ Hash (readonly)

Returns configuration options from the affix file.

Returns:

  • (Hash)

    Configuration options from affix file



# File 'lib/kotoshu/dictionary/hunspell.rb', line 45

def aff_config
  @aff_config
end

#aff_data ⇒ Hash (readonly)

Returns raw aff data from AffReader (cached for Lookuper).

Returns:

  • (Hash)

    Raw aff data from AffReader (cached for Lookuper)



# File 'lib/kotoshu/dictionary/hunspell.rb', line 48

def aff_data
  @aff_data
end

#aff_path ⇒ String (readonly)

Returns the path to the .aff file.

Returns:

  • (String)

    Path to the .aff file



# File 'lib/kotoshu/dictionary/hunspell.rb', line 39

def aff_path
  @aff_path
end

#affix_rules ⇒ Hash (readonly)

Returns affix rules (flag => array of rules).

Returns:

  • (Hash)

    Affix rules (flag => array of rules)



# File 'lib/kotoshu/dictionary/hunspell.rb', line 42

def affix_rules
  @affix_rules
end

#dic_path ⇒ String (readonly)

Returns the path to the .dic file.

Returns:

  • (String)

    Path to the .dic file



# File 'lib/kotoshu/dictionary/hunspell.rb', line 36

def dic_path
  @dic_path
end

#dic_words ⇒ Array (readonly)

Returns raw words from DicReader (cached for Lookuper).

Returns:

  • (Array)

    Raw words from DicReader (cached for Lookuper)



# File 'lib/kotoshu/dictionary/hunspell.rb', line 51

def dic_words
  @dic_words
end

Class Method Details

.available_github_languages(cache: nil) ⇒ Array<String>

Get list of available languages on GitHub.

Parameters:

  • cache (Cache::LanguageCache, nil) (defaults to: nil)

    Custom cache instance (optional)

Returns:

  • (Array<String>)

    List of supported language codes



# File 'lib/kotoshu/dictionary/hunspell.rb', line 117

def available_github_languages(cache: nil)
  require_relative '../cache/language_cache'

  cache ||= Cache::LanguageCache.new
  cache.available_languages
end

.available_on_github?(language_code, cache: nil) ⇒ Boolean

Check if a language is available on GitHub.

Parameters:

  • language_code (String)

    ISO 639-1 language code

  • cache (Cache::LanguageCache, nil) (defaults to: nil)

    Custom cache instance (optional)

Returns:

  • (Boolean)

    True if language is supported



# File 'lib/kotoshu/dictionary/hunspell.rb', line 106

def available_on_github?(language_code, cache: nil)
  require_relative '../cache/language_cache'

  cache ||= Cache::LanguageCache.new
  cache.available_languages.include?(language_code)
end

.from_github(language_code, cache: nil, force_download: false) ⇒ Hunspell

Load Hunspell dictionary from GitHub cache, downloading if necessary.

This class method provides automatic dictionary management by:

  1. Checking the local cache for existing dictionaries

  2. Downloading from GitHub if not cached or expired

  3. Managing cache metadata and TTL
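
The three steps above can be sketched as a small time-to-live cache. This is an illustrative sketch only, not the Cache::LanguageCache implementation: `TtlCache`, `CacheEntry`, the `downloader` callable, and the one-hour TTL are all assumptions made for the example.

```ruby
# Hypothetical sketch of a check / download / record-metadata cycle.
# None of these names come from Kotoshu.
CacheEntry = Struct.new(:path, :downloaded_at)

class TtlCache
  def initialize(ttl_seconds:, downloader:)
    @ttl_seconds = ttl_seconds
    @downloader = downloader # callable: language_code -> local path
    @entries = {}
  end

  # Returns a cached entry, refreshing it when it is missing, expired,
  # or when the caller forces a download.
  def fetch(language_code, force_download: false, now: Time.now)
    entry = @entries[language_code]
    if force_download || entry.nil? || expired?(entry, now)
      entry = CacheEntry.new(@downloader.call(language_code), now)
      @entries[language_code] = entry
    end
    entry
  end

  private

  def expired?(entry, now)
    now - entry.downloaded_at > @ttl_seconds
  end
end

downloads = []
fake_download = lambda do |code|
  downloads << code
  "/tmp/#{code}.dic"
end

cache = TtlCache.new(ttl_seconds: 3600, downloader: fake_download)
start = Time.at(0)
cache.fetch("de", now: start)                              # miss: downloads
cache.fetch("de", now: start + 60)                         # fresh: cached copy
cache.fetch("de", now: start + 7200)                       # expired: downloads
cache.fetch("de", now: start + 7200, force_download: true) # forced: downloads
puts downloads.size
```

Passing `now:` explicitly keeps the expiry logic testable without sleeping or stubbing the clock.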

Examples:

Load English dictionary

dict = Hunspell.from_github("en")
dict.lookup?("hello")  # => true

Load German dictionary

dict = Hunspell.from_github("de")
dict.lookup?("über")  # => true

Force re-download

dict = Hunspell.from_github("fr", force_download: true)

Parameters:

  • language_code (String)

    ISO 639-1 language code (e.g., ‘en’, ‘de’, ‘fr’)

  • cache (Cache::LanguageCache, nil) (defaults to: nil)

    Custom cache instance (optional)

  • force_download (Boolean) (defaults to: false)

    Force re-download even if cached

Returns:

  • (Hunspell)

    Configured Hunspell dictionary instance

Raises:

  • (ArgumentError)

    If language_code is not supported



# File 'lib/kotoshu/dictionary/hunspell.rb', line 82

def from_github(language_code, cache: nil, force_download: false)
  require_relative '../cache/language_cache'

  cache ||= Cache::LanguageCache.new
  cached = cache.get_dictionary(language_code, force_download: force_download)

  new(
    dic_path: cached[:dic_path],
    aff_path: cached[:aff_path],
    language_code: language_code,
    metadata: {
      source: 'github',
      github_url: cached[:metadata]['url'],
      checksum: cached[:metadata]['checksum'],
      downloaded_at: cached[:metadata]['downloaded_at']
    }
  )
end

.language_info(language_code, cache: nil) ⇒ Hash

Get information about a language from GitHub.

Parameters:

  • language_code (String)

    ISO 639-1 language code

  • cache (Cache::LanguageCache, nil) (defaults to: nil)

    Custom cache instance (optional)

Returns:

  • (Hash)

    Language information



# File 'lib/kotoshu/dictionary/hunspell.rb', line 129

def language_info(language_code, cache: nil)
  require_relative '../cache/language_cache'

  cache ||= Cache::LanguageCache.new
  cache.get_language_info(language_code)
end

Instance Method Details

#add_word(word, flags: []) ⇒ Boolean

Add a word to the dictionary.

Parameters:

  • word (String)

    The word to add

  • flags (Array<String>) (defaults to: [])

    Morphological flags

Returns:

  • (Boolean)

    True if added



# File 'lib/kotoshu/dictionary/hunspell.rb', line 320

def add_word(word, flags: [])
  return false if word.nil? || word.empty?

  word_key = word.downcase
  @word_index[word_key] = flags

  true
end

#lookup(word) ⇒ Boolean

Check if a word exists in the dictionary.

Uses the Lookup::Lookuper algorithm for full affix and compound support.

Parameters:

  • word (String)

    The word to look up

Returns:

  • (Boolean)

    True if the word exists



# File 'lib/kotoshu/dictionary/hunspell.rb', line 282

def lookup(word)
  return false if word.nil? || word.empty?

  # Use the Lookuper for full Hunspell algorithm support
  lookuper.call(word)
end

#lookuper ⇒ Algorithms::Lookup::Lookuper

Returns the lookup algorithm instance, built lazily on first use.

Returns:

  • (Algorithms::Lookup::Lookuper)

    The lookup algorithm instance



# File 'lib/kotoshu/dictionary/hunspell.rb', line 54

def lookuper
  @lookuper ||= Readers::LookupBuilder.from_data(@aff_data, @dic_words).build
end
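
The `||=` idiom used by #lookuper defers an expensive build until first use and then reuses the result. A minimal standalone sketch of the pattern follows; `LazyHolder` and its `build_count` counter are hypothetical names used only to make the behavior observable.

```ruby
# Hypothetical class illustrating lazy memoization with ||=.
class LazyHolder
  attr_reader :build_count

  def initialize(words)
    @words = words
    @build_count = 0
    @index = nil # nothing is built at construction time
  end

  # Builds the index on the first call and returns the cached one afterwards.
  def index
    @index ||= build_index
  end

  private

  def build_index
    @build_count += 1
    @words.map(&:downcase).to_h { |word| [word, true] }
  end
end

holder = LazyHolder.new(%w[Hello World])
puts holder.build_count # no build has happened yet
holder.index
holder.index
puts holder.build_count # the second call reused the cached index
```

A memoized structure reflects the data as it was when first built. In the source shown here, #add_word updates @word_index while the lookuper is built from @aff_data and @dic_words, so it may be worth verifying whether words added at runtime are visible to #lookup.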

#remove_word(word) ⇒ Boolean

Remove a word from the dictionary.

Parameters:

  • word (String)

    The word to remove

Returns:

  • (Boolean)

    True if removed



# File 'lib/kotoshu/dictionary/hunspell.rb', line 333

def remove_word(word)
  return false if word.nil? || word.empty?

  word_key = word.downcase
  !@word_index.delete(word_key).nil?
end
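
Both #add_word and #remove_word key the index by the downcased word, which has a few consequences worth seeing in isolation. The sketch below reproduces the same logic on a plain Hash; `WordIndex` and `flags_for` are hypothetical names, not Kotoshu classes or methods.

```ruby
# Hypothetical stand-in for the @word_index handling shown above.
class WordIndex
  def initialize
    @index = {}
  end

  def add(word, flags: [])
    return false if word.nil? || word.empty?

    @index[word.downcase] = flags
    true
  end

  def remove(word)
    return false if word.nil? || word.empty?

    # Hash#delete returns the removed value, or nil when the key is absent.
    !@index.delete(word.downcase).nil?
  end

  def flags_for(word)
    @index[word.downcase]
  end
end

index = WordIndex.new
index.add("Hello", flags: ["S"])
p index.flags_for("hello") # stored under the downcased key
index.add("HELLO")         # same key: replaces the earlier flags
p index.flags_for("hello")
p index.remove("hello")    # key existed
p index.remove("hello")    # key already gone
```

Because keys are case-folded, re-adding a word in a different case overwrites its flags rather than creating a second entry.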

#suggest(word, max_suggestions: 10) ⇒ Array<String>

Generate spelling suggestions.

Parameters:

  • word (String)

    The misspelled word

  • max_suggestions (Integer) (defaults to: 10)

    Maximum suggestions

Returns:

  • (Array<String>)

    List of suggested words



# File 'lib/kotoshu/dictionary/hunspell.rb', line 294

def suggest(word, max_suggestions: 10)
  return [] if word.nil? || word.empty?

  all_words = @word_index.keys + generate_affix_variants
  lookup_word = word.downcase

  # Find words with same prefix
  prefix_len = [lookup_word.length - 1, 2].max
  prefix = lookup_word[0...prefix_len]
  candidates = all_words.select { |w| w.downcase.start_with?(prefix) }

  # Calculate edit distances
  candidates.map do |dict_word|
    dist = edit_distance(lookup_word, dict_word.downcase)
    [dict_word, dist]
  end.select { |_, dist| dist.positive? && dist <= 2 }
            .sort_by { |_, dist| dist }
            .first(max_suggestions)
            .map(&:first)
end
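
The private `edit_distance` helper is not shown in this page. As a sketch of how the pipeline above behaves, the following standalone code assumes it is a plain Levenshtein distance (an assumption, not confirmed by the source) and replays the same prefix filter and distance threshold on a fixed word list. `levenshtein` and `suggest_from` are hypothetical names.

```ruby
# Assumed stand-in for the private edit_distance helper:
# classic Levenshtein distance computed one row at a time.
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,       # deletion
        current[j - 1] + 1,    # insertion
        previous[j - 1] + cost # substitution or match
      ].min
    end
    previous = current
  end
  previous.last
end

# Same steps as #suggest, applied to an explicit word list.
def suggest_from(words, word, max_suggestions: 10)
  target = word.downcase
  prefix_len = [target.length - 1, 2].max
  prefix = target[0...prefix_len]

  words.select { |w| w.downcase.start_with?(prefix) }
       .map { |w| [w, levenshtein(target, w.downcase)] }
       .select { |_, dist| dist.positive? && dist <= 2 }
       .sort_by { |_, dist| dist }
       .first(max_suggestions)
       .map(&:first)
end

puts levenshtein("kitten", "sitting") # => 3
p suggest_from(%w[hello help helmet held world], "helo").sort
p suggest_from(%w[receive], "recieve")
```

Note the effect of the prefix filter: candidates must share all but the last character of the input as a prefix (for inputs longer than three characters), so a transposition in the middle of a word, such as "recieve", produces no suggestion even though "receive" is within edit distance 2.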

#word_variants(word) ⇒ Array<String>

Get word variants using affix rules.

Parameters:

  • word (String)

    The word

Returns:

  • (Array<String>)

    Word variants



# File 'lib/kotoshu/dictionary/hunspell.rb', line 351

def word_variants(word)
  return [] if word.nil? || word.empty?

  variants = []

  # Get flags for this word (if any)
  word_key = word.downcase
  flags = @word_index[word_key] || []

  # Generate prefix variants
  @affix_rules[:prefix].each do |flag, rules|
    next unless flags.include?(flag)

    rules.each do |rule|
      variant = rule.apply(word)
      variants << variant if variant
    end
  end

  # Generate suffix variants
  @affix_rules[:suffix].each do |flag, rules|
    next unless flags.include?(flag)

    rules.each do |rule|
      variant = rule.apply(word)
      variants << variant if variant
    end
  end

  variants
end
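
The rule objects stored in @affix_rules are not documented here beyond responding to `apply(word)` and returning a variant or nil. The sketch below shows one way such a suffix rule could work, assuming the usual Hunspell notion of a rule (characters to strip, characters to add, and a condition on the word). `SuffixRule`, `variants_for`, and the three sample rules are hypothetical and simplified.

```ruby
# Hypothetical rule type: strip a suffix, append another, guarded by a condition.
SuffixRule = Struct.new(:strip, :add, :condition) do
  # Returns the derived form, or nil when the rule does not apply.
  def apply(word)
    return nil unless word.match?(condition) && word.end_with?(strip)

    word.delete_suffix(strip) + add
  end
end

# Simplified past-tense rules grouped under a made-up flag "D".
RULES_BY_FLAG = {
  "D" => [
    SuffixRule.new("y", "ied", /[^aeiou]y\z/), # try  -> tried
    SuffixRule.new("", "ed", /[^ey]\z/),       # work -> worked
    SuffixRule.new("", "d", /e\z/)             # bake -> baked
  ]
}.freeze

# Mirrors the loop in #word_variants: only rules whose flag the word
# carries are applied, and nil results are dropped.
def variants_for(word, flags, rules_by_flag)
  rules_by_flag.flat_map do |flag, rules|
    next [] unless flags.include?(flag)

    rules.map { |rule| rule.apply(word) }.compact
  end
end

p variants_for("try", ["D"], RULES_BY_FLAG)
p variants_for("work", ["D"], RULES_BY_FLAG)
p variants_for("work", [], RULES_BY_FLAG)
```

Returning nil for a non-matching rule lets the caller keep the simple `variants << variant if variant` pattern used in #word_variants.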

#words ⇒ Array<String>

Get all words in the dictionary.

Returns:

  • (Array<String>)

    All words



# File 'lib/kotoshu/dictionary/hunspell.rb', line 343

def words
  @word_index.keys.dup
end