Class: Kotoshu::Dictionary::UnixWords

Inherits:
Base
  • Object
Defined in:
lib/kotoshu/dictionary/unix_words.rb

Overview

Unix system dictionary backend.

This dictionary reads Unix-style system word lists, typically found at `/usr/share/dict/words`, which on some systems is a symlink to a dictionary such as `web2` (Webster's Second International).

Examples:

Using system dictionary

dict = UnixWords.new("/usr/share/dict/words", language_code: "en-US")
dict.lookup?("hello")     # => true
dict.suggest("helo")      # => ["hello", "help", "held", ...]

Auto-detecting system dictionary

dict = UnixWords.detect(language_code: "en-US")

Constant Summary

SYSTEM_PATHS =

Standard system paths to check for dictionaries.

[
  "/usr/share/dict/words",
  "/usr/share/dict/web2",
  "/usr/share/dict/american-english",
  "/usr/share/dict/british-english",
  "/usr/dict/words",
  "/System/Library/Assets/com_apple_MobileAsset_DictionaryServices_dictionaryOS/Dictionary/words" # macOS
].freeze

Instance Attribute Summary

Attributes inherited from Base

#language_code, #locale, #metadata

Class Method Summary

Instance Method Summary

Methods inherited from Base

#each_word, #empty?, load, #lookup?, register_type, registry, #size, #to_s, #type, #words_matching, #words_with_prefix

Constructor Details

#initialize(path, language_code:, locale: nil, case_sensitive: false, metadata: {}) ⇒ UnixWords

Create a new UnixWords dictionary.

Parameters:

  • path (String)

    Path to the dictionary file

  • language_code (String)

    The language code

  • locale (String, nil) (defaults to: nil)

    The locale (optional)

  • case_sensitive (Boolean) (defaults to: false)

    Whether lookups are case-sensitive

  • metadata (Hash) (defaults to: {})

    Additional metadata (optional)



# File 'lib/kotoshu/dictionary/unix_words.rb', line 44

def initialize(path, language_code:, locale: nil, case_sensitive: false, metadata: {})
  super(language_code, locale: locale, metadata: metadata)

  @path = File.expand_path(path)
  @case_sensitive = case_sensitive
  @words = load_words(@path)
  @word_set = build_word_set

  # Register this dictionary type
  self.class.register_type(:unix_words) unless Dictionary.registry.key?(:unix_words)
end
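
The constructor relies on two private helpers, `load_words` and `build_word_set`, whose source is not shown on this page. The following is a minimal sketch of what they might do, assuming one word per line, blank lines skipped, and lowercase folding when `case_sensitive` is false. The names `load_words_sketch` and `build_word_set_sketch` are hypothetical, and the folding at load time is an assumption rather than confirmed behavior.

```ruby
require "tempfile"

# Assumption: one word per line; blank lines are skipped and words are
# folded to lowercase unless case_sensitive is true.
def load_words_sketch(path, case_sensitive: false)
  File.foreach(path, chomp: true).filter_map do |line|
    word = line.strip
    next if word.empty?

    case_sensitive ? word : word.downcase
  end.uniq
end

# Maps each word to its position in the array, consistent with
# add_word storing `@words.length - 1` for a newly appended word.
def build_word_set_sketch(words)
  words.each_with_index.to_h
end

Tempfile.create("words") do |file|
  file.write("Apple\nbanana\n\napple\n")
  file.flush

  words = load_words_sketch(file.path)
  p words # ["apple", "banana"]
  p build_word_set_sketch(words)
end
```

Keeping both an array and a word-to-index hash gives ordered iteration through the array and constant-time membership checks through the hash.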

Instance Attribute Details

#case_sensitive ⇒ Boolean (readonly)

Returns whether lookups are case-sensitive.

Returns:

  • (Boolean)

    Whether lookups are case-sensitive



# File 'lib/kotoshu/dictionary/unix_words.rb', line 35

def case_sensitive
  @case_sensitive
end

#path ⇒ String (readonly)

Returns the path to the dictionary file.

Returns:

  • (String)

    The path to the dictionary file



# File 'lib/kotoshu/dictionary/unix_words.rb', line 32

def path
  @path
end

Class Method Details

.detect(language_code:, locale: nil, case_sensitive: false) ⇒ UnixWords?

Create a dictionary by auto-detecting the system dictionary. Returns nil when none of the paths in SYSTEM_PATHS exists.

Examples:

dict = UnixWords.detect(language_code: "en-US")

Parameters:

  • language_code (String)

    The language code

  • locale (String, nil) (defaults to: nil)

    The locale (optional)

  • case_sensitive (Boolean) (defaults to: false)

    Whether lookups are case-sensitive

Returns:

  • (UnixWords, nil)

    The dictionary or nil if not found



# File 'lib/kotoshu/dictionary/unix_words.rb', line 157

def self.detect(language_code:, locale: nil, case_sensitive: false)
  path = detect_system_dictionary
  return nil unless path

  new(path, language_code: language_code, locale: locale,
            case_sensitive: case_sensitive)
end

.detect_system_dictionary ⇒ String?

Detect the system dictionary path.

Checks the paths in SYSTEM_PATHS in order and returns the first one that exists.

Examples:

UnixWords.detect_system_dictionary  # => "/usr/share/dict/words"

Returns:

  • (String, nil)

    The detected path or nil



# File 'lib/kotoshu/dictionary/unix_words.rb', line 144

def self.detect_system_dictionary
  SYSTEM_PATHS.find { |p| File.exist?(p) }
end
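
The first-match behavior of the search can be tried without touching real system paths. This sketch uses a temporary directory and a hypothetical helper, `first_existing_path`, which stands in for the `find` call above and is not part of Kotoshu.

```ruby
require "tmpdir"

# Hypothetical stand-in for the first-match search over SYSTEM_PATHS.
def first_existing_path(candidates)
  candidates.find { |p| File.exist?(p) }
end

Dir.mktmpdir do |dir|
  present = File.join(dir, "words")
  File.write(present, "hello\n")

  # The missing path is skipped; the first existing path is returned.
  found = first_existing_path([File.join(dir, "missing"), present])
  puts found == present # true

  # With no existing candidate the result is nil, not an exception.
  p first_existing_path([File.join(dir, "missing")]) # nil
end
```

Note that `File.exist?` is also true for directories, so a caller that needs a regular file may prefer `File.file?`.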

Instance Method Details

#add_word(word, flags: []) ⇒ Boolean

Add a word to the dictionary.

Parameters:

  • word (String)

    The word to add

  • flags (Array<String>) (defaults to: [])

    Flags (ignored for UnixWords)

Returns:

  • (Boolean)

    True if added; false if the word is nil, empty, or already present



# File 'lib/kotoshu/dictionary/unix_words.rb', line 101

def add_word(word, flags: [])
  return false if word.nil? || word.empty?

  lookup_word = @case_sensitive ? word : word.downcase
  return false if @word_set.key?(lookup_word)

  @words << lookup_word
  @word_set[lookup_word] = @words.length - 1

  true
end

#lookup(word) ⇒ Boolean

Check if a word exists in the dictionary.

Parameters:

  • word (String)

    The word to look up

Returns:

  • (Boolean)

    True if the word exists



# File 'lib/kotoshu/dictionary/unix_words.rb', line 60

def lookup(word)
  return false if word.nil? || word.empty?

  lookup_word = @case_sensitive ? word : word.downcase
  @word_set.key?(lookup_word)
end
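
The effect of `case_sensitive` on lookups can be illustrated with a small stand-alone class. `TinyWordSet` is hypothetical and only mirrors the guard and folding logic shown above; it assumes stored words are folded the same way as queries, which this page does not confirm for words loaded from a file.

```ruby
# Hypothetical miniature of the lookup path; not the Kotoshu class itself.
class TinyWordSet
  def initialize(words, case_sensitive: false)
    @case_sensitive = case_sensitive
    # Assumption: stored words are folded the same way as queries.
    @word_set = words.each_with_index.map { |w, i| [fold(w), i] }.to_h
  end

  def lookup(word)
    return false if word.nil? || word.empty?

    @word_set.key?(fold(word))
  end

  private

  def fold(word)
    @case_sensitive ? word : word.downcase
  end
end

folded = TinyWordSet.new(%w[hello Paris])
strict = TinyWordSet.new(%w[hello Paris], case_sensitive: true)

puts folded.lookup("HELLO") # true
puts folded.lookup("paris") # true
puts strict.lookup("paris") # false
puts strict.lookup("Paris") # true
puts folded.lookup("")      # false
```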

#remove_word(word) ⇒ Boolean

Remove a word from the dictionary.

Parameters:

  • word (String)

    The word to remove

Returns:

  • (Boolean)

    True if removed; false if the word is nil, empty, or not present



# File 'lib/kotoshu/dictionary/unix_words.rb', line 117

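def remove_word(word)
  return false if word.nil? || word.empty?

  lookup_word = @case_sensitive ? word : word.downcase
  return false unless @word_set.key?(lookup_word)

  index = @word_set.delete(lookup_word)
  @words.delete_at(index)
  # delete_at shifts later words down by one, so their stored positions must follow
  @word_set.transform_values! { |i| i > index ? i - 1 : i }

  true
end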
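
Because `@word_set` maps each word to its position in `@words`, deleting from the middle of the array shifts every later position down by one. The sketch below, using plain local variables rather than the class itself, shows one way to keep a word-to-index hash consistent after such a deletion.

```ruby
words = %w[alpha beta gamma]
index = words.each_with_index.to_h # word => position

removed_at = index.delete("alpha")
words.delete_at(removed_at)

# Positions after the removed slot move down by one.
index.transform_values! { |i| i > removed_at ? i - 1 : i }

# Every stored position should still point at its own word.
consistent = index.all? { |word, i| words[i] == word }
puts consistent # true
```

The adjustment is linear in the size of the hash. If removals are frequent, storing words in a `Set` and dropping the positional index may be a simpler design.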

#suggest(word, max_suggestions: 10) ⇒ Array<String>

Generate spelling suggestions.

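Selects dictionary words that share a prefix with the input (all but its last character, with a minimum prefix length of three), then keeps those at an edit distance of 1 or 2, nearest first.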

Parameters:

  • word (String)

    The misspelled word

  • max_suggestions (Integer) (defaults to: 10)

    Maximum suggestions

Returns:

  • (Array<String>)

    List of suggested words



# File 'lib/kotoshu/dictionary/unix_words.rb', line 74

def suggest(word, max_suggestions: 10)
  return [] if word.nil? || word.empty?

  # For now, use simple prefix matching and edit distance
  # This will be improved with the suggestion algorithms
  lookup_word = @case_sensitive ? word : word.downcase

  # Find words with same prefix
  prefix_len = [lookup_word.length - 1, 3].max
  prefix = lookup_word[0...prefix_len]
  candidates = @words.select { |w| w.start_with?(prefix) }

  # Calculate edit distances
  candidates.map do |dict_word|
    dist = edit_distance(lookup_word, dict_word)
    [dict_word, dist]
  end.select { |_, dist| dist.positive? && dist <= 2 } # Only close matches
            .sort_by { |_, dist| dist }
            .first(max_suggestions)
            .map(&:first)
end
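
The `edit_distance` helper called above is not shown on this page. The sketch below substitutes a standard Levenshtein distance (an assumption about what the helper computes) and reproduces the prefix filter and ranking on a small word list. The names `levenshtein` and `suggest_sketch` are hypothetical.

```ruby
# Levenshtein distance using two rolling rows of the DP table.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Same pipeline as #suggest: prefix filter, distance filter, sort, truncate.
def suggest_sketch(word, dictionary, max_suggestions: 10)
  prefix_len = [word.length - 1, 3].max
  prefix = word[0...prefix_len]

  dictionary.select { |w| w.start_with?(prefix) }
            .map { |w| [w, levenshtein(word, w)] }
            .select { |_, dist| dist.positive? && dist <= 2 }
            .sort_by { |_, dist| dist }
            .first(max_suggestions)
            .map(&:first)
end

# "helo" itself (distance 0) and "helium" (distance 3) are filtered out.
p suggest_sketch("helo", %w[held hello helium help helo world])

# The prefix "reciev" excludes "receive", so nothing is suggested.
p suggest_sketch("recieve", %w[receive]) # []
```

Since the prefix keeps all but the last character of any input longer than four letters, this approach mainly catches errors at the end of a word. Ties in distance are returned in no guaranteed order because `sort_by` is not stable.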

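#words ⇒ Array<String>

Get all words in the dictionary. Returns a copy of the internal array, so adding or removing elements in the result does not change the dictionary.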

Returns:

  • (Array<String>)

    All words



# File 'lib/kotoshu/dictionary/unix_words.rb', line 132

def words
  @words.dup
end
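
`Array#dup` makes a shallow copy: the returned array is independent, but its String elements are the same objects as the originals. This short sketch uses plain arrays rather than the dictionary class to show the difference.

```ruby
internal = ["hello".dup, "world".dup]
copy = internal.dup

copy << "extra"        # changes only the copy
puts internal.length   # 2

copy.first << "!"      # mutates the String shared by both arrays
puts internal.first    # hello!
```

Callers that need fully independent strings could use `words.map(&:dup)` on the result.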