Class: Kotoshu::Core::IndexedDictionary

Inherits:
Object
Defined in:
lib/kotoshu/core/indexed_dictionary.rb

Overview

Indexed dictionary for efficient word lookup with multiple indexes. It is more model-driven than Spylls, which uses simple hash indices.

This is a proper domain model with rich behavior including:

  • Multiple indexes (case-sensitive, case-insensitive, prefix, suffix)

  • Rich query methods

  • Index management

  • Domain-specific behavior
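The multi-index idea can be sketched with plain hashes. This is a minimal illustration under stated assumptions, not the class's implementation: `build_indexes` is a hypothetical helper name, and only the exact and lowercase indexes are shown.

```ruby
# Hypothetical helper: builds two of the indexes listed above
# (exact and lowercase), mapping each key to the positions at
# which a matching word appears in the input list.
def build_indexes(words)
  indexes = {
    exact: Hash.new { |hash, key| hash[key] = [] },
    lowercase: Hash.new { |hash, key| hash[key] = [] }
  }
  words.each_with_index do |word, position|
    indexes[:exact][word] << position
    indexes[:lowercase][word.downcase] << position
  end
  indexes
end

indexes = build_indexes(%w[Tokyo tokyo Kyoto])
indexes[:exact].key?("TOKYO")   # false: the exact index is case-sensitive
indexes[:lowercase]["tokyo"]    # positions of both "Tokyo" and "tokyo"
```

Keeping one hash per lookup style trades memory for constant-time membership checks in each style.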

Constructor Details

#initialize(words = []) ⇒ IndexedDictionary

Returns a new instance of IndexedDictionary.

Parameters:

  • words (Array<String>) (defaults to: [])

    Initial words to add



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 17

def initialize(words = [])
  @words = []
  @indexes = {
    exact: {},              # case_sensitive: word => [positions]
    lowercase: {},          # case_insensitive: word.downcase => [positions]
    prefix: {},             # prefix => [words]
    suffix: {},             # suffix => [words]
    flag: {}                # flag => [words] (future: for Hunspell)
  }
  @size = 0

  words.each { |word| add_word(word) }
end

Instance Attribute Details

#size ⇒ Object (readonly)

Returns the value of attribute size.



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 14

def size
  @size
end

#words ⇒ Object (readonly)

Returns the value of attribute words.



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 14

def words
  @words
end

Class Method Details

.from_file(path) ⇒ IndexedDictionary

Create indexed dictionary from a file.

Parameters:

  • path (String)

    Path to word list file

Returns:

  • (IndexedDictionary)



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 276

def self.from_file(path)
  words = File.foreach(path, chomp: true).reject { |l| l.empty? || l.start_with?("#") }
  new(words)
end
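The filtering step can be tried in isolation. The sketch below writes a throwaway word list with `Tempfile` (the file contents are invented for illustration) and applies the same `File.foreach` expression that `from_file` uses.

```ruby
require "tempfile"

# Write a small word list containing a comment line and a blank line.
list = Tempfile.new("words")
list.write("# sample word list\napple\n\nbanana\n")
list.close

# Same filtering expression as from_file: drop empty lines and lines
# that start with "#". chomp: true strips the trailing newline.
loaded = File.foreach(list.path, chomp: true).reject { |l| l.empty? || l.start_with?("#") }
list.unlink
```

Note that the filter checks the raw line: a whitespace-only line or an indented `#` comment is neither empty nor starts with `#`, so it would be kept as a word.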

.from_trie(trie) ⇒ IndexedDictionary

Create indexed dictionary from a Trie.

Parameters:

  • trie (Trie)

    The trie to convert

Returns:

  • (IndexedDictionary)



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 285

def self.from_trie(trie)
  words = trie.all_words
  new(words)
end

Instance Method Details

#add_word(word, metadata = {}) ⇒ IndexedDictionary Also known as: <<

Add a word to the dictionary with optional metadata.

Parameters:

  • word (String)

    The word to add

  • metadata (Hash) (defaults to: {})

    Optional metadata associated with the word

Returns:

  • (IndexedDictionary)



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 36

def add_word(word, metadata = {})
  # Store the word with its position and metadata
  entry = { word: word, index: @size, metadata: metadata }
  @words << entry
  @size += 1

  # Update exact index (case-sensitive)
  @indexes[:exact][word] ||= []
  @indexes[:exact][word] << @size - 1

  # Update lowercase index (case-insensitive)
  lower = word.downcase
  @indexes[:lowercase][lower] ||= []
  @indexes[:lowercase][lower] << @size - 1

  # Update prefix indexes (for prefix searching). The exclusive range
  # indexes proper prefixes and suffixes only, never the whole word.
  (1...word.length).each do |i|
    prefix = word[0...i]
    @indexes[:prefix][prefix] ||= []
    @indexes[:prefix][prefix] << word

    # Update suffix indexes (for suffix searching)
    suffix = word[i..]
    @indexes[:suffix][suffix] ||= []
    @indexes[:suffix][suffix] << word
  end

  self
end
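To see which keys the prefix and suffix loop produces for one word, the loop can be extracted into a standalone function. `affix_keys` is a hypothetical name used only for this sketch; the range and slicing expressions are copied from `add_word`.

```ruby
# Hypothetical helper mirroring the loop in add_word. The exclusive
# range 1...word.length yields proper prefixes and suffixes only.
def affix_keys(word)
  prefixes = []
  suffixes = []
  (1...word.length).each do |i|
    prefixes << word[0...i]
    suffixes << word[i..]
  end
  { prefixes: prefixes, suffixes: suffixes }
end

keys = affix_keys("cat")
keys[:prefixes]  # "c" and "ca"; "cat" itself is not a key
keys[:suffixes]  # "at" and "t"
```

A single-character word produces no keys at all, because the range `1...1` is empty.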

#add_words(new_words) ⇒ IndexedDictionary

Add multiple words.

Parameters:

  • new_words (Array<String>)

    Words to add

Returns:

  • (IndexedDictionary)



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 71

def add_words(new_words)
  new_words.each { |word| add_word(word) }
  self
end

#all_words ⇒ Array<String>

Get all words in the dictionary.

Returns:

  • (Array<String>)

    All words



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 172

def all_words
  @words.map { |entry| entry[:word] }
end

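#count_by_first_letter ⇒ Hash

Count words by their first character, upcased. Keys are not restricted to A-Z: any leading character (digit, accented letter, symbol) gets its own entry.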

Returns:

  • (Hash)

    Hash of letter => word count



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 190

def count_by_first_letter
  result = Hash.new(0)
  all_words.each do |word|
    next if word.empty?

    letter = word[0].upcase
    result[letter] += 1
  end
  result
end
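The counting loop can be run on a plain array to see the shape of the result. The word list below is invented for illustration; the loop body is the same as in the method.

```ruby
words = %w[apple Avocado banana 7up]

# Same loop as count_by_first_letter, applied to a plain array.
counts = Hash.new(0)
words.each do |word|
  next if word.empty?

  counts[word[0].upcase] += 1
end
```

Here "apple" and "Avocado" share the key "A", and "7up" is counted under "7".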

#count_by_length ⇒ Hash

Get word length distribution.

Returns:

  • (Hash)

    Hash of length => count



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 204

def count_by_length
  result = Hash.new(0)
  all_words.each { |word| result[word.length] += 1 }
  result
end

#each_with_index {|word, index| ... } ⇒ Enumerator

Iterate over all words with indices.

Yields:

  • (word, index)

    Each word and its index

Returns:

  • (Enumerator)

    Enumerator if no block given



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 231

def each_with_index
  return enum_for(:each_with_index) unless block_given?

  @words.each { |entry| yield entry[:word], entry[:index] }
end

#each_word {|word| ... } ⇒ Enumerator

Iterate over all words.

Yields:

  • (word)

    Each word

Returns:

  • (Enumerator)

    Enumerator if no block given



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 221

def each_word
  return enum_for(:each_word) unless block_given?

  @words.each { |entry| yield entry[:word] }
end
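The block-or-enumerator pattern used here can be shown with a small stand-in. `WordList` is a hypothetical class written for this sketch, not part of Kotoshu; only the `each_word` body follows the method above.

```ruby
# Hypothetical stand-in using the same block-or-enumerator pattern.
class WordList
  def initialize(words)
    @words = words
  end

  def each_word
    # Without a block, hand back an Enumerator over this same method.
    return enum_for(:each_word) unless block_given?

    @words.each { |word| yield word }
  end
end

list = WordList.new(%w[sushi ramen udon])
upcased = list.each_word.map(&:upcase)   # chain Enumerable methods
first_two = list.each_word.first(2)      # stop early without a block
```

Returning an Enumerator when no block is given lets callers chain `map`, `first`, `with_index` and similar methods directly.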

#empty? ⇒ Boolean

Check if the dictionary is empty.

Returns:

  • (Boolean)

    True if empty



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 213

def empty?
  @size.zero?
end

#find_by_length(length) ⇒ Array<String>

Find words of a specific length.

Parameters:

  • length (Integer)

    The exact length

Returns:

  • (Array<String>)

    Words of the given length



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 156

def find_by_length(length)
  all_words.select { |w| w.length == length }
end

#find_by_length_range(min_length:, max_length:) ⇒ Array<String>

Find words within a length range.

Parameters:

  • min_length (Integer)

    Minimum length

  • max_length (Integer)

    Maximum length

Returns:

  • (Array<String>)

    Words within the length range



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 165

def find_by_length_range(min_length:, max_length:)
  all_words.select { |w| w.length >= min_length && w.length <= max_length }
end

#find_by_pattern(pattern) ⇒ Array<String>

Find words matching a pattern.

Parameters:

  • pattern (Regexp)

    The pattern to match

Returns:

  • (Array<String>)

    Matching words



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 148

def find_by_pattern(pattern)
  all_words.select { |w| w.match?(pattern) }
end

#find_by_prefix(prefix, ignore_case: false) ⇒ Array<String>

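Find all words with a given prefix. The case-sensitive path reads the prefix index, which holds proper prefixes only, so a word is not returned when the prefix equals the whole word. The ignore_case path scans every word with start_with? and does return it.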

Parameters:

  • prefix (String)

    The prefix to match

  • ignore_case (Boolean) (defaults to: false)

    Whether to ignore case

Returns:

  • (Array<String>)

    Words with the prefix



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 121

def find_by_prefix(prefix, ignore_case: false)
  if ignore_case
    prefix_lower = prefix.downcase
    all_words.select { |w| w.downcase.start_with?(prefix_lower) }
  else
    @indexes[:prefix].fetch(prefix, []).dup
  end
end
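The two branches can return different results for the same query. The sketch below rebuilds a stand-in prefix index with the same loop as `add_word` and compares a hash fetch against a linear scan; the word list and variable names are invented for illustration.

```ruby
words = %w[cat Cat catalog]

# Stand-in for the prefix index that add_word builds (proper prefixes only).
prefix_index = Hash.new { |hash, key| hash[key] = [] }
words.each do |word|
  (1...word.length).each { |i| prefix_index[word[0...i]] << word }
end

# Case-sensitive branch: a single hash fetch on the index.
indexed = prefix_index.fetch("cat", []).dup

# ignore_case branch: a linear scan over every word.
scanned = words.select { |w| w.downcase.start_with?("cat") }
```

In this sketch `indexed` holds only "catalog", because "cat" is never stored under its own full spelling, while `scanned` holds all three words.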

#find_by_suffix(suffix, ignore_case: false) ⇒ Array<String>

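Find all words with a given suffix. The case-sensitive path reads the suffix index, which holds proper suffixes only, so a word is not returned when the suffix equals the whole word. The ignore_case path scans every word with end_with? and does return it.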

Parameters:

  • suffix (String)

    The suffix to match

  • ignore_case (Boolean) (defaults to: false)

    Whether to ignore case

Returns:

  • (Array<String>)

    Words with the suffix



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 135

def find_by_suffix(suffix, ignore_case: false)
  if ignore_case
    suffix_lower = suffix.downcase
    all_words.select { |w| w.downcase.end_with?(suffix_lower) }
  else
    @indexes[:suffix].fetch(suffix, []).dup
  end
end

#has_word?(word) ⇒ Boolean Also known as: include?, contains?

Check if a word exists (case-sensitive).

Parameters:

  • word (String)

    The word to check

Returns:

  • (Boolean)

    True if word exists



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 80

def has_word?(word)
  @indexes[:exact].key?(word)
end

#has_word_ignorecase?(word) ⇒ Boolean

Check if a word exists (case-insensitive).

Parameters:

  • word (String)

    The word to check

Returns:

  • (Boolean)

    True if word exists (any case)



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 90

def has_word_ignorecase?(word)
  @indexes[:lowercase].key?(word.downcase)
end

#lookup(word) ⇒ Hash?

Look up a word (case-sensitive).

Parameters:

  • word (String)

    The word to look up

Returns:

  • (Hash, nil)

    Word entry or nil



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 98

def lookup(word)
  indices = @indexes[:exact][word]
  return nil if indices.nil? || indices.empty?

  @words[indices.first]
end
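When a word has been added more than once, the lookup returns the earliest entry. The sketch below uses hand-written stand-in data shaped like the structures `add_word` maintains; `first_entry` is a hypothetical function that follows the same steps as `lookup`.

```ruby
# Stand-in data: two entries for the same word, plus the exact index
# that maps the word to both positions.
entries = [
  { word: "kanji", index: 0, metadata: { source: "first" } },
  { word: "kanji", index: 1, metadata: { source: "second" } }
]
exact = { "kanji" => [0, 1] }

# Same steps as lookup: fetch the positions, return the first entry.
def first_entry(entries, exact, word)
  positions = exact[word]
  return nil if positions.nil? || positions.empty?

  entries[positions.first]
end

hit = first_entry(entries, exact, "kanji")   # the entry added first
miss = first_entry(entries, exact, "Kanji")  # nil: the match is case-sensitive
```

The returned value is the whole entry hash, so callers read the word and metadata through the `:word` and `:metadata` keys.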

#lookup_ignorecase(word) ⇒ Hash?

Look up a word (case-insensitive).

Parameters:

  • word (String)

    The word to look up

Returns:

  • (Hash, nil)

    Word entry or nil



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 109

def lookup_ignorecase(word)
  indices = @indexes[:lowercase][word.downcase]
  return nil if indices.nil? || indices.empty?

  @words[indices.first]
end

#random_words(count: 1) ⇒ Array<String>

Get random words from the dictionary.

Parameters:

  • count (Integer) (defaults to: 1)

    Number of random words

Returns:

  • (Array<String>)

    Random words



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 180

def random_words(count: 1)
  return [] if @words.empty?

  indices = (0...@size).to_a.sample(count)
  indices.map { |i| @words[i][:word] }
end
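Because the method samples positions with `Array#sample`, which picks distinct elements, the result never repeats a position and never exceeds the dictionary size. A minimal sketch of that behavior, with an invented size of 3:

```ruby
size = 3

# Array#sample(n) picks up to n distinct elements.
few = (0...size).to_a.sample(2)    # two different positions
all = (0...size).to_a.sample(10)   # capped at the three available positions
```

A word that was added twice occupies two positions, so it can still appear twice in the output.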

#statistics ⇒ Hash

Get statistics about the dictionary.

Returns:

  • (Hash)

    Statistics



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 250

def statistics
  lengths = all_words.map(&:length)

  {
    total_words: @size,
    unique_words: all_words.uniq.size,
    min_length: lengths.min || 0,
    max_length: lengths.max || 0,
    avg_length: lengths.empty? ? 0 : (lengths.sum.to_f / lengths.size).round(2),
    count_by_first_letter: count_by_first_letter,
    count_by_length: count_by_length
  }
end
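A worked example of the scalar fields, using the same expressions on a plain array. The word list is invented for illustration, and the two nested count hashes are left out for brevity.

```ruby
words = %w[a bb bb cccc]
lengths = words.map(&:length)

# Same expressions as in statistics, applied to a plain array.
stats = {
  total_words: words.size,
  unique_words: words.uniq.size,
  min_length: lengths.min || 0,
  max_length: lengths.max || 0,
  avg_length: lengths.empty? ? 0 : (lengths.sum.to_f / lengths.size).round(2)
}
```

The lengths are 1, 2, 2 and 4, so the average is 9 / 4 = 2.25, and the duplicate "bb" makes `unique_words` one less than `total_words`.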

#to_s ⇒ String Also known as: inspect

Convert to string.

Returns:

  • (String)

    String representation



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 267

def to_s
  "IndexedDictionary(size: #{@size})"
end

#to_trie ⇒ Trie

Build a Trie from the dictionary words.

Returns:

  • (Trie)

    New trie containing all words



# File 'lib/kotoshu/core/indexed_dictionary.rb', line 240

def to_trie
  require_relative "trie/trie"
  require_relative "trie/builder"

  Trie::Builder.from_array(all_words)
end