Class: Kotoshu::Core::IndexedDictionary
- Inherits: Object
- Hierarchy: Object > Kotoshu::Core::IndexedDictionary
- Defined in: lib/kotoshu/core/indexed_dictionary.rb
Overview
An indexed dictionary for efficient word lookup through multiple indexes. It is more model-driven than Spylls, which uses simple hash indices.
The class is a domain model with richer behavior, including:
- Multiple indexes (case-sensitive, case-insensitive, prefix, suffix)
- Rich query methods
- Index management
- Domain-specific behavior
Instance Attribute Summary
- #size ⇒ Object (readonly)
  Returns the value of attribute size.
- #words ⇒ Object (readonly)
  Returns the value of attribute words.
Class Method Summary
- .from_file(path) ⇒ IndexedDictionary
  Create indexed dictionary from a file.
- .from_trie(trie) ⇒ IndexedDictionary
  Create indexed dictionary from a Trie.
Instance Method Summary
- #add_word(word, metadata = {}) ⇒ IndexedDictionary (also: #<<)
  Add a word to the dictionary with optional metadata.
- #add_words(new_words) ⇒ IndexedDictionary
  Add multiple words.
- #all_words ⇒ Array<String>
  Get all words in the dictionary.
- #count_by_first_letter ⇒ Hash
  Count words by their upcased first character.
- #count_by_length ⇒ Hash
  Get word length distribution.
- #each_with_index {|word, index| ... } ⇒ Enumerator
  Iterate over all words with indices.
- #each_word {|word| ... } ⇒ Enumerator
  Iterate over all words.
- #empty? ⇒ Boolean
  Check if the dictionary is empty.
- #find_by_length(length) ⇒ Array<String>
  Find words of a specific length.
- #find_by_length_range(min_length:, max_length:) ⇒ Array<String>
  Find words within a length range.
- #find_by_pattern(pattern) ⇒ Array<String>
  Find words matching a pattern.
- #find_by_prefix(prefix, ignore_case: false) ⇒ Array<String>
  Find all words with a given prefix.
- #find_by_suffix(suffix, ignore_case: false) ⇒ Array<String>
  Find all words with a given suffix.
- #has_word?(word) ⇒ Boolean (also: #include?, #contains?)
  Check if a word exists (case-sensitive).
- #has_word_ignorecase?(word) ⇒ Boolean
  Check if a word exists (case-insensitive).
- #initialize(words = []) ⇒ IndexedDictionary (constructor)
  A new instance of IndexedDictionary.
- #lookup(word) ⇒ Hash?
  Look up a word (case-sensitive).
- #lookup_ignorecase(word) ⇒ Hash?
  Look up a word (case-insensitive).
- #random_words(count: 1) ⇒ Array<String>
  Get random words from the dictionary.
- #statistics ⇒ Hash
  Get statistics about the dictionary.
- #to_s ⇒ String (also: #inspect)
  Convert to string.
- #to_trie ⇒ Trie
  Build a Trie from the dictionary words.
Constructor Details
#initialize(words = []) ⇒ IndexedDictionary
Returns a new instance of IndexedDictionary.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 17

def initialize(words = [])
  @words = []
  @indexes = {
    exact: {},      # case_sensitive: word => [positions]
    lowercase: {},  # case_insensitive: word.downcase => [positions]
    prefix: {},     # prefix => [words]
    suffix: {},     # suffix => [words]
    flag: {}        # flag => [words] (future: for Hunspell)
  }
  @size = 0

  words.each { |word| add_word(word) }
end
Instance Attribute Details
#size ⇒ Object (readonly)
Returns the value of attribute size.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 14

def size
  @size
end
#words ⇒ Object (readonly)
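Returns the value of attribute words. As built by #add_word, this is an array of entry hashes (word, index, metadata) in insertion order, not an array of strings; use #all_words for the plain strings.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 14

def words
  @words
end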
Class Method Details
.from_file(path) ⇒ IndexedDictionary
Create an indexed dictionary from a file containing one word per line. Empty lines and lines starting with # are skipped.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 276

def self.from_file(path)
  words = File.foreach(path, chomp: true).reject { |l| l.empty? || l.start_with?("#") }
  new(words)
end
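The line filtering used by .from_file can be tried on its own. The following is a minimal sketch, not the library's code path: read_word_list is a hypothetical helper that mirrors the File.foreach call shown above, and the temporary file stands in for a real word list.

```ruby
require "tempfile"

# Hypothetical helper mirroring the filter in .from_file:
# drop empty lines and lines that start with "#".
def read_word_list(path)
  File.foreach(path, chomp: true).reject { |line| line.empty? || line.start_with?("#") }
end

# Tempfile.create with a block returns the block's value and removes the file.
words = Tempfile.create("words") do |file|
  file.write("# comment line\napple\n\nbanana\n#trailing\n")
  file.flush
  read_word_list(file.path)
end

puts words.inspect
```

Note that a line holding only whitespace is neither empty nor a comment under this filter, so it would be kept as a word.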
.from_trie(trie) ⇒ IndexedDictionary
Create indexed dictionary from a Trie.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 285

def self.from_trie(trie)
  words = trie.all_words
  new(words)
end
Instance Method Details
#add_word(word, metadata = {}) ⇒ IndexedDictionary Also known as: <<
Add a word to the dictionary with optional metadata.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 36

def add_word(word, metadata = {})
  # Store the word with its index and metadata
  entry = { word: word, index: @size, metadata: metadata }
  @words << entry
  @size += 1

  # Update exact index (case-sensitive)
  @indexes[:exact][word] ||= []
  @indexes[:exact][word] << @size - 1

  # Update lowercase index (case-insensitive)
  lower = word.downcase
  @indexes[:lowercase][lower] ||= []
  @indexes[:lowercase][lower] << @size - 1

  # Update prefix indexes (for prefix searching)
  (1...word.length).each do |i|
    prefix = word[0...i]
    @indexes[:prefix][prefix] ||= []
    @indexes[:prefix][prefix] << word

    # Update suffix indexes (for suffix searching)
    suffix = word[i..]
    @indexes[:suffix][suffix] ||= []
    @indexes[:suffix][suffix] << word
  end

  self
end
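To see which keys a single word contributes to the prefix and suffix indexes, the loop in #add_word can be replayed outside the class. This is a sketch under that assumption; affix_keys is a hypothetical helper and is not part of IndexedDictionary.

```ruby
# Hypothetical helper: lists the keys under which one word would be filed,
# using the same range (1...word.length) as the loop in #add_word.
def affix_keys(word)
  cuts = (1...word.length)
  {
    prefixes: cuts.map { |i| word[0...i] },
    suffixes: cuts.map { |i| word[i..] }
  }
end

keys = affix_keys("cats")
puts keys[:prefixes].inspect  # ["c", "ca", "cat"]
puts keys[:suffixes].inspect  # ["ats", "ts", "s"]

# A one-letter word yields an empty range, so it gets no affix keys at all.
single = affix_keys("a")
```

The range stops before word.length, so the full word is never used as its own prefix or suffix key.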
#add_words(new_words) ⇒ IndexedDictionary
Add multiple words.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 71

def add_words(new_words)
  new_words.each { |word| add_word(word) }
  self
end
#all_words ⇒ Array<String>
Get all words in the dictionary.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 172

def all_words
  @words.map { |entry| entry[:word] }
end
#count_by_first_letter ⇒ Hash
Count words by first letter. Keys are the upcased first character of each word, so they are not restricted to A-Z; empty words are skipped.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 190

def count_by_first_letter
  result = Hash.new(0)
  all_words.each do |word|
    next if word.empty?

    letter = word[0].upcase
    result[letter] += 1
  end
  result
end
#count_by_length ⇒ Hash
Get word length distribution.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 204

def count_by_length
  result = Hash.new(0)
  all_words.each { |word| result[word.length] += 1 }
  result
end
#each_with_index {|word, index| ... } ⇒ Enumerator
Iterate over all words with indices.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 231

def each_with_index
  return enum_for(:each_with_index) unless block_given?

  @words.each { |entry| yield entry[:word], entry[:index] }
end
#each_word {|word| ... } ⇒ Enumerator
Iterate over all words.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 221

def each_word
  return enum_for(:each_word) unless block_given?

  @words.each { |entry| yield entry[:word] }
end
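Both iteration methods return an Enumerator when called without a block, via enum_for. The sketch below shows that pattern on a stand-in class; WordBag is hypothetical and exists only to keep the example self-contained.

```ruby
# Hypothetical stand-in that uses the same enum_for guard as #each_word.
class WordBag
  def initialize(words)
    @words = words
  end

  def each_word
    return enum_for(:each_word) unless block_given?

    @words.each { |word| yield word }
  end
end

bag = WordBag.new(%w[kana kanji])

# Without a block: an Enumerator, so Enumerable methods can be chained.
upcased = bag.each_word.map(&:upcase)

# With a block: each word is yielded in insertion order.
collected = []
bag.each_word { |word| collected << word }
```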
#empty? ⇒ Boolean
Check if the dictionary is empty.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 213

def empty?
  @size.zero?
end
#find_by_length(length) ⇒ Array<String>
Find words of a specific length.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 156

def find_by_length(length)
  all_words.select { |w| w.length == length }
end
#find_by_length_range(min_length:, max_length:) ⇒ Array<String>
Find words within a length range.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 165

def find_by_length_range(min_length:, max_length:)
  all_words.select { |w| w.length >= min_length && w.length <= max_length }
end
#find_by_pattern(pattern) ⇒ Array<String>
Find words matching a pattern.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 148

def find_by_pattern(pattern)
  all_words.select { |w| w.match?(pattern) }
end
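Because #find_by_pattern delegates to String#match?, an unanchored pattern matches anywhere in a word, and a String argument is converted to a Regexp rather than compared literally. A small sketch on a plain array (the word list is made up for illustration):

```ruby
words = %w[cat scatter cot dog]

# Unanchored: matches any word that contains "cat".
anywhere = words.select { |w| w.match?(/cat/) }

# Anchored with \A and \z: the whole word must fit the pattern.
whole = words.select { |w| w.match?(/\Ac.t\z/) }

# A String pattern is turned into a Regexp, so "." acts as a wildcard.
from_string = words.select { |w| w.match?("c.t") }
```

Anchoring with \A and \z is the usual way to get whole-word matching from this kind of filter.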
#find_by_prefix(prefix, ignore_case: false) ⇒ Array<String>
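Find all words with a given prefix. By default the lookup reads the prefix index built by #add_word, which stores proper prefixes only, so a word identical to the prefix is not returned. With ignore_case: true every word is scanned instead.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 121

def find_by_prefix(prefix, ignore_case: false)
  if ignore_case
    prefix_lower = prefix.downcase
    all_words.select { |w| w.downcase.start_with?(prefix_lower) }
  else
    @indexes[:prefix].fetch(prefix, []).dup
  end
end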
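Going by the source shown for #add_word and #find_by_prefix, the indexed branch and the scanning branch can disagree about a word that equals the prefix. The sketch below rebuilds a prefix index with the same loop on a plain hash; prefix_index and the word list are illustrative stand-ins, not library objects.

```ruby
words = %w[cat cats catalog dog]

# Rebuild a prefix index with the same range as #add_word: (1...word.length).
prefix_index = {}
words.each do |word|
  (1...word.length).each do |i|
    (prefix_index[word[0...i]] ||= []) << word
  end
end

# Indexed branch (ignore_case: false): a hash fetch on the prefix.
indexed = prefix_index.fetch("cat", []).dup

# Scanning branch (ignore_case: true): a start_with? test on every word.
scanned = words.select { |w| w.downcase.start_with?("cat") }

puts indexed.inspect  # ["cats", "catalog"]
puts scanned.inspect  # ["cat", "cats", "catalog"]
```

If exact matches matter, combining the indexed result with #has_word? is one way to cover the gap without a full scan.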
#find_by_suffix(suffix, ignore_case: false) ⇒ Array<String>
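Find all words with a given suffix. By default the lookup reads the suffix index built by #add_word, which stores proper suffixes only, so a word identical to the suffix is not returned. With ignore_case: true every word is scanned instead.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 135

def find_by_suffix(suffix, ignore_case: false)
  if ignore_case
    suffix_lower = suffix.downcase
    all_words.select { |w| w.downcase.end_with?(suffix_lower) }
  else
    @indexes[:suffix].fetch(suffix, []).dup
  end
end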
#has_word?(word) ⇒ Boolean Also known as: include?, contains?
Check if a word exists (case-sensitive).
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 80

def has_word?(word)
  @indexes[:exact].key?(word)
end
#has_word_ignorecase?(word) ⇒ Boolean
Check if a word exists (case-insensitive).
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 90

def has_word_ignorecase?(word)
  @indexes[:lowercase].key?(word.downcase)
end
#lookup(word) ⇒ Hash?
Look up a word (case-sensitive). Returns the first matching entry hash (word, index, metadata), or nil if the word is absent.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 98

def lookup(word)
  indices = @indexes[:exact][word]
  return nil if indices.nil? || indices.empty?

  @words[indices.first]
end
#lookup_ignorecase(word) ⇒ Hash?
Look up a word (case-insensitive). Returns the first entry, in insertion order, whose downcased word matches, or nil if there is none.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 109

def lookup_ignorecase(word)
  indices = @indexes[:lowercase][word.downcase]
  return nil if indices.nil? || indices.empty?

  @words[indices.first]
end
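The difference between the two lookups comes down to which index is consulted and which stored position comes first. The following sketch reproduces the exact and lowercase indexes on plain hashes; entries, exact, lowercase, and first_entry are hypothetical names chosen for this example.

```ruby
entries = []
exact = {}
lowercase = {}

# File three spellings of the same word, recording each position in both indexes.
%w[Apple apple APPLE].each do |word|
  position = entries.size
  entries << { word: word, index: position, metadata: {} }
  (exact[word] ||= []) << position
  (lowercase[word.downcase] ||= []) << position
end

# Hypothetical helper following the same steps as #lookup:
# take the first recorded position, or return nil when there is none.
def first_entry(index, key, entries)
  positions = index[key]
  return nil if positions.nil? || positions.empty?

  entries[positions.first]
end

exact_hit  = first_entry(exact, "apple", entries)               # the entry for "apple"
folded_hit = first_entry(lowercase, "APPLE".downcase, entries)  # the entry for "Apple"
miss       = first_entry(exact, "aPPle", entries)               # nil
```

The case-insensitive lookup returns whichever spelling was added first, which may not be the spelling that was queried.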
#random_words(count: 1) ⇒ Array<String>
Get up to count random words from the dictionary. Positions are sampled without replacement, so at most #size words are returned.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 180

def random_words(count: 1)
  return [] if @words.empty?

  indices = (0...@size).to_a.sample(count)
  indices.map { |i| @words[i][:word] }
end
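Array#sample(n) picks distinct positions and never returns more elements than the array holds, which is what bounds the result of #random_words. A minimal sketch on a plain array (the word list is illustrative):

```ruby
words = %w[ichi ni san]
positions = (0...words.size).to_a

# Two distinct positions, in random order.
picked = positions.sample(2).map { |i| words[i] }

# Asking for more than exist returns every word once, shuffled.
capped = positions.sample(10).map { |i| words[i] }
```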
#statistics ⇒ Hash
Get statistics about the dictionary.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 250

def statistics
  lengths = all_words.map(&:length)

  {
    total_words: @size,
    unique_words: all_words.uniq.size,
    min_length: lengths.min || 0,
    max_length: lengths.max || 0,
    avg_length: lengths.empty? ? 0 : (lengths.sum.to_f / lengths.size).round(2),
    count_by_first_letter: count_by_first_letter,
    count_by_length: count_by_length
  }
end
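As a worked example of the figures in this hash, the same calculations can be run on a plain array. This is a sketch that assumes the formulas shown in the source above; the word list is made up, and the two count hashes are rebuilt inline rather than through the class.

```ruby
words = %w[apple kiwi fig apple]
lengths = words.map(&:length)

first_letters = words.reject(&:empty?).each_with_object(Hash.new(0)) do |word, counts|
  counts[word[0].upcase] += 1
end
length_counts = lengths.each_with_object(Hash.new(0)) { |len, counts| counts[len] += 1 }

stats = {
  total_words: words.size,          # duplicates are counted
  unique_words: words.uniq.size,    # duplicates are collapsed
  min_length: lengths.min || 0,
  max_length: lengths.max || 0,
  avg_length: lengths.empty? ? 0 : (lengths.sum.to_f / lengths.size).round(2),
  count_by_first_letter: first_letters,
  count_by_length: length_counts
}

puts stats[:avg_length]  # 4.25, i.e. (5 + 4 + 3 + 5) / 4
```

Since duplicates are stored as separate entries, total_words and unique_words differ whenever a word was added more than once.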
#to_s ⇒ String Also known as: inspect
Convert to string.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 267

def to_s
  "IndexedDictionary(size: #{@size})"
end
#to_trie ⇒ Trie
Build a Trie from the dictionary words.
# File 'lib/kotoshu/core/indexed_dictionary.rb', line 240

def to_trie
  require_relative "trie/trie"
  require_relative "trie/builder"

  Trie::Builder.from_array(all_words)
end