Class: Kotoshu::Algorithms::Lookup::Lookuper
- Inherits: Object
- Defined in: lib/kotoshu/algorithms/lookup.rb
Overview
Main word correctness lookup class.
Typically, you would not use this directly.
Example:
dictionary = Kotoshu::Dictionary.load('en_US')
lookuper = dictionary.lookuper
lookuper.call('spylls') # => false
lookuper.call('spells') # => true
lookuper.good_forms('spells') do |form|
  puts form
end
# AffixForm(spells = spells)
# AffixForm(spells = spell + Suffix(s: S×, on [[^sxzhy]]$))
Instance Attribute Summary
- #aff ⇒ Hash (readonly)
  Aff data structure (from aff file).
- #dic ⇒ Hash (readonly)
  Dic data structure (from dic file).
Instance Method Summary
- #break_word(text, depth = 0) {|Array<String>| ... } ⇒ Enumerator
  Recursively produces every way of splitting the text at break patterns (such as dashes).
- #call(word, capitalization: true, allow_nosuggest: true) ⇒ Boolean
  The outermost word correctness check.
- #correct?(word, capitalization: true, allow_nosuggest: true, affix_forms: true, compound_forms: true) ⇒ Boolean (also: #is_correct?)
  Check if the word is correct without yielding forms.
- #good_forms(word, capitalization: true, allow_nosuggest: true, affix_forms: true, compound_forms: true) {|AffixForm, CompoundForm| ... } ⇒ Object
  The main producer of correct word forms.
- #initialize(aff, dic) ⇒ Lookuper (constructor)
  A new instance of Lookuper.
Constructor Details
#initialize(aff, dic) ⇒ Lookuper
Returns a new instance of Lookuper.
# File 'lib/kotoshu/algorithms/lookup.rb', line 190
def initialize(aff, dic)
  @aff = aff
  @dic = dic
end
Instance Attribute Details
#aff ⇒ Hash (readonly)
Returns Aff data structure (from aff file).
# File 'lib/kotoshu/algorithms/lookup.rb', line 185
def aff
  @aff
end
#dic ⇒ Hash (readonly)
Returns Dic data structure (from dic file).
# File 'lib/kotoshu/algorithms/lookup.rb', line 188
def dic
  @dic
end
Instance Method Details
#break_word(text, depth = 0) {|Array<String>| ... } ⇒ Enumerator
Recursively produces every way of splitting the text at break patterns (such as dashes). The unbroken text is always yielded first.
Example: “pre-processed-meat” would produce:
["pre-processed-meat"]
["pre", "processed-meat"]
["pre", "processed", "meat"]
["pre-processed", "meat"]
This is necessary because the dictionary might contain “pre-processed” as a separate entry.
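The recursion described above can be sketched outside the library. This is a minimal, standalone illustration, not Kotoshu's code: `break_variants` and `DASH` are hypothetical names, and a single dash regex stands in for the compiled patterns that the real method reads from `@aff[:BREAK]`.

```ruby
# Hypothetical stand-in for one compiled BREAK pattern: the break character
# sits in capture group 1, as the documented method expects.
DASH = /(-)/

# Yields every way of splitting +text+ at the pattern, whole text first.
def break_variants(text, pattern = DASH, depth = 0, &block)
  return enum_for(:break_variants, text, pattern, depth) unless block
  return if depth > 10 # recursion guard, mirroring the documented method

  yield [text]

  pos = 0
  while (m = pattern.match(text, pos))
    head = text[0...m.begin(1)]
    tail = text[m.end(1)..]
    # Prepend the head to every possible breaking of the remainder.
    break_variants(tail, pattern, depth + 1) { |rest| yield [head, *rest] }
    pos = m.end(1) # resume scanning just past the break character
  end
end

VARIANTS = break_variants("pre-processed-meat").to_a
VARIANTS.each { |parts| p parts }
# ["pre-processed-meat"]
# ["pre", "processed-meat"]
# ["pre", "processed", "meat"]
# ["pre-processed", "meat"]
```

Yielding the whole text before any split means a caller that stops at the first acceptable breaking will prefer the unbroken word.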
# File 'lib/kotoshu/algorithms/lookup.rb', line 254
def break_word(text, depth = 0)
  return enum_for(:break_word, text, depth) unless block_given?
  return if depth > 10

  # Return whole text as first option
  yield [text]

  break_patterns = @aff[:BREAK] || []
  break_patterns.each do |pattern|
    str = text.to_s
    pos = 0

    while (match_data = pattern[:matcher].match(str, pos))
      start = str[0...match_data.begin(1)]
      rest = str[match_data.end(1)..]

      break_word(rest, depth + 1) do |breaking|
        yield [start, *breaking]
      end

      pos = match_data.end(0)
      break if pos >= str.length
    end
  end
end
#call(word, capitalization: true, allow_nosuggest: true) ⇒ Boolean
The outermost word correctness check.
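Rejects the word outright if every matching dictionary entry carries the FORBIDDENWORD flag. Otherwise it prepares the word for checking (applies the ICONV conversion table and strips IGNORE characters), accepts it immediately if it is a number, and then checks whether good_forms can produce at least one form. The unbroken word is tried first; if it fails, the word is split at break patterns and accepted when every non-empty part is correct.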
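The tail of that pipeline (strip ignored characters, accept numbers, then accept any breaking whose non-empty parts are all known) can be sketched with toy data. Everything here is an assumption for illustration: `KNOWN_WORDS`, `IGNORED_CHARS`, `NUMBER`, `dash_breakings` and `toy_check` are hypothetical, a plain set lookup replaces good_forms, the ICONV and FORBIDDENWORD steps are omitted, and the breakings are enumerated in a different order than break_word produces them.

```ruby
require "set"

# Toy stand-ins, all hypothetical: a three-word dictionary, one ignored
# character, and a simple number pattern.
KNOWN_WORDS = Set["pre-processed", "meat", "spells"].freeze
IGNORED_CHARS = ["'"].freeze
NUMBER = /\A\d+\z/

# Every way of treating each dash as either a break point or a plain character.
def dash_breakings(text)
  pieces = text.split("-", -1)
  gaps = pieces.size - 1
  (0...(1 << gaps)).map do |mask|
    pieces.each_with_index.each_with_object([]) do |(piece, i), out|
      if i.zero? || mask[i - 1] == 1
        out << piece.dup         # dash i-1 is a break: start a new part
      else
        out.last << "-" << piece # dash i-1 stays inside the current part
      end
    end
  end
end

def toy_check(word)
  cleaned = word.chars.reject { |c| IGNORED_CHARS.include?(c) }.join
  return true if NUMBER.match?(cleaned)

  # A breaking is good when every non-empty part is a known word.
  dash_breakings(cleaned).any? do |parts|
    parts.all? { |part| part.empty? || KNOWN_WORDS.include?(part) }
  end
end

RESULTS = %w[pre-processed-meat spell's 42 meat- spylls].to_h { |w| [w, toy_check(w)] }
p RESULTS
```

In this sketch "meat-" passes because the trailing dash leaves an empty part, which is skipped in the same way the `part.empty?` test skips it in the method's source.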
# File 'lib/kotoshu/algorithms/lookup.rb', line 205
def call(word, capitalization: true, allow_nosuggest: true)
  # Check if word is correct
  is_correct = ->(w) do
    good_forms(w, capitalization: capitalization, allow_nosuggest: allow_nosuggest).any?
  end

  # If all entries matching the word have FORBIDDENWORD flag, word can't be correct
  if @aff[:FORBIDDENWORD] && @dic[:has_flag]&.call(word, @aff[:FORBIDDENWORD], for_all: true)
    return false
  end

  # Convert word with ICONV table
  word_to_check = @aff[:ICONV] ? @aff[:ICONV].call(word) : word

  # Remove ignored characters
  if @aff[:IGNORE]
    ignore_chars = @aff[:IGNORE]
    word_to_check = word_to_check.chars.reject { |c| ignore_chars.include?(c) }.join
  end

  # Numbers are always good
  return true if NUMBER_REGEXP.match?(word_to_check)

  # Try breaking word by break patterns
  break_word(word_to_check).each do |parts|
    if parts.all? { |part| part.empty? || is_correct.call(part) }
      return true
    end
  end

  false
end
#correct?(word, capitalization: true, allow_nosuggest: true, affix_forms: true, compound_forms: true) ⇒ Boolean
Also known as: is_correct?
Check if the word is correct without yielding forms.
Convenience method for simple correctness checks.
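The check relies on a common Ruby idiom: a block-yielding method returns an Enumerator when called without a block, and `any?` on that Enumerator stops at the first truthy element. The sketch below demonstrates the idiom in isolation. `FormProducer` and `each_form` are hypothetical names, not part of Kotoshu, and the counter exists only to make the early exit visible.

```ruby
# Hypothetical producer: yields candidate forms one at a time and counts
# how many it generated before the consumer stopped.
class FormProducer
  attr_reader :produced

  def initialize(forms)
    @forms = forms
    @produced = 0
  end

  def each_form
    # Without a block, hand back a lazy Enumerator over this same method.
    return enum_for(:each_form) unless block_given?

    @forms.each do |form|
      @produced += 1
      yield form
    end
  end
end

PRODUCER = FormProducer.new(["building", "build + ing", "buil + ding"])
FOUND = PRODUCER.each_form.any?

puts FOUND             # true
puts PRODUCER.produced # 1
```

Because only the first form is generated, a correctness check written this way avoids the cost of enumerating every analysis of the word.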
# File 'lib/kotoshu/algorithms/lookup.rb', line 354
def correct?(word, capitalization: true, allow_nosuggest: true,
             affix_forms: true, compound_forms: true)
  good_forms(word,
             capitalization: capitalization,
             allow_nosuggest: allow_nosuggest,
             affix_forms: affix_forms,
             compound_forms: compound_forms).any?
end
#good_forms(word, capitalization: true, allow_nosuggest: true, affix_forms: true, compound_forms: true) {|AffixForm, CompoundForm| ... } ⇒ Object
The main producer of correct word forms.
Produces every way the given string might correspond to dictionary entries and affixes. If there is at least one such form, the word is correctly spelled.
Example:
lookuper.good_forms('building') do |form|
  puts form
end
# AffixForm(building = building) # noun
# AffixForm(building = build + Suffix(ing: G×, on [[^e]]$)) # verb
# File 'lib/kotoshu/algorithms/lookup.rb', line 298
def good_forms(word, capitalization: true, allow_nosuggest: true,
               affix_forms: true, compound_forms: true)
  return enum_for(:good_forms, word,
                  capitalization: capitalization,
                  allow_nosuggest: allow_nosuggest,
                  affix_forms: affix_forms,
                  compound_forms: compound_forms) unless block_given?

  # Get capitalization variants
  if capitalization
    captype, variants = @aff[:casing].variants(word)
  else
    captype = @aff[:casing].guess(word)
    variants = [word]
  end

  # Check each variant
  variants.each do |variant|
    if affix_forms
      affix_forms_internal(variant, captype: captype, allow_nosuggest: allow_nosuggest) do |form|
        # Special German ß handling
        if @aff[:CHECKSHARPS] && @aff[:KEEPCASE]
          stem = form.in_dictionary ? form.in_dictionary[:stem] : form.stem
          if stem.include?('ß') && captype == Capitalization::Type::ALL &&
             word.include?('ß') && form.flags.include?(@aff[:KEEPCASE])
            next
          end
        end

        yield form
      end
    end

    if compound_forms
      compound_forms_internal(variant, captype: captype, allow_nosuggest: allow_nosuggest) do |form|
        yield form
      end
    end
  end
end