Class: Kotoshu::Algorithms::Capitalization::Casing
- Inherits:
-
Object
- Object
- Kotoshu::Algorithms::Capitalization::Casing
- Defined in:
- lib/kotoshu/algorithms/capitalization.rb
Overview
Base class for casing-related algorithms specific for dictionary’s language.
This is a class (not a set of functions) because it needs to have subclasses for specific language casing, which have only some aspects different from generic one.
Direct Known Subclasses
Instance Method Summary collapse
-
#capitalize(word) ⇒ Enumerator<String>
Capitalize (convert word to all lowercase and first letter uppercase).
-
#coerce(word, cap) ⇒ String
Used by suggest: by known (valid) suggestion, and initial word’s capitalization, produce proper suggestion capitalization.
-
#corrections(word) ⇒ Array<Symbol, Array<String>>
Returns hypotheses of how the word might have been cased if it is a misspelling.
-
#guess(word) ⇒ Symbol
Guess word’s capitalization.
-
#lower(word) ⇒ Array<String>
Lowercases the word.
-
#lowerfirst(word) ⇒ Enumerator<String>
Just change the case of the first letter to lower.
-
#upper(word) ⇒ String
Uppercase the word.
-
#variants(word) ⇒ Array<Symbol, Array<String>>
Returns hypotheses of how the word might have been cased (in dictionary), if we consider it is spelled correctly.
Instance Method Details
#capitalize(word) ⇒ Enumerator<String>
Capitalize (convert word to all lowercase and first letter uppercase). Returns a list of results for same reasons as lower.
85 86 87 88 89 90 91 92 93 94 95 96 |
# File 'lib/kotoshu/algorithms/capitalization.rb', line 85 def capitalize(word) return enum_for(:capitalize, word) unless block_given? if word.length == 1 yield upper(word[0]) else upper_first = upper(word[0]) lower(word[1..]).each do |lowered| yield upper_first + lowered end end end |
#coerce(word, cap) ⇒ String
Used by suggest: by known (valid) suggestion, and initial word’s capitalization, produce proper suggestion capitalization.
Example: If misspelling was “Kiten” (INIT capitalization), found suggestion “kitten”, then this method makes it “Kitten”.
174 175 176 177 178 179 180 181 182 183 |
# File 'lib/kotoshu/algorithms/capitalization.rb', line 174 def coerce(word, cap) case cap when Type::INIT, Type::HUHINIT upper(word[0]) + word[1..] when Type::ALL upper(word) else word end end |
#corrections(word) ⇒ Array<Symbol, Array<String>>
Returns hypotheses of how the word might have been cased if it is a misspelling.
Example: “DiCtionary” (HUHINIT capitalization) produces hypotheses “DiCtionary”, “diCtionary”, “dictionary”, “Dictionary”, and all of them are checked by Suggest.
146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 |
# File 'lib/kotoshu/algorithms/capitalization.rb', line 146 def corrections(word) captype = guess(word) result = case captype when Type::NO [word] when Type::INIT [word, *lower(word)] when Type::HUHINIT [word, *lowerfirst(word).to_a, *lower(word), *capitalize(word).to_a] when Type::HUH [word, *lower(word)] when Type::ALL [word, *lower(word), *capitalize(word).to_a] end [captype, result] end |
#guess(word) ⇒ Symbol
Guess word’s capitalization. Redefined in GermanCasing.
37 38 39 40 41 42 43 44 45 46 47 |
# File 'lib/kotoshu/algorithms/capitalization.rb', line 37 def guess(word) return Type::NO if word.downcase == word return Type::ALL if word.upcase == word return Type::INIT if word[0].upcase == word[0] && word[1..].downcase == word[1..] if word[0].upcase == word[0] Type::HUHINIT else Type::HUH end end |
#lower(word) ⇒ Array<String>
Lowercases the word. Returns list of possible lowercasings for all casing classes to behave consistently.
In GermanCasing (and only there), lowercasing word like “STRASSE” produces two possibilities: “strasse” and “ße” (ß is most of the time upcased to SS, so we can’t decide which of downcased words is “right” and need to check both).
Also redefined in TurkicCasing, because in Turkic languages lowercase “i” is uppercased as “İ”, and uppercase “I” is downcased as “ı”.
62 63 64 65 66 67 68 |
# File 'lib/kotoshu/algorithms/capitalization.rb', line 62 def lower(word) # Can't be properly lowercased in non-Turkic collation return [] if word.nil? || word.empty? || word[0] == 'İ' # Turkic "lowercase dot i" to latinic "i", just in case [word.downcase.gsub('i̇', 'i')] end |
#lowerfirst(word) ⇒ Enumerator<String>
Just change the case of the first letter to lower. Returns a list of results for same reasons as lower.
103 104 105 106 107 108 109 |
# File 'lib/kotoshu/algorithms/capitalization.rb', line 103 def lowerfirst(word) return enum_for(:lowerfirst, word) unless block_given? lower(word[0]).each do |lowered| yield lowered + word[1..] end end |
#upper(word) ⇒ String
Uppercase the word. Redefined in TurkicCasing, because in Turkic languages lowercase “i” is uppercased as “İ”, and uppercase “I” is downcased as “ı”.
76 77 78 |
# File 'lib/kotoshu/algorithms/capitalization.rb', line 76 def upper(word) word.upcase end |
#variants(word) ⇒ Array<Symbol, Array<String>>
Returns hypotheses of how the word might have been cased (in dictionary), if we consider it is spelled correctly.
Example: If word is “Kitten”, hypotheses are “kitten”, “Kitten”.
118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 |
# File 'lib/kotoshu/algorithms/capitalization.rb', line 118 def variants(word) captype = guess(word) result = case captype when Type::NO [word] when Type::INIT [word, *lower(word)] when Type::HUHINIT [word, *lowerfirst(word).to_a] when Type::HUH [word] when Type::ALL [word, *lower(word), *capitalize(word).to_a] end [captype, result] end |