Module: Kotoshu::Algorithms::Permutations
- Defined in:
- lib/kotoshu/algorithms/permutations.rb
Overview
Word permutation algorithms for generating spelling variations.
Ported from Spylls (Python) permutations.py
These functions generate various word edits that are used by the suggestion system to find possible corrections for misspelled words.
Method names match Hunspell’s suggest.cxx to maintain compatibility.
Constant Summary collapse
- MAX_CHAR_DISTANCE =
4
Class Method Summary collapse
-
.badchar(word, trystring) {|String| ... } ⇒ Object
Produces permutations with chars replaced by chars in TRY set.
-
.badcharkey(word, layout) {|String| ... } ⇒ Object
Produces permutations with chars replaced by adjacent chars on keyboard layout (“vat -> cat”) or downcased (if it was accidental uppercase).
-
.doubletwochars(word) {|String| ... } ⇒ Object
Produces permutations with accidental two-letter-doubling fixed.
-
.extrachar(word) {|String| ... } ⇒ Object
Produces permutations with one char removed in all possible positions.
-
.forgotchar(word, trystring) {|String| ... } ⇒ Object
Produces permutations with one char inserted in all possible positions.
-
.longswapchar(word) {|String| ... } ⇒ Object
Produces permutations with non-adjacent chars swapped (up to 4 chars distance).
-
.mapchars(word, maptable) {|String| ... } ⇒ Object
Uses MAP table (sets of potentially similar chars) and tries to replace them recursively.
-
.movechar(word) {|String| ... } ⇒ Object
Produces permutations with one character moved by 2, 3 or 4 places forward or backward (not 1, because adjacent swaps are already handled by swapchar).
-
.replchars(word, reptable) {|String, Array<String>| ... } ⇒ Object
Uses REP table (typical misspellings) to replace patterns in word.
-
.swapchar(word) {|String| ... } ⇒ Object
Produces permutations with adjacent chars swapped.
-
.twowords(word) {|Array<String>| ... } ⇒ Object
Produces permutations of splitting word into two in all possible positions.
Class Method Details
.badchar(word, trystring) {|String| ... } ⇒ Object
Produces permutations with chars replaced by chars in TRY set.
214 215 216 217 218 219 220 221 222 223 224 |
# File 'lib/kotoshu/algorithms/permutations.rb', line 214 def badchar(word, trystring) return if trystring.nil? || trystring.empty? trystring.each_char do |c| (word.length - 1).downto(0) do |i| next if word[i] == c yield word[0...i] + c + word[(i + 1)..] end end end |
.badcharkey(word, layout) {|String| ... } ⇒ Object
Produces permutations with chars replaced by adjacent chars on keyboard layout (“vat -> cat”) or downcased (if it was accidental uppercase).
123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 |
# File 'lib/kotoshu/algorithms/permutations.rb', line 123 def badcharkey(word, layout) chars = word.chars chars.each_with_index do |c, i| before = word[0...i] after = word[(i + 1)..] # Try uppercasing if not already uppercase unless c == c.upcase yield before + c.upcase + after.to_s end next if layout.nil? || layout.empty? # Try adjacent keys on keyboard pos = layout.index(c) next unless pos while pos if pos.positive? && layout[pos - 1] != '|' yield before + layout[pos - 1] + after.to_s end if pos + 1 < layout.length && layout[pos + 1] != '|' yield before + layout[pos + 1] + after.to_s end pos = layout.index(c, pos + 1) end end end |
.doubletwochars(word) {|String| ... } ⇒ Object
Produces permutations with accidental two-letter-doubling fixed. Example: “vacacation” -> “vacation”
231 232 233 234 235 236 237 238 239 240 241 |
# File 'lib/kotoshu/algorithms/permutations.rb', line 231 def doubletwochars(word) return if word.length < 5 (2...word.length).each do |i| # Check if word[i-2] == word[i] and word[i-3] == word[i-1] # Example: vacacation -> "ca" at positions 3-4, so "vac" at 2-4 if word[i - 2] == word[i] && word[i - 3] == word[i - 1] yield word[0...(i - 1)] + word[(i + 1)..] end end end |
.extrachar(word) {|String| ... } ⇒ Object
Produces permutations with one char removed in all possible positions.
156 157 158 159 160 161 162 |
# File 'lib/kotoshu/algorithms/permutations.rb', line 156 def extrachar(word) return if word.length < 2 word.length.times do |i| yield word[0...i] + word[(i + 1)..] end end |
.forgotchar(word, trystring) {|String| ... } ⇒ Object
Produces permutations with one char inserted in all possible positions.
List of chars is taken from TRY string – if absent, tries nothing. Chars are expected to be sorted in order of usage in language.
172 173 174 175 176 177 178 179 180 |
# File 'lib/kotoshu/algorithms/permutations.rb', line 172 def forgotchar(word, trystring) return if trystring.nil? || trystring.empty? trystring.each_char do |c| (0..word.length).each do |i| yield word[0...i] + c + word[i..] end end end |
.longswapchar(word) {|String| ... } ⇒ Object
Produces permutations with non-adjacent chars swapped (up to 4 chars distance).
103 104 105 106 107 108 109 110 111 112 113 114 115 |
# File 'lib/kotoshu/algorithms/permutations.rb', line 103 def longswapchar(word) chars = word.chars (0...chars.length - 2).each do |first| ((first + 2)...[first + MAX_CHAR_DISTANCE, chars.length].min).each do |second| swapped = chars[0...first] + [chars[second]] + chars[(first + 1)...second] + [chars[first]] + chars[(second + 1)..] yield swapped.join end end end |
.mapchars(word, maptable) {|String| ... } ⇒ Object
Uses MAP table (sets of potentially similar chars) and tries to replace them recursively.
Example: Assuming MAP has entry “aáã”, and we have misspelling “anarchia”:
mapchars will produce: "ánarchia", "ánárchia", "ánárchiá", etc.
68 69 70 71 72 |
# File 'lib/kotoshu/algorithms/permutations.rb', line 68 def mapchars(word, maptable) return if word.length < 2 || maptable.nil? || maptable.empty? mapchars_internal(word, 0, maptable) { |variant| yield variant } end |
.movechar(word) {|String| ... } ⇒ Object
Produces permutations with one character moved by 2, 3 or 4 places forward or backward (not 1, because adjacent swaps are already handled by swapchar).
187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 |
# File 'lib/kotoshu/algorithms/permutations.rb', line 187 def movechar(word) return if word.length < 2 chars = word.chars # Move characters forward chars.each_with_index do |char, frompos| ((frompos + 3)...[chars.length, frompos + MAX_CHAR_DISTANCE + 1].min).each do |topos| moved = chars[0...frompos] + chars[(frompos + 1)...topos] + [char] + chars[topos..] yield moved.join end end # Move characters backward (chars.length - 1).downto(0) do |frompos| [[0, frompos - MAX_CHAR_DISTANCE + 1].max, frompos - 1].min.downto(0) do |topos| moved = chars[0...topos] + [chars[frompos]] + chars[topos...frompos] + chars[(frompos + 1)..] yield moved.join end end end |
.replchars(word, reptable) {|String, Array<String>| ... } ⇒ Object
Uses REP table (typical misspellings) to replace patterns in word.
If the pattern’s replacement contains “_”, it means replacing to “ ” and yielding two different hypotheses:
1. It was one (dictionary) word "foo bar" (checked as such)
2. It was words ["foo", "bar"] (checked separately)
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 |
# File 'lib/kotoshu/algorithms/permutations.rb', line 33 def replchars(word, reptable) return if word.length < 2 || reptable.nil? || reptable.empty? reptable.each do |pattern| str = word.to_s pos = 0 while (match_data = pattern[:regexp].match(str, pos)) suggestion = str[0...match_data.begin(0)] + pattern[:replacement].gsub('_', ' ') + str[match_data.end(0)..] yield suggestion yield suggestion.split(' ', 2) if suggestion.include?(' ') # Move past this match to find next occurrence pos = match_data.end(0) break if pos >= str.length end end end |
.swapchar(word) {|String| ... } ⇒ Object
Produces permutations with adjacent chars swapped.
For short (4 or 5 letters) words also produces double swaps: ahev -> have
80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 |
# File 'lib/kotoshu/algorithms/permutations.rb', line 80 def swapchar(word) return if word.length < 2 chars = word.chars (0...chars.length - 1).each do |i| swapped = chars[0...i] + [chars[i + 1], chars[i]] + chars[(i + 2)..] yield swapped.join end # Try double swaps for short words # ahev -> have, owudl -> would if [4, 5].include?(word.length) yield word[1] + word[0] + (word.length == 5 ? word[2] : '') + word[-1] + word[-2] if word.length == 5 yield word[0] + word[2] + word[1] + word[-1] + word[-2] end end end |
.twowords(word) {|Array<String>| ... } ⇒ Object
Produces permutations of splitting word into two in all possible positions.
247 248 249 250 251 |
# File 'lib/kotoshu/algorithms/permutations.rb', line 247 def twowords(word) (1...word.length).each do |i| yield [word[0...i], word[i..]] end end |