Module: Kotoshu::Algorithms::Permutations

Defined in:
lib/kotoshu/algorithms/permutations.rb

Overview

Word permutation algorithms for generating spelling variations.

Ported from Spylls (Python) permutations.py

These functions generate various word edits that are used by the suggestion system to find possible corrections for misspelled words.

Method names match Hunspell’s suggest.cxx to maintain compatibility.

Constant Summary collapse

MAX_CHAR_DISTANCE =
4

Class Method Summary collapse

Class Method Details

.badchar(word, trystring) {|String| ... } ⇒ Object

Produces permutations with chars replaced by chars in TRY set.

Parameters:

  • word (String)

    The word to process

  • trystring (String)

    Characters to try replacing with (from aff TRY directive)

Yields:

  • (String)

    Each variant with replaced char



214
215
216
217
218
219
220
221
222
223
224
# File 'lib/kotoshu/algorithms/permutations.rb', line 214

def badchar(word, trystring)
  return if trystring.nil? || trystring.empty?

  trystring.each_char do |c|
    (word.length - 1).downto(0) do |i|
      next if word[i] == c

      yield word[0...i] + c + word[(i + 1)..]
    end
  end
end

.badcharkey(word, layout) {|String| ... } ⇒ Object

Produces permutations with chars replaced by adjacent chars on keyboard layout (“vat -> cat”) or downcased (if it was accidental uppercase).

Parameters:

  • word (String)

    The word to process

  • layout (String)

    Keyboard layout string (KEY from aff file)

Yields:

  • (String)

    Each variant with replaced chars



123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
# File 'lib/kotoshu/algorithms/permutations.rb', line 123

def badcharkey(word, layout)
  chars = word.chars
  chars.each_with_index do |c, i|
    before = word[0...i]
    after = word[(i + 1)..]

    # Try uppercasing if not already uppercase
    unless c == c.upcase
      yield before + c.upcase + after.to_s
    end

    next if layout.nil? || layout.empty?

    # Try adjacent keys on keyboard
    pos = layout.index(c)
    next unless pos

    while pos
      if pos.positive? && layout[pos - 1] != '|'
        yield before + layout[pos - 1] + after.to_s
      end
      if pos + 1 < layout.length && layout[pos + 1] != '|'
        yield before + layout[pos + 1] + after.to_s
      end
      pos = layout.index(c, pos + 1)
    end
  end
end

.doubletwochars(word) {|String| ... } ⇒ Object

Produces permutations with accidental two-letter-doubling fixed. Example: “vacacation” -> “vacation”

Parameters:

  • word (String)

    The word to process

Yields:

  • (String)

    Each variant with fixed doubling



231
232
233
234
235
236
237
238
239
240
241
# File 'lib/kotoshu/algorithms/permutations.rb', line 231

def doubletwochars(word)
  return if word.length < 5

  (2...word.length).each do |i|
    # Check if word[i-2] == word[i] and word[i-3] == word[i-1]
    # Example: vacacation -> "ca" at positions 3-4, so "vac" at 2-4
    if word[i - 2] == word[i] && word[i - 3] == word[i - 1]
      yield word[0...(i - 1)] + word[(i + 1)..]
    end
  end
end

.extrachar(word) {|String| ... } ⇒ Object

Produces permutations with one char removed in all possible positions.

Parameters:

  • word (String)

    The word to process

Yields:

  • (String)

    Each variant with one char removed



156
157
158
159
160
161
162
# File 'lib/kotoshu/algorithms/permutations.rb', line 156

def extrachar(word)
  return if word.length < 2

  word.length.times do |i|
    yield word[0...i] + word[(i + 1)..]
  end
end

.forgotchar(word, trystring) {|String| ... } ⇒ Object

Produces permutations with one char inserted in all possible positions.

List of chars is taken from TRY string – if absent, tries nothing. Chars are expected to be sorted in order of usage in language.

Parameters:

  • word (String)

    The word to process

  • trystring (String)

    Characters to try inserting (from aff TRY directive)

Yields:

  • (String)

    Each variant with one char inserted



172
173
174
175
176
177
178
179
180
# File 'lib/kotoshu/algorithms/permutations.rb', line 172

def forgotchar(word, trystring)
  return if trystring.nil? || trystring.empty?

  trystring.each_char do |c|
    (0..word.length).each do |i|
      yield word[0...i] + c + word[i..]
    end
  end
end

.longswapchar(word) {|String| ... } ⇒ Object

Produces permutations with non-adjacent chars swapped (up to 4 chars distance).

Parameters:

  • word (String)

    The word to process

Yields:

  • (String)

    Each long swap variant



103
104
105
106
107
108
109
110
111
112
113
114
115
# File 'lib/kotoshu/algorithms/permutations.rb', line 103

def longswapchar(word)
  chars = word.chars
  (0...chars.length - 2).each do |first|
    ((first + 2)...[first + MAX_CHAR_DISTANCE, chars.length].min).each do |second|
      swapped = chars[0...first] +
               [chars[second]] +
               chars[(first + 1)...second] +
               [chars[first]] +
               chars[(second + 1)..]
      yield swapped.join
    end
  end
end

.mapchars(word, maptable) {|String| ... } ⇒ Object

Uses MAP table (sets of potentially similar chars) and tries to replace them recursively.

Example: Assuming MAP has entry “aáã”, and we have misspelling “anarchia”:

mapchars will produce: "ánarchia", "ánárchia", "ánárchiá", etc.

Examples:

Kotoshu::Algorithms::Permutations.mapchars("anarchia", [Set.new(['a', 'á', 'ã'])]) do |variant|
  puts variant
end

Parameters:

  • word (String)

    The word to process

  • maptable (Array<Set<String>>)

    Array of character sets for mapping

Yields:

  • (String)

    Each variant with mapped characters



68
69
70
71
72
# File 'lib/kotoshu/algorithms/permutations.rb', line 68

def mapchars(word, maptable)
  return if word.length < 2 || maptable.nil? || maptable.empty?

  mapchars_internal(word, 0, maptable) { |variant| yield variant }
end

.movechar(word) {|String| ... } ⇒ Object

Produces permutations with one character moved by 2, 3 or 4 places forward or backward (not 1, because adjacent swaps are already handled by swapchar).

Parameters:

  • word (String)

    The word to process

Yields:

  • (String)

    Each variant with moved character



187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
# File 'lib/kotoshu/algorithms/permutations.rb', line 187

def movechar(word)
  return if word.length < 2

  chars = word.chars

  # Move characters forward
  chars.each_with_index do |char, frompos|
    ((frompos + 3)...[chars.length, frompos + MAX_CHAR_DISTANCE + 1].min).each do |topos|
      moved = chars[0...frompos] + chars[(frompos + 1)...topos] + [char] + chars[topos..]
      yield moved.join
    end
  end

  # Move characters backward
  (chars.length - 1).downto(0) do |frompos|
    [[0, frompos - MAX_CHAR_DISTANCE + 1].max, frompos - 1].min.downto(0) do |topos|
      moved = chars[0...topos] + [chars[frompos]] + chars[topos...frompos] + chars[(frompos + 1)..]
      yield moved.join
    end
  end
end

.replchars(word, reptable) {|String, Array<String>| ... } ⇒ Object

Uses REP table (typical misspellings) to replace patterns in word.

If the pattern’s replacement contains “_”, it means replacing to “ ” and yielding two different hypotheses:

1. It was one (dictionary) word "foo bar" (checked as such)
2. It was words ["foo", "bar"] (checked separately)

Examples:

Kotoshu::Algorithms::Permutations.replchars("acces", [{regexp: /ac/, replacement: "ex"}]) do |sug|
  puts sug
end

Parameters:

  • word (String)

    The word to process

  • reptable (Array<Hash>)

    Array of replacement pattern hashes with :regexp and :replacement

Yields:

  • (String, Array<String>)

    Each suggestion (string or array of words)



33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
# File 'lib/kotoshu/algorithms/permutations.rb', line 33

def replchars(word, reptable)
  return if word.length < 2 || reptable.nil? || reptable.empty?

  reptable.each do |pattern|
    str = word.to_s
    pos = 0

    while (match_data = pattern[:regexp].match(str, pos))
      suggestion = str[0...match_data.begin(0)] +
                   pattern[:replacement].gsub('_', ' ') +
                   str[match_data.end(0)..]

      yield suggestion
      yield suggestion.split(' ', 2) if suggestion.include?(' ')

      # Move past this match to find next occurrence
      pos = match_data.end(0)
      break if pos >= str.length
    end
  end
end

.swapchar(word) {|String| ... } ⇒ Object

Produces permutations with adjacent chars swapped.

For short (4 or 5 letters) words also produces double swaps: ahev -> have

Parameters:

  • word (String)

    The word to process

Yields:

  • (String)

    Each swap variant



80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
# File 'lib/kotoshu/algorithms/permutations.rb', line 80

def swapchar(word)
  return if word.length < 2

  chars = word.chars
  (0...chars.length - 1).each do |i|
    swapped = chars[0...i] + [chars[i + 1], chars[i]] + chars[(i + 2)..]
    yield swapped.join
  end

  # Try double swaps for short words
  # ahev -> have, owudl -> would
  if [4, 5].include?(word.length)
    yield word[1] + word[0] + (word.length == 5 ? word[2] : '') + word[-1] + word[-2]
    if word.length == 5
      yield word[0] + word[2] + word[1] + word[-1] + word[-2]
    end
  end
end

.twowords(word) {|Array<String>| ... } ⇒ Object

Produces permutations of splitting word into two in all possible positions.

Parameters:

  • word (String)

    The word to process

Yields:

  • (Array<String>)

    Each two-word split



247
248
249
250
251
# File 'lib/kotoshu/algorithms/permutations.rb', line 247

def twowords(word)
  (1...word.length).each do |i|
    yield [word[0...i], word[i..]]
  end
end