Module: Textus::KeyDistance

Defined in:
lib/textus/key_distance.rb

Overview

Small utilities for ranking key suggestions. Bounded inputs only —Levenshtein is O(n*m) so we refuse to compute on long strings.

Constant Summary collapse

MAX_LEN =
200

Class Method Summary collapse

Class Method Details

.levenshtein(left, right) ⇒ Object

Classic iterative Levenshtein with two rows. Bounded to MAX_LEN.



18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
# File 'lib/textus/key_distance.rb', line 18

def self.levenshtein(left, right)
  return nil if left.length > MAX_LEN || right.length > MAX_LEN
  return right.length if left.empty?
  return left.length if right.empty?

  prev = (0..right.length).to_a
  curr = Array.new(right.length + 1, 0)
  (1..left.length).each do |i|
    curr[0] = i
    (1..right.length).each do |j|
      cost = left[i - 1] == right[j - 1] ? 0 : 1
      curr[j] = [
        curr[j - 1] + 1,      # insertion
        prev[j] + 1,          # deletion
        prev[j - 1] + cost,   # substitution
      ].min
    end
    prev, curr = curr, prev
  end
  prev[right.length]
end

.shared_prefix_segments(left, right) ⇒ Object

Length of the shared dot-separated prefix between two dotted keys.



8
9
10
11
12
13
14
15
# File 'lib/textus/key_distance.rb', line 8

def self.shared_prefix_segments(left, right)
  asegs = left.split(".")
  bsegs = right.split(".")
  n = [asegs.length, bsegs.length].min
  i = 0
  i += 1 while i < n && asegs[i] == bsegs[i]
  i
end

.suggest(requested, candidates, limit: 5) ⇒ Object

Rank candidate keys against requested. Returns up to ‘limit` keys. Sort: longer shared prefix first; then smaller Levenshtein distance.



42
43
44
45
46
47
48
49
50
51
# File 'lib/textus/key_distance.rb', line 42

def self.suggest(requested, candidates, limit: 5)
  return [] if requested.nil? || requested.empty?

  scored = candidates.first(200).map do |k|
    prefix = shared_prefix_segments(requested, k)
    dist = levenshtein(requested, k) || Float::INFINITY
    [k, prefix, dist]
  end
  scored.sort_by { |(_, prefix, dist)| [-prefix, dist] }.first(limit).map(&:first)
end