Module: Textus::KeyDistance
- Defined in:
- lib/textus/key_distance.rb
Overview
Small utilities for ranking key suggestions. Bounded inputs only —Levenshtein is O(n*m) so we refuse to compute on long strings.
Constant Summary collapse
- MAX_LEN =
200
Class Method Summary collapse
-
.levenshtein(left, right) ⇒ Object
Classic iterative Levenshtein with two rows.
-
.shared_prefix_segments(left, right) ⇒ Object
Length of the shared dot-separated prefix between two dotted keys.
-
.suggest(requested, candidates, limit: 5) ⇒ Object
Rank candidate keys against requested.
Class Method Details
.levenshtein(left, right) ⇒ Object
Classic iterative Levenshtein with two rows. Bounded to MAX_LEN.
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
# File 'lib/textus/key_distance.rb', line 18 def self.levenshtein(left, right) return nil if left.length > MAX_LEN || right.length > MAX_LEN return right.length if left.empty? return left.length if right.empty? prev = (0..right.length).to_a curr = Array.new(right.length + 1, 0) (1..left.length).each do |i| curr[0] = i (1..right.length).each do |j| cost = left[i - 1] == right[j - 1] ? 0 : 1 curr[j] = [ curr[j - 1] + 1, # insertion prev[j] + 1, # deletion prev[j - 1] + cost, # substitution ].min end prev, curr = curr, prev end prev[right.length] end |
.shared_prefix_segments(left, right) ⇒ Object
Length of the shared dot-separated prefix between two dotted keys.
8 9 10 11 12 13 14 15 |
# File 'lib/textus/key_distance.rb', line 8 def self.shared_prefix_segments(left, right) asegs = left.split(".") bsegs = right.split(".") n = [asegs.length, bsegs.length].min i = 0 i += 1 while i < n && asegs[i] == bsegs[i] i end |
.suggest(requested, candidates, limit: 5) ⇒ Object
Rank candidate keys against requested. Returns up to ‘limit` keys. Sort: longer shared prefix first; then smaller Levenshtein distance.
42 43 44 45 46 47 48 49 50 51 |
# File 'lib/textus/key_distance.rb', line 42 def self.suggest(requested, candidates, limit: 5) return [] if requested.nil? || requested.empty? scored = candidates.first(200).map do |k| prefix = shared_prefix_segments(requested, k) dist = levenshtein(requested, k) || Float::INFINITY [k, prefix, dist] end scored.sort_by { |(_, prefix, dist)| [-prefix, dist] }.first(limit).map(&:first) end |