Module: TextMetrics::Levenshtein
- Defined in:
- lib/text_metrics/levenshtein.rb
Overview
Levenshtein edit distance between two strings, plus a normalized similarity score. Comparison is a property of two texts, so it lives here rather than on a single-text analyzer.
Class Method Summary collapse
-
.distance(first, second) ⇒ Object
Raw edit distance: the number of single-character insertions, deletions or substitutions needed to turn
firstintosecond. -
.similarity(first, second) ⇒ Object
Similarity as a 0–100 score: 100.0 means identical, 0.0 means nothing in common.
Class Method Details
.distance(first, second) ⇒ Object
Raw edit distance: the number of single-character insertions, deletions or substitutions needed to turn first into second. Case-sensitive.
11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
# File 'lib/text_metrics/levenshtein.rb', line 11 def distance(first, second) first = first.to_s second = second.to_s m = first.length n = second.length return n if m.zero? return m if n.zero? matrix = Array.new(m + 1) { Array.new(n + 1, 0) } (0..m).each { |i| matrix[i][0] = i } (0..n).each { |j| matrix[0][j] = j } (1..m).each do |i| (1..n).each do |j| cost = (first[i - 1] == second[j - 1]) ? 0 : 1 matrix[i][j] = [ matrix[i - 1][j] + 1, # deletion matrix[i][j - 1] + 1, # insertion matrix[i - 1][j - 1] + cost # substitution ].min end end matrix[m][n] end |
.similarity(first, second) ⇒ Object
Similarity as a 0–100 score: 100.0 means identical, 0.0 means nothing in common.
39 40 41 42 43 44 |
# File 'lib/text_metrics/levenshtein.rb', line 39 def similarity(first, second) max_length = [first.to_s.length, second.to_s.length].max return 100.0 if max_length.zero? ((max_length - distance(first, second)).to_f / max_length * 100).round(2) end |