Module: Philiprehberger::FuzzyMatch::Lcs

Defined in:
lib/philiprehberger/fuzzy_match/lcs.rb

Overview

Longest Common Subsequence algorithm with O(min(n,m)) space

Class Method Summary collapse

Class Method Details

.length(str_a, str_b) ⇒ Integer

Returns the length of the longest common subsequence

Parameters:

  • str_a (String)
  • str_b (String)

Returns:

  • (Integer)


12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
# File 'lib/philiprehberger/fuzzy_match/lcs.rb', line 12

def self.length(str_a, str_b)
  a = str_a.to_s.downcase
  b = str_b.to_s.downcase

  return 0 if a.empty? || b.empty?

  # Ensure a is the shorter string for O(min(n,m)) space
  a, b = b, a if a.length > b.length

  prev = Array.new(a.length + 1, 0)
  curr = Array.new(a.length + 1, 0)

  b.each_char do |cb|
    a.each_char.with_index do |ca, j|
      curr[j + 1] = if ca == cb
                      prev[j] + 1
                    else
                      [curr[j], prev[j + 1]].max
                    end
    end
    prev, curr = curr, prev
    curr.fill(0)
  end

  prev[a.length]
end

.ratio(str_a, str_b) ⇒ Float

Returns normalized LCS similarity (0.0 to 1.0)

Parameters:

  • str_a (String)
  • str_b (String)

Returns:

  • (Float)


44
45
46
47
48
49
50
51
52
53
# File 'lib/philiprehberger/fuzzy_match/lcs.rb', line 44

def self.ratio(str_a, str_b)
  a = str_a.to_s.downcase
  b = str_b.to_s.downcase
  total = a.length + b.length

  return 1.0 if total.zero?

  lcs_len = length(a, b)
  (2.0 * lcs_len) / total
end