Module: Clacky::Utils::StringMatcher

Defined in:
lib/clacky/utils/string_matcher.rb

Overview

Utilities for finding and matching strings in file content. Used by the Edit tool and edit preview to apply a consistent layered matching strategy: exact → trim → unescape → smart line match.

Class Method Details

.find_match(content, old_string) ⇒ Hash?

Find a matching string in content using a layered strategy.

Strategy (applied in order):

1. Exact match (original old_string)
2. Trimmed match (leading/trailing whitespace stripped)
3. Unescaped match (over-escaped sequences normalised)
4. Combined trim + unescape
5. Smart line-by-line match (tolerates indent differences)

Parameters:

  • content (String)

    File content to search in

  • old_string (String)

    String to locate

Returns:

  • (Hash, nil)

    { matched_string: String, occurrences: Integer } or nil when nothing matches
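A caller would typically inspect `occurrences` before editing, since a count above 1 means the replacement target is ambiguous. The sketch below illustrates one way to consume the returned hash. `apply_edit` and its status symbols are hypothetical and not part of Clacky, and the `match` hash is written by hand to mirror the documented return shape.

```ruby
# Hypothetical helper (not part of Clacky): replace only when the match is unambiguous.
def apply_edit(content, match, new_string)
  return [:not_found, content] if match.nil?
  return [:ambiguous, content] if match[:occurrences] > 1

  # A String pattern is matched literally; the block form keeps backslashes
  # in new_string from being read as back-references.
  [:ok, content.sub(match[:matched_string]) { new_string }]
end

content = "def greet\n  puts 'hi'\nend\n"
# Hand-written hash in the shape documented for find_match.
match = { matched_string: "puts 'hi'", occurrences: 1 }

status, updated = apply_edit(content, match, "puts 'hello'")
```

Here `status` is `:ok` and `updated` contains the replaced line. A `nil` match or an `occurrences` value above 1 leaves the content untouched.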



# File 'lib/clacky/utils/string_matcher.rb', line 22

def self.find_match(content, old_string)
  candidates = generate_candidates(old_string)

  # Simple string matching for each candidate
  candidates.each do |candidate|
    next if candidate.empty?

    if content.include?(candidate)
      return {
        matched_string: candidate,
        # String#scan matches a String pattern literally, so no quoting is needed
        occurrences: content.scan(candidate).length
      }
    end
  end

  # Fall back to smart line-by-line matching (tabs vs spaces, etc.)
  try_smart_match(content, old_string)
end
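A note on counting literal occurrences: `String#scan` accepts either a Regexp or a String, and a String pattern is matched literally. The sketch below, using made-up sample text, shows that a plain String and a Regexp built from `Regexp.quote` give the same count, while passing the `Regexp.quote` output as a String searches for the added backslashes themselves.

```ruby
content   = "a.b and a.b, but not axb"
candidate = "a.b"

# String pattern: matched literally, "." is not a wildcard.
literal_count = content.scan(candidate).length

# Equivalent Regexp form: quote the candidate, then build a Regexp from it.
regexp_count = content.scan(Regexp.new(Regexp.quote(candidate))).length

# Regexp.quote returns the String 'a\.b'; used as a String pattern it looks
# for a literal backslash, which the content does not contain.
quoted_string_count = content.scan(Regexp.quote(candidate)).length

puts literal_count        # 2
puts regexp_count         # 2
puts quoted_string_count  # 0
```

Both counts are of non-overlapping occurrences, which is how `scan` advances through the string.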

.generate_candidates(old_string) ⇒ Array<String>

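Generate candidate strings from old_string: the original, a trimmed copy, an unescaped copy, and a trimmed-and-unescaped copy, with duplicates removed. The order of the list is the order in which find_match tries them.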

Parameters:

  • old_string (String)

Returns:

  • (Array<String>)

    Unique list of candidates



# File 'lib/clacky/utils/string_matcher.rb', line 45

def self.generate_candidates(old_string)
  trimmed           = old_string.strip
  unescaped         = unescape_over_escaped(old_string)
  unescaped_trimmed = unescape_over_escaped(trimmed)

  [
    old_string,        # Original
    trimmed,           # Trim leading/trailing whitespace
    unescaped,         # Unescape over-escaped sequences
    unescaped_trimmed  # Combined: trim + unescape
  ].uniq
end
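To see what the candidate list looks like in practice, here is a reduced sketch. `unescape_sketch` is a simplified stand-in that handles only `\n` and `\t`, and `candidates_sketch` is a hypothetical name; neither is the module's own method.

```ruby
# Simplified stand-in for unescape_over_escaped: only \n and \t are handled.
def unescape_sketch(str)
  str.gsub('\\n', "\n").gsub('\\t', "\t")
end

# Same four transformations, same order, duplicates dropped.
def candidates_sketch(old_string)
  trimmed = old_string.strip
  [
    old_string,
    trimmed,
    unescape_sketch(old_string),
    unescape_sketch(trimmed)
  ].uniq
end

plain   = candidates_sketch("foo")
escaped = candidates_sketch("  a\\nb  ") # padded, with a literal backslash + n

puts plain.length    # 1
puts escaped.length  # 4
```

When no transformation changes the input, `uniq` collapses the list to a single entry, so the matcher does no redundant searches.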

.lines_match_normalized?(lines1, lines2) ⇒ Boolean

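Compare two arrays of lines pairwise after collapsing each line's leading whitespace to a single space and dropping the trailing newline. A pair also matches when the line from lines1 equals the unescaped line from lines2. Arrays of different lengths never match.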

Parameters:

  • lines1 (Array<String>)
  • lines2 (Array<String>)

Returns:

  • (Boolean)


# File 'lib/clacky/utils/string_matcher.rb', line 124

def self.lines_match_normalized?(lines1, lines2)
  return false unless lines1.length == lines2.length

  lines1.zip(lines2).all? do |line1, line2|
    norm1 = line1.sub(/^\s+/, " ").chomp
    norm2 = line2.sub(/^\s+/, " ").chomp

    norm1 == norm2 || norm1 == unescape_over_escaped(norm2)
  end
end
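The normalisation step can be tried in isolation. This sketch copies the `sub`/`chomp` expression from the listing above into a lambda (the name `normalize` is chosen here for illustration) and shows which differences it tolerates.

```ruby
# Collapse a leading whitespace run to one space, then drop the line ending.
normalize = ->(line) { line.sub(/^\s+/, " ").chomp }

tab_indented   = normalize.call("\tputs 1\n")
space_indented = normalize.call("    puts 1\n")
unindented     = normalize.call("puts 1\n")
trailing_space = normalize.call("    puts 1  \n")

puts tab_indented == space_indented  # true:  tabs vs spaces are tolerated
puts unindented == space_indented    # false: indented vs unindented still differ
puts trailing_space == space_indented # false: trailing whitespace is significant
```

Because the leading run is replaced with a single space and not removed, a line with no indentation does not match an indented one.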

.try_smart_match(content, old_string) ⇒ Hash?

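Try smart line-by-line matching that tolerates leading whitespace differences. For each candidate from generate_candidates, a window with the candidate's line count slides over the lines of content, and each window is compared with lines_match_normalized?. The first candidate that matches anywhere determines the result.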

Parameters:

  • content (String)
  • old_string (String)

Returns:

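  • (Hash, nil)

    { matched_string: String, occurrences: Integer }, where matched_string is the first matching slice of content with its original indentation, or nil when no window matches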


# File 'lib/clacky/utils/string_matcher.rb', line 87

def self.try_smart_match(content, old_string)
  candidates = generate_candidates(old_string)

  candidates.each do |candidate|
    next if candidate.empty?

    candidate_lines = candidate.lines
    next if candidate_lines.empty?

    content_lines = content.lines
    matches = []

    (0..content_lines.length - candidate_lines.length).each do |start_idx|
      slice = content_lines[start_idx, candidate_lines.length]
      next unless slice

      if lines_match_normalized?(slice, candidate_lines)
        matches << { start: start_idx, matched_string: slice.join }
      end
    end

    unless matches.empty?
      return {
        matched_string: matches.first[:matched_string],
        occurrences: matches.length
      }
    end
  end

  nil
end
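The sliding-window idea can be sketched on its own. `window_matches` and `normalize_line` are hypothetical names, and the sketch omits the candidate loop and the unescape comparison, so it is a reduced illustration of the technique and not the module's method.

```ruby
# Collapse leading whitespace to one space and drop the line ending.
def normalize_line(line)
  line.sub(/^\s+/, " ").chomp
end

# Return every slice of content whose normalised lines equal those of wanted.
def window_matches(content, wanted)
  content_lines = content.lines
  wanted_lines  = wanted.lines
  return [] if wanted_lines.empty?

  target     = wanted_lines.map { |l| normalize_line(l) }
  last_start = content_lines.length - wanted_lines.length

  # An empty range (wanted longer than content) simply yields no matches.
  (0..last_start).map do |start|
    slice = content_lines[start, wanted_lines.length]
    slice.join if slice.map { |l| normalize_line(l) } == target
  end.compact
end

content = "def a\n\tputs 1\nend\ndef b\n\tputs 1\nend\n"
wanted  = "    puts 1\nend" # spaces where the file uses a tab

matches = window_matches(content, wanted)
puts matches.length # 2
```

Each match is taken from `content`, so it keeps the file's tab indentation and trailing newline. That is what allows a later literal replacement to find it.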

.unescape_over_escaped(str) ⇒ String

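Convert over-escaped sequences back to the characters they denote: \uXXXX, \n, \t, \r, \f, \b, \v, \" and \\. This handles the common case where an LLM double-escapes backslashes, so the text contains a literal backslash followed by n where the file has a real newline.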

Parameters:

  • str (String)

Returns:

  • (String)


# File 'lib/clacky/utils/string_matcher.rb', line 63

def self.unescape_over_escaped(str)
  result = str.dup

  # Unicode escapes: \uXXXX → actual Unicode character
  result = result.gsub(/\\u([0-9a-fA-F]{4})/) { [$1.hex].pack("U") }

  # Common escape sequences
  result = result.gsub('\\n',  "\n")
  result = result.gsub('\\t',  "\t")
  result = result.gsub('\\r',  "\r")
  result = result.gsub('\\f',  "\f")
  result = result.gsub('\\b',  "\b")
  result = result.gsub('\\v',  "\v")
  result = result.gsub('\\"',  '"')
  result = result.gsub('\\\\', "\\")

  result
end
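As a small illustration, the sketch below applies two of the substitutions from the listing to a made-up over-escaped string. The input is single-quoted, so `\u00e9` and `\n` are literal backslash sequences and not Ruby escapes.

```ruby
# Single quotes: the backslashes below are real characters in the string.
over_escaped = 'puts "caf\u00e9"\n'

# \uXXXX → the Unicode character with that code point.
with_unicode = over_escaped.gsub(/\\u([0-9a-fA-F]{4})/) { [$1.hex].pack("U") }

# Literal backslash + n → a real newline (String pattern, matched literally).
restored = with_unicode.gsub('\\n', "\n")

puts restored.end_with?("\n")     # true
puts restored.include?("\\")      # false: no backslashes remain
```

After both steps the string is what a file would actually contain, which is why the unescaped candidates can match where the original text does not.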