Module: Clacky::Utils::StringMatcher
- Defined in: lib/clacky/utils/string_matcher.rb
Overview
Utilities for finding and matching strings in file content. Used by the Edit tool and edit preview to apply a consistent layered matching strategy: exact → trim → unescape → trim + unescape → smart line match.
Class Method Summary
- .find_match(content, old_string) ⇒ Hash?
  Find a matching string in content using a layered strategy.
- .generate_candidates(old_string) ⇒ Array<String>
  Generate candidate strings by applying different transformations.
- .lines_match_normalized?(lines1, lines2) ⇒ Boolean
  Compare two arrays of lines after normalising leading whitespace.
- .try_smart_match(content, old_string) ⇒ Hash?
  Try smart line-by-line matching that tolerates leading whitespace differences.
- .unescape_over_escaped(str) ⇒ String
  Convert over-escaped sequences back to their real characters.
Class Method Details
.find_match(content, old_string) ⇒ Hash?
Find a matching string in content using a layered strategy. Returns a Hash with :matched_string and :occurrences, or nil when no layer produces a match.
Strategy (applied in order):
1. Exact match (original old_string)
2. Trimmed match (leading/trailing whitespace stripped)
3. Unescaped match (over-escaped sequences normalised)
4. Combined trim + unescape
5. Smart line-by-line match (tolerates indent differences)
# File 'lib/clacky/utils/string_matcher.rb', line 22
def self.find_match(content, old_string)
  candidates = generate_candidates(old_string)

  # Simple string matching for each candidate
  candidates.each do |candidate|
    next if candidate.empty?

    if content.include?(candidate)
      return {
        matched_string: candidate,
        # String#scan matches a String pattern literally, so the candidate
        # is passed as-is; wrapping it in Regexp.quote would make the count
        # miss any candidate containing spaces or regex metacharacters.
        occurrences: content.scan(candidate).length
      }
    end
  end

  # Fall back to smart line-by-line matching (tabs vs spaces, etc.)
  try_smart_match(content, old_string)
end
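As a rough illustration of how a caller might consume this result, the sketch below takes a hash shaped like the one find_match returns and applies a replacement only when the match is unambiguous. apply_replacement is a hypothetical helper, not part of Clacky, and the rule of refusing edits with more than one occurrence is an assumption rather than documented Edit tool behaviour.

```ruby
# Hypothetical helper (not part of Clacky): apply an edit using a hash shaped
# like the return value of StringMatcher.find_match.
def apply_replacement(content, match, new_string)
  # find_match returns nil when no layer of the strategy matched.
  return nil if match.nil?

  # Assumption: refuse ambiguous edits where the text appears more than once.
  return nil if match[:occurrences] > 1

  # A String pattern is matched literally; the block form keeps backslashes
  # in new_string from being interpreted as back-references.
  content.sub(match[:matched_string]) { new_string }
end

content = "def greet\n\tputs 'hi'\nend\n"
match = { matched_string: "\tputs 'hi'\n", occurrences: 1 }
puts apply_replacement(content, match, "\tputs 'hello'\n")
```

The replacement uses match[:matched_string] rather than the caller's original old_string, because the matched text may differ from the input once trimming, unescaping, or indent normalisation has been applied.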
.generate_candidates(old_string) ⇒ Array<String>
Generate candidate strings by applying different transformations.
# File 'lib/clacky/utils/string_matcher.rb', line 45
def self.generate_candidates(old_string)
  trimmed = old_string.strip
  unescaped = unescape_over_escaped(old_string)
  unescaped_trimmed = unescape_over_escaped(trimmed)

  [
    old_string,        # Original
    trimmed,           # Trim leading/trailing whitespace
    unescaped,         # Unescape over-escaped sequences
    unescaped_trimmed  # Combined: trim + unescape
  ].uniq
end
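To make the effect of .uniq concrete, here is a small self-contained sketch of the same candidate ordering. Both simple_unescape and build_candidates are hypothetical stand-ins written for this example; simple_unescape handles only \n and \t, unlike the real unescape_over_escaped.

```ruby
# Hypothetical, simplified stand-in for unescape_over_escaped (only \n and \t).
def simple_unescape(str)
  str.gsub('\\n', "\n").gsub('\\t', "\t")
end

# Same ordering as generate_candidates: original, trimmed, unescaped,
# trimmed + unescaped, with duplicates removed.
def build_candidates(old_string)
  trimmed = old_string.strip
  [old_string, trimmed, simple_unescape(old_string), simple_unescape(trimmed)].uniq
end

# Nothing to trim or unescape: all four variants collapse into one candidate.
p build_candidates("foo").length

# Padding plus an over-escaped newline: four distinct candidates.
p build_candidates('  a\nb  ').length
```

Because duplicates are dropped, a clean input costs only one content.include? check in find_match, while a messy input is tried in up to four forms.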
.lines_match_normalized?(lines1, lines2) ⇒ Boolean
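Compare two arrays of lines after normalising leading whitespace. Each line's leading whitespace run is collapsed to a single space and the trailing newline is removed, so tabs and differing indent widths compare equal, while an indented line does not match an unindented one. A pair also matches when the line from lines1 equals the unescaped form of the line from lines2.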
# File 'lib/clacky/utils/string_matcher.rb', line 124
def self.lines_match_normalized?(lines1, lines2)
  return false unless lines1.length == lines2.length

  lines1.zip(lines2).all? do |line1, line2|
    norm1 = line1.sub(/^\s+/, " ").chomp
    norm2 = line2.sub(/^\s+/, " ").chomp

    norm1 == norm2 || norm1 == unescape_over_escaped(norm2)
  end
end
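The normalisation step can be tried in isolation. The sketch below copies the sub/chomp expression from the listing into a hypothetical normalize_indent helper (the name is not part of Clacky) to show which indentation differences it absorbs.

```ruby
# The normalisation expression from lines_match_normalized?, isolated in a
# hypothetical helper for illustration.
def normalize_indent(line)
  line.sub(/^\s+/, " ").chomp
end

# A tab and four spaces both collapse to a single leading space.
p normalize_indent("\tputs 'hi'\n") == normalize_indent("    puts 'hi'\n")  # true

# An unindented line has no leading whitespace to collapse, so it differs.
p normalize_indent("  puts 'hi'\n") == normalize_indent("puts 'hi'\n")      # false
```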
.try_smart_match(content, old_string) ⇒ Hash?
Try smart line-by-line matching that tolerates leading whitespace differences.
# File 'lib/clacky/utils/string_matcher.rb', line 87
def self.try_smart_match(content, old_string)
  candidates = generate_candidates(old_string)

  candidates.each do |candidate|
    next if candidate.empty?

    candidate_lines = candidate.lines
    next if candidate_lines.empty?

    content_lines = content.lines
    matches = []

    (0..content_lines.length - candidate_lines.length).each do |start_idx|
      slice = content_lines[start_idx, candidate_lines.length]
      next unless slice

      if lines_match_normalized?(slice, candidate_lines)
        matches << { start: start_idx, matched_string: slice.join }
      end
    end

    unless matches.empty?
      return {
        matched_string: matches.first[:matched_string],
        occurrences: matches.length
      }
    end
  end

  nil
end
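The sliding-window scan can be sketched more compactly with Enumerable#each_cons. This is an illustrative alternative, not the library's implementation: window_matches is a hypothetical name, and the sketch omits the unescape comparison that lines_match_normalized? performs.

```ruby
# Hypothetical sketch of the sliding-window scan. each_cons yields every run
# of window.length consecutive lines, like the start_idx loop in the listing.
def window_matches(content, candidate)
  normalize = ->(line) { line.sub(/^\s+/, " ").chomp }
  window = candidate.lines.map(&normalize)
  return [] if window.empty? # each_cons(0) would raise ArgumentError

  content.lines.each_cons(window.length)
         .select { |slice| slice.map(&normalize) == window }
         .map(&:join)
end

content = "def greet\n\tputs 'hi'\nend\n"

# The candidate is indented with spaces; the file uses a tab.
p window_matches(content, "    puts 'hi'\nend")
```

As in try_smart_match, each returned string is joined from the content's own lines, so it carries the file's real indentation and line endings rather than those of the candidate.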
.unescape_over_escaped(str) ⇒ String
Convert over-escaped sequences back to their real characters. This handles the common case where LLMs double-escape backslashes.
# File 'lib/clacky/utils/string_matcher.rb', line 63
def self.unescape_over_escaped(str)
  result = str.dup

  # Unicode escapes: \uXXXX → actual Unicode character
  result = result.gsub(/\\u([0-9a-fA-F]{4})/) { [$1.hex].pack("U") }

  # Common escape sequences
  result = result.gsub('\\n', "\n")
  result = result.gsub('\\t', "\t")
  result = result.gsub('\\r', "\r")
  result = result.gsub('\\f', "\f")
  result = result.gsub('\\b', "\b")
  result = result.gsub('\\v', "\v")
  result = result.gsub('\\"', '"')
  result = result.gsub('\\\\', "\\")

  result
end
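For a sense of what an over-escaped input looks like, the sketch below isolates the Unicode branch in a hypothetical unescape_unicode helper (not part of Clacky). The single-quoted literal holds a real backslash followed by u00e9, which is the form that arrives when an escape has been doubled.

```ruby
# The Unicode branch of unescape_over_escaped, isolated in a hypothetical
# helper for illustration.
def unescape_unicode(str)
  str.gsub(/\\u([0-9a-fA-F]{4})/) { [$1.hex].pack("U") }
end

# Single quotes: a literal backslash followed by "u00e9" (nine characters).
over_escaped = 'caf\u00e9'
p over_escaped.length

# After unescaping, the string equals the real four-character word.
fixed = unescape_unicode(over_escaped)
p fixed.length
p fixed == "caf\u00e9"
```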