Class: Canon::Diff::TextDecomposer

Inherits:
Object
Defined in:
lib/canon/diff/text_decomposer.rb

Overview

Decomposes two strings into their common prefix, changed portion, and common suffix. Used during DiffNode enrichment (Phase 1) to produce the 3-part decomposition: before-text, changed-text, after-text.

This is a pure function with no side effects. It operates on short serialized strings (e.g., “Hello World” vs “Hello Universe”), NOT on full document text.

Examples:

Simple substitution

TextDecomposer.decompose("Hello World", "Hello Universe")
# => { common_prefix: "Hello ", changed_old: "World",
#      changed_new: "Universe", common_suffix: "" }

Mid-string insertion

TextDecomposer.decompose("abc", "aXbc")
# => { common_prefix: "a", changed_old: "",
#      changed_new: "X", common_suffix: "bc" }

Full replacement

TextDecomposer.decompose("foo", "bar")
# => { common_prefix: "", changed_old: "foo",
#      changed_new: "bar", common_suffix: "" }

Class Method Summary

  • .decompose(text1, text2) ⇒ Hash

    Decompose two strings into common prefix / changed / common suffix.

Class Method Details

.decompose(text1, text2) ⇒ Hash

Decompose two strings into common prefix / changed / common suffix.

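Algorithm: a character-by-character prefix scan from the start, then a reverse suffix scan from the end that takes the prefix length into account so the two regions cannot overlap. Whatever lies between them is the actual change. The scans take O(n) comparisons, where n is the length of the shorter string, since neither scan can extend past it.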
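The two scans can be sketched as follows. This is a minimal illustration, not the library's code: the module `DecomposeSketch` and its helper names are hypothetical stand-ins for the private helpers that `.decompose` calls, whose real implementations are not shown on this page. It also omits the nil handling.

```ruby
# Hypothetical sketch of the prefix/suffix scans; names are assumptions.
module DecomposeSketch
  module_function

  # Count leading characters shared by both strings.
  def common_prefix_length(a, b)
    limit = [a.length, b.length].min
    i = 0
    i += 1 while i < limit && a[i] == b[i]
    i
  end

  # Count trailing characters shared by both strings, never reaching
  # back into the region already claimed by the prefix.
  def common_suffix_length(a, b, prefix_len)
    limit = [a.length, b.length].min - prefix_len
    i = 0
    i += 1 while i < limit && a[-1 - i] == b[-1 - i]
    i
  end

  # Slice both strings into prefix / changed / suffix parts.
  def decompose(a, b)
    p_len = common_prefix_length(a, b)
    s_len = common_suffix_length(a, b, p_len)
    {
      common_prefix: a[0...p_len],
      changed_old: a[p_len...(a.length - s_len)],
      changed_new: b[p_len...(b.length - s_len)],
      common_suffix: a[(a.length - s_len)..],
    }
  end
end

DecomposeSketch.decompose("aa", "aaa")
# => { common_prefix: "aa", changed_old: "",
#      changed_new: "a", common_suffix: "" }
```

Capping the suffix scan at the shorter length minus the prefix length is what keeps inputs such as "aa" vs "aaa" from counting the same characters as both prefix and suffix.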

Parameters:

  • text1 (String)

    the old text (serialized_before)

  • text2 (String)

    the new text (serialized_after)

Returns:

  • (Hash)

    with keys :common_prefix, :changed_old, :changed_new, :common_suffix
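One way to read the returned Hash: concatenating the prefix, one of the changed parts, and the suffix rebuilds the corresponding input. The sketch below uses a literal Hash copied from the "Mid-string insertion" example rather than calling the library; the `reassemble` helper is a hypothetical name introduced only for illustration.

```ruby
# Hypothetical helper: rebuild the old and new strings from a result Hash.
def reassemble(parts)
  prefix, old_mid, new_mid, suffix =
    parts.values_at(:common_prefix, :changed_old, :changed_new, :common_suffix)
  [prefix + old_mid + suffix, prefix + new_mid + suffix]
end

# Result Hash taken from the documented "Mid-string insertion" example.
parts = { common_prefix: "a", changed_old: "",
          changed_new: "X", common_suffix: "bc" }

old_text, new_text = reassemble(parts)
# old_text => "abc"
# new_text => "aXbc"
```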



# File 'lib/canon/diff/text_decomposer.rb', line 37

def self.decompose(text1, text2)
  return empty_result if text1.nil? && text2.nil?

  if text2.nil?
    return { common_prefix: "", changed_old: text1.to_s,
             changed_new: "", common_suffix: "" }
  end
  if text1.nil?
    return { common_prefix: "", changed_old: "",
             changed_new: text2.to_s, common_suffix: "" }
  end

  prefix_len = find_common_prefix_length(text1, text2)
  suffix_len = find_common_suffix_length(text1, text2, prefix_len)

  {
    common_prefix: text1[0...prefix_len],
    changed_old: text1[prefix_len...(text1.length - suffix_len)],
    changed_new: text2[prefix_len...(text2.length - suffix_len)],
    common_suffix: text1[(text1.length - suffix_len)..],
  }
end