Class: Canon::Diff::DiffLineBuilder

Inherits:
Object
Defined in:
lib/canon/diff/diff_line_builder.rb

Overview

Assembles DiffLines from enriched DiffNodes.

This is Phase 2 of the two-phase diff pipeline. It runs after DiffNodeEnricher and before DiffBlockBuilder. It performs no computation on the change content: it only reads pre-computed DiffCharRanges from DiffNodes and assembles them into DiffLines.

The DiffLineBuilder handles:

  • Mapping DiffCharRanges to the correct DiffLines

  • Filling in unchanged context lines between changes

  • Detecting reflow (lines that moved between documents)

  • Computing line correspondence without LCS

Constant Summary

REFLOW_SUMMARY_THRESHOLD = 2

Maximum number of reflow lines before switching to summary mode. When more lines than this are unmatched in a reflow gap, a summary line is emitted instead of listing each individual line.
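
The threshold rule above can be sketched as follows. This is an illustrative stand-in, not the library's code: the module `ReflowSketch`, the method `render_gap`, and the wording of the summary line are assumptions. Only the value 2 and the "more lines than this" comparison come from the description.

```ruby
# Hypothetical sketch of the summary-mode rule; names are not from the library.
module ReflowSketch
  THRESHOLD = 2 # mirrors the documented REFLOW_SUMMARY_THRESHOLD value

  # Returns the display lines for one reflow gap: every line individually
  # when the gap is small, otherwise a single summary string.
  def self.render_gap(unmatched_lines)
    if unmatched_lines.length > THRESHOLD
      ["... #{unmatched_lines.length} reflowed lines ..."]
    else
      unmatched_lines.dup
    end
  end
end

ReflowSketch.render_gap(%w[a b])   # => ["a", "b"]
ReflowSketch.render_gap(%w[a b c]) # => ["... 3 reflowed lines ..."]
```

A gap of exactly two lines is still listed in full, because the description switches to summary mode only when more lines than the threshold are unmatched.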

Constructor Details

#initialize(diff_nodes, text1, text2) ⇒ DiffLineBuilder

Returns a new instance of DiffLineBuilder.



# File 'lib/canon/diff/diff_line_builder.rb', line 34

def initialize(diff_nodes, text1, text2)
  @diff_nodes = diff_nodes
  @text1 = text1
  @text2 = text2
  @lines1 = text1.split("\n")
  @lines2 = text2.split("\n")
  # Build reverse indices for efficient content lookup in gap handling.
  # Maps content string to array of line indices where that content appears.
  @line_to_indices1 = build_line_index(@lines1)
  @line_to_indices2 = build_line_index(@lines2)
end
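
The private helper `build_line_index` is not shown on this page. Going only by the comment in the constructor ("maps content string to array of line indices"), a minimal sketch might look like the following. The name `sketch_line_index` is hypothetical and the real implementation may differ.

```ruby
# Hypothetical stand-in for the undocumented build_line_index helper.
# Maps each distinct line to the (0-based) indices where it occurs.
def sketch_line_index(lines)
  index = {}
  lines.each_with_index do |line, i|
    (index[line] ||= []) << i
  end
  index
end

# String#split("\n") drops trailing empty strings, so a final newline
# does not produce an extra empty line.
lines = "a\nb\na\n".split("\n") # => ["a", "b", "a"]
sketch_line_index(lines)        # => {"a"=>[0, 2], "b"=>[1]}
```

A plain Hash with `||=` is used here rather than `Hash.new { ... }` so that looking up a line that is absent returns nil instead of silently adding an empty entry.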

Class Method Details

.build(diff_nodes, text1, text2) ⇒ Array<DiffLine>

Build DiffLines from enriched DiffNodes.

Parameters:

  • diff_nodes (Array<DiffNode>)

    Enriched DiffNodes with char_ranges

  • text1 (String)

    The first document (preprocessed)

  • text2 (String)

    The second document (preprocessed)

Returns:

  • (Array<DiffLine>)

    The assembled diff lines



# File 'lib/canon/diff/diff_line_builder.rb', line 27

def self.build(diff_nodes, text1, text2)
  return [] if diff_nodes.nil? || diff_nodes.empty?
  return [] if text1.nil? || text2.nil?

  new(diff_nodes, text1, text2).build
end

Instance Method Details

#build ⇒ Object



# File 'lib/canon/diff/diff_line_builder.rb', line 51

def build
  # Keep only DiffNodes that carry char ranges, then sort them by their
  # position in text1 (or text2 if no text1 range)
  sorted = @diff_nodes
    .select { |dn| dn.char_ranges && !dn.char_ranges.empty? }
    .sort_by { |dn| sort_key(dn) }

  result = []
  cursor1 = 0  # current position in text1 lines
  cursor2 = 0  # current position in text2 lines

  sorted.each do |diff_node|
    range1 = diff_node.line_range_before
    range2 = diff_node.line_range_after

    # Determine the start positions for this change
    node_start1 = range1 ? range1[0] : cursor1
    node_start2 = range2 ? range2[0] : cursor2

    # Skip if this node's range has already been passed by the cursor.
    # Handle cases where range1 or range2 is nil (nil means position is only
    # in the other text, so we only check the non-nil side).
    cursor1_passed = !range1.nil? && cursor1 > node_start1
    cursor2_passed = !range2.nil? && cursor2 > node_start2
    next if cursor1_passed || cursor2_passed

    # Emit unchanged lines before this change
    emit_unchanged(result, cursor1, node_start1, cursor2, node_start2)

    # Detect and handle reflow before this change
    handle_reflow(result, cursor1, node_start1, cursor2, node_start2,
                  diff_node)

    # Emit changed lines for this DiffNode
    emit_changed(result, diff_node)

    # Advance cursors past this change.
    # cursor1 advances based on text1 content consumed.
    # cursor2 advances based on text2 content consumed.
    # For pure insertions (range1 nil), cursor1 advances by the size of the
    # text2 gap before the node (node_start2 - old_cursor2), to account for
    # text2 gap lines that were emitted as mapping to text1.
    # For pure deletions (range2 nil), cursor2 advances symmetrically by the
    # text1 gap (node_start1 - old_cursor1).
    old_cursor1 = cursor1
    old_cursor2 = cursor2
    cursor1 = if range1
                range1[1] + 1
              elsif range2
                old_cursor1 + (node_start2 - old_cursor2)
              else
                node_start1 + 1
              end
    cursor2 = if range2
                range2[1] + 1
              elsif range1
                old_cursor2 + (node_start1 - old_cursor1)
              else
                node_start2 + 1
              end
  end

  # Emit remaining unchanged lines after last change
  emit_unchanged(result, cursor1, @lines1.length, cursor2, @lines2.length)

  result
end
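
The cursor bookkeeping in the loop above can be traced in isolation. The sketch below copies the skip check and the advancement rules from the listing into two free-standing methods. The method names `node_already_passed?` and `advance_cursors` are hypothetical, and ranges are assumed to be `[first_line, last_line]` pairs or nil, as the indexing in the listing suggests.

```ruby
# Hypothetical stand-alone copies of the cursor logic in #build.

# True when either cursor has already moved beyond the node's start line.
# A nil range is ignored, so only the non-nil side is checked.
def node_already_passed?(cursor1, cursor2, range1, range2)
  passed1 = !range1.nil? && cursor1 > range1[0]
  passed2 = !range2.nil? && cursor2 > range2[0]
  passed1 || passed2
end

# Returns [next_cursor1, next_cursor2] after a node has been emitted.
def advance_cursors(cursor1, cursor2, range1, range2)
  node_start1 = range1 ? range1[0] : cursor1
  node_start2 = range2 ? range2[0] : cursor2

  next1 = if range1
            range1[1] + 1                     # past the last text1 line
          elsif range2
            cursor1 + (node_start2 - cursor2) # pure insertion
          else
            node_start1 + 1
          end
  next2 = if range2
            range2[1] + 1                     # past the last text2 line
          elsif range1
            cursor2 + (node_start1 - cursor1) # pure deletion
          else
            node_start2 + 1
          end
  [next1, next2]
end

advance_cursors(0, 0, [2, 3], [2, 2]) # modification   => [4, 3]
advance_cursors(4, 3, nil, [5, 6])    # pure insertion => [6, 7]
advance_cursors(0, 0, [1, 1], nil)    # pure deletion  => [2, 1]
node_already_passed?(4, 3, [2, 3], nil) # => true
```

In the pure-insertion case the text1 cursor moves forward by the two text2 lines that lay between the old text2 cursor (3) and the node's start (5), which keeps both cursors aligned on the shared context that precedes the insertion.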