Class: Rubino::Compression::DiffCompressor

Inherits:
Object
Defined in:
lib/rubino/compression/diff_compressor.rb

Overview

Deterministic, ML-free compression of a UNIFIED DIFF for the MODEL-FACING channel only. A diff is the “show me the diff” channel: the human sees the full coloured diff in the tool `:body` (scrollback), which is rendered SEPARATELY and is NEVER touched here. This compressor only ever runs on the model’s `:output` at the ToolExecutor seam, behind the saving guard, so the common small “show me” diff flows through BYTE-IDENTICAL.

The FIDELITY INVARIANT is the whole contract: every ADDED (`+`) line, every REMOVED (`-`) line, every file header (`diff --git`, `index`, `--- a/`, `+++ b/`, rename/mode/binary lines) and every hunk header (`@@`) SURVIVES. Compression only ever drops PURE-CONTEXT (unchanged ` `) lines that sit far from a change, and collapses a whole generated/lock file’s hunk bodies to a one-line summary (its headers still survive). The eval and the spec assert this directly.
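The invariant can be checked mechanically. The following is a minimal sketch, not the project's actual eval or spec: `SIGNAL_RE`, `signal_lines` and `fidelity_preserved?` are hypothetical names, and the check only applies to diffs with no elided generated file, since elision replaces hunk bodies with a summary.

```ruby
# Hypothetical checker, not part of Rubino. Signal lines in a unified diff:
# added/removed lines and ---/+++ headers (all start with + or -), hunk
# headers, and git file-header lines.
SIGNAL_RE = /\A(?:[+-]|@@ |diff |index |rename |similarity index |old mode |new mode |new file mode |deleted file mode |Binary files )/

def signal_lines(diff_text)
  diff_text.lines(chomp: true).grep(SIGNAL_RE)
end

# True when both texts carry the same signal lines in the same order.
def fidelity_preserved?(original, compressed)
  signal_lines(original) == signal_lines(compressed)
end

original = [
  "diff --git a/a.rb b/a.rb",
  "--- a/a.rb",
  "+++ b/a.rb",
  "@@ -1,5 +1,5 @@",
  " one",
  " two",
  " three",
  "-old",
  "+new",
  " five",
].join("\n")

compressed = [
  "diff --git a/a.rb b/a.rb",
  "--- a/a.rb",
  "+++ b/a.rb",
  "@@ -1,5 +1,5 @@",
  "… 2 unchanged lines",
  " three",
  "-old",
  "+new",
  " five",
].join("\n")

puts fidelity_preserved?(original, compressed) # prints "true"
```

Only context lines differ between the two texts, so the filtered lists are equal; removing the `+new` line from `compressed` would make the check fail.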

Two reductions, both lossless on signal:

1. Context trimming — collapse runs of unchanged context to ±N lines
   around each change; the dropped middle becomes a `… N unchanged lines`
   marker. A diff already at tight context yields ~nothing → passthrough.
2. Generated/lock-file elision — a changed file matching a generated/lock
   pattern (`*.lock`, `package-lock.json`, `dist/`, `*.min.js`, …)
   collapses to `path: +X/-Y lines, N hunks — elided (generated)`.
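Context trimming (reduction 1) can be sketched roughly as follows. This is an illustration under assumptions, not the class's own code: `trim_context` and its `keep:` parameter are hypothetical, and it works on the body lines of a single hunk.

```ruby
# Illustrative sketch only: keep context lines within `keep` lines of a change
# and replace each dropped run with a "… N unchanged lines" marker.
def trim_context(body, keep: 3)
  changed = body.each_index.select { |i| body[i].start_with?("+", "-") }
  return body if changed.empty?

  # Flag every line that lies within `keep` lines of some change.
  kept = Array.new(body.length, false)
  changed.each do |i|
    lo = [i - keep, 0].max
    hi = [i + keep, body.length - 1].min
    (lo..hi).each { |j| kept[j] = true }
  end

  out = []
  dropped = 0
  body.each_with_index do |line, i|
    if kept[i]
      # Flush the marker for the run of dropped lines that just ended.
      if dropped.positive?
        out << "… #{dropped} unchanged lines"
        dropped = 0
      end
      out << line
    else
      dropped += 1
    end
  end
  out << "… #{dropped} unchanged lines" if dropped.positive?
  out
end

body = [" c1", " c2", " c3", " c4", " c5", "+new", " c6", " c7", " c8", " c9", " c10"]
puts trim_context(body, keep: 1)
```

With `keep: 1` the eleven-line body shrinks to five lines: a marker for four lines, ` c5`, `+new`, ` c6`, and a second marker for four lines. A body whose context is already within `keep` of a change comes back unchanged, which matches the passthrough case described in item 1.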

Saving guard: compression is applied only when the diff has at least `min_lines` lines AND the result is at least `min_saving` smaller; otherwise the original passes through byte-identical. Pure regex + line walking, no AST, no gem.
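A rough sketch of such a guard is shown below. The source does not say how `min_saving` is measured, so this assumes it is a fraction of the original byte size; `worth_applying?` is a hypothetical name, not a method of this class.

```ruby
# Hypothetical guard. Assumption: min_saving is a fraction of original bytes
# (for example 0.2 means "at least 20% smaller").
def worth_applying?(original, compressed, min_lines:, min_saving:)
  return false if original.lines.length < min_lines

  saved = original.bytesize - compressed.bytesize
  saved.fdiv(original.bytesize) >= min_saving
end

original   = "line\n" * 100 # 500 bytes
compressed = "line\n" * 50  # 250 bytes

puts worth_applying?(original, compressed, min_lines: 40, min_saving: 0.2) # prints "true"
```

When the guard returns false, the caller would send the original text untouched, which is what keeps small diffs byte-identical.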

Defined Under Namespace

Classes: Config

Constant Summary

DEFAULT_GENERATED =

Generated/lock-file patterns whose changed-file body collapses to a one-line summary. Config-driven (overridable); this is the default set.

%w[
  *.lock Gemfile.lock package-lock.json yarn.lock pnpm-lock.yaml
  composer.lock *.min.js *.min.css dist/ build/ *.snap vendor/
].freeze
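How these patterns are matched against a path is not shown on this page. One plausible reading, sketched below as an assumption, treats entries ending in `/` as directory names and everything else as a glob on the file's basename; `PATTERNS` and `generated?` are hypothetical names.

```ruby
# Copy of the default list above, so the sketch is self-contained.
PATTERNS = %w[
  *.lock Gemfile.lock package-lock.json yarn.lock pnpm-lock.yaml
  composer.lock *.min.js *.min.css dist/ build/ *.snap vendor/
].freeze

# Assumed matching rule, not confirmed by the source.
def generated?(path, patterns = PATTERNS)
  patterns.any? do |pat|
    if pat.end_with?("/")
      # Directory pattern: some directory segment of the path has this name.
      File.dirname(path).split("/").include?(pat.chomp("/"))
    else
      # Glob pattern: compare against the file name only.
      File.fnmatch?(pat, File.basename(path))
    end
  end
end

puts generated?("Gemfile.lock")         # prints "true"
puts generated?("web/dist/app.js")      # prints "true"
puts generated?("lib/rubino/config.rb") # prints "false"
```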
GIT_HEADER_RE =

A `diff --git a/<path> b/<path>` header: the path we test against the generated-file patterns. Fall back to `+++ b/<path>` when the git line is absent (a plain `diff -u` with no `diff --git`).

%r{\Adiff --git a/(?<a>\S+) b/(?<b>\S+)}
PLUS_HEADER_RE =
%r{\A\+\+\+ b/(?<path>\S+)}
MINUS_HEADER_RE =
%r{\A--- a/(?<path>\S+)}
HUNK_RE =
/\A@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@/
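The sketch below exercises these regexes on sample lines and shows the fallback described for `GIT_HEADER_RE`. The constants are copied from above; `changed_path` is a hypothetical helper, and using the `b` capture rather than `a` is an assumption.

```ruby
GIT_HEADER_RE  = %r{\Adiff --git a/(?<a>\S+) b/(?<b>\S+)}
PLUS_HEADER_RE = %r{\A\+\+\+ b/(?<path>\S+)}
HUNK_RE        = /\A@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@/

# Hypothetical helper: prefer the git header, fall back to the +++ line.
def changed_path(file_lines)
  file_lines.each do |line|
    m = GIT_HEADER_RE.match(line)
    return m[:b] if m
  end
  file_lines.each do |line|
    m = PLUS_HEADER_RE.match(line)
    return m[:path] if m
  end
  nil
end

git_style   = ["diff --git a/old.rb b/new.rb", "--- a/old.rb", "+++ b/new.rb"]
plain_style = ["--- a/x.rb", "+++ b/x.rb", "@@ -1 +1 @@"]

puts changed_path(git_style)                     # prints "new.rb"
puts changed_path(plain_style)                   # prints "x.rb"
puts HUNK_RE.match?("@@ -10,7 +10,8 @@ def foo") # prints "true"
```

Because the captures use `\S+`, a header whose path contains a space does not match `GIT_HEADER_RE`.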

Instance Method Summary

Constructor Details

#initialize(config) ⇒ DiffCompressor

Returns a new instance of DiffCompressor.



# File 'lib/rubino/compression/diff_compressor.rb', line 58

def initialize(config)
  @cfg = config.is_a?(Config) ? config : Config.from(config)
end

Instance Method Details

#compress(text) ⇒ Object

Returns a CompressionResult. When `applied?` is false, the caller should send the original text.



# File 'lib/rubino/compression/diff_compressor.rb', line 63

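def compress(text)
  original_bytes = text.bytesize
  # Limit -1 keeps trailing empty strings, so blank lines survive the split.
  raw = text.split("\n", -1)
  # A final "\n" leaves one empty last element; drop it here, restore it on output.
  had_trailing_nl = raw.last == "" && text.end_with?("\n")
  raw.pop if had_trailing_nl

  # Fewer lines than min_lines: report a no-op.
  return CompressionResult.noop(strategy: :too_small) if raw.length < @cfg.min_lines

  files = parse(raw)
  # parse found no files: report a no-op.
  return CompressionResult.noop(strategy: :parse_error) if files.empty?

  out_lines = files.flat_map { |f| render_file(f) }
  out = out_lines.join("\n")
  out += "\n" if had_trailing_nl
  build_result(out, original_bytes)
end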
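The split and rejoin steps in `compress` are what make a byte-identical round trip possible. The sketch below isolates them; `split_lines` and `join_lines` are hypothetical helper names that mirror the first and last lines of the method.

```ruby
# Mirrors the split at the top of #compress.
def split_lines(text)
  raw = text.split("\n", -1) # -1 keeps trailing empty strings
  had_trailing_nl = raw.last == "" && text.end_with?("\n")
  raw.pop if had_trailing_nl
  [raw, had_trailing_nl]
end

# Mirrors the join at the bottom of #compress.
def join_lines(lines, had_trailing_nl)
  out = lines.join("\n")
  out += "\n" if had_trailing_nl
  out
end

["a\nb\n", "a\nb", "a\n\n", "\n", ""].each do |text|
  lines, nl = split_lines(text)
  puts join_lines(lines, nl) == text # prints "true" for every input
end
```

Without the `-1` limit, `"a\n\n".split("\n")` returns `["a"]`, so the trailing blank line would be lost and the rejoined text would no longer match the input.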