Class: Rubino::Compression::DiffCompressor
- Inherits: Object
  - Object
  - Rubino::Compression::DiffCompressor
- Defined in: lib/rubino/compression/diff_compressor.rb
Overview
Deterministic, ML-free compression of a UNIFIED DIFF for the MODEL-FACING channel only. A diff is the “show me the diff” channel: the human sees the full coloured diff in the tool `:body` (scrollback), which is rendered SEPARATELY and is NEVER touched here. This compressor only ever runs on the model’s `:output` at the ToolExecutor seam, behind the saving guard, so the common small “show me” diff flows through BYTE-IDENTICAL.
The FIDELITY INVARIANT is the whole contract: every ADDED (`+`) line, every REMOVED (`-`) line, every file header (`diff --git`, `index`, `--- a/`, `+++ b/`, rename/mode/binary lines) and every hunk header (`@@`) SURVIVES. Compression only ever drops PURE-CONTEXT (unchanged ` `) lines that sit far from a change, and collapses a whole generated/lock file’s hunk bodies to a one-line summary (its headers still survive). The eval and the spec assert this directly.
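One way to picture the invariant is as a check that the sequence of signal lines is the same before and after compression. The sketch below is an illustration only, not the project's spec: `SIGNAL_RE`, `signal_lines` and `fidelity_preserved?` are hypothetical names, the header list in the regex is an assumption, and the check is meant for diffs that contain no generated/lock files (whose hunk bodies are summarised rather than kept).

```ruby
# Hypothetical check, not part of Rubino: a line counts as "signal" when it is
# an added/removed line, a hunk header, or one of the file-header lines.
# Context lines start with a space, so they never match.
SIGNAL_RE = /\A(?:[+-]|@@|diff --git |index |rename |similarity index |old mode |new mode |new file mode |deleted file mode |Binary files )/

# Returns the signal lines of a diff, in order.
def signal_lines(diff_text)
  diff_text.lines(chomp: true).select { |line| line.match?(SIGNAL_RE) }
end

# True when compression kept every signal line, in the same order.
# Assumes the diff has no generated/lock files.
def fidelity_preserved?(original, compressed)
  signal_lines(original) == signal_lines(compressed)
end
```

Because the `… N unchanged lines` marker starts with neither `+`, `-` nor a header prefix, it is ignored by the comparison, so a correctly trimmed diff passes while a diff that lost a single `+` line fails.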
Two reductions, both lossless on signal:
1. Context trimming: collapse runs of unchanged context to ±N lines
   around each change; the dropped middle becomes a `… N unchanged lines`
   marker. A diff already at tight context saves almost nothing, so it
   passes through unchanged.
2. Generated/lock-file elision: a changed file matching a generated/lock
   pattern (`*.lock`, `package-lock.json`, `dist/`, `*.min.js`, …)
   collapses to `path: +X/-Y lines, N hunks — elided (generated)`.
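Context trimming can be sketched as a single pass over one hunk body. This is a minimal illustration under stated assumptions, not the class's actual code: `trim_context` and its `keep` parameter are hypothetical names, only lines starting with `+` or `-` count as changes, and special lines such as `\ No newline at end of file` are not handled.

```ruby
# Sketch only: keep every changed line plus `keep` lines of context on each
# side, and replace each dropped run with a "… N unchanged lines" marker.
# `body` is the array of lines inside one hunk (without the @@ header).
def trim_context(body, keep: 1)
  changed = body.each_index.select { |i| body[i].start_with?("+", "-") }
  near = ->(i) { changed.any? { |c| (c - i).abs <= keep } }

  out = []
  dropped = 0
  body.each_with_index do |line, i|
    if near.call(i)
      # Flush the marker for the run of context we just skipped.
      if dropped.positive?
        out << "… #{dropped} unchanged lines"
        dropped = 0
      end
      out << line
    else
      dropped += 1
    end
  end
  out << "… #{dropped} unchanged lines" if dropped.positive?
  out
end
```

A hunk that is already at the requested context width comes back unchanged, which is why such diffs would fall through to the passthrough path.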
Saving guard: only applied when the diff is ≥ `min_lines` AND the result is ≥ `min_saving` smaller; otherwise a byte-identical passthrough. Pure regex + line walking, no AST, no gem.
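The guard can be pictured as a two-part predicate. The sketch below is an assumption-laden illustration: `apply_compression?` is a hypothetical name, and `min_saving` is treated here as a fraction of the original byte size, which the text above does not confirm (it could equally be an absolute count).

```ruby
# Sketch only: decide whether the compressed text is worth sending.
# Assumption: min_saving is a fraction of bytes saved (0.2 means 20 percent).
def apply_compression?(original, compressed, min_lines:, min_saving:)
  return false if original.empty?
  return false if original.lines.length < min_lines

  saving = 1.0 - compressed.bytesize.to_f / original.bytesize
  saving >= min_saving
end
```

When the predicate is false the caller would send the original text untouched, so small diffs never change by even one byte.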
Defined Under Namespace
Classes: Config
Constant Summary
- DEFAULT_GENERATED =
Generated/lock-file patterns whose changed-file body collapses to a one-line summary. Config-driven (overridable); this is the default set.
%w[ *.lock Gemfile.lock package-lock.json yarn.lock pnpm-lock.yaml composer.lock *.min.js *.min.css dist/ build/ *.snap vendor/ ].freeze
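How a path is tested against these patterns is not shown on this page. One plausible reading, sketched below, treats entries ending in `/` as directory segments and everything else as a glob on the file's basename. `PATTERNS` and `generated_path?` are hypothetical names, and the matching rules are assumptions.

```ruby
# Sketch only: copy of the default pattern list from this page.
PATTERNS = %w[
  *.lock Gemfile.lock package-lock.json yarn.lock pnpm-lock.yaml
  composer.lock *.min.js *.min.css dist/ build/ *.snap vendor/
].freeze

# Assumed semantics: "dir/" matches that directory anywhere in the path;
# other patterns are shell globs matched against the basename.
def generated_path?(path, patterns = PATTERNS)
  patterns.any? do |pat|
    if pat.end_with?("/")
      path.start_with?(pat) || path.include?("/#{pat}")
    else
      File.fnmatch(pat, File.basename(path))
    end
  end
end
```

Matching on whole segments keeps a path such as `src/distance.rb` from being mistaken for something under `dist/`.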
- GIT_HEADER_RE =
A `diff --git a/<path> b/<path>` header: the path we test against the generated-file patterns. Fall back to `+++ b/<path>` when the git line is absent (a plain `diff -u` with no `diff --git`).
%r{\Adiff --git a/(?<a>\S+) b/(?<b>\S+)}
- PLUS_HEADER_RE =
%r{\A\+\+\+ b/(?<path>\S+)}
- MINUS_HEADER_RE =
%r{\A--- a/(?<path>\S+)}
- HUNK_RE =
/\A@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@/
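The fallback described for `GIT_HEADER_RE` can be sketched as follows. The regexes are copied from this page; `changed_path` is a hypothetical helper, and choosing the `b` side of the git header is an assumption.

```ruby
# Regexes as listed in the constant summary.
GIT_HEADER_RE  = %r{\Adiff --git a/(?<a>\S+) b/(?<b>\S+)}
PLUS_HEADER_RE = %r{\A\+\+\+ b/(?<path>\S+)}
HUNK_RE        = /\A@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@/

# Sketch only: prefer the path from a `diff --git` line, otherwise fall back
# to the `+++ b/<path>` line; returns nil when neither is present.
def changed_path(header_lines)
  header_lines.each do |line|
    m = GIT_HEADER_RE.match(line)
    return m[:b] if m
  end
  header_lines.each do |line|
    m = PLUS_HEADER_RE.match(line)
    return m[:path] if m
  end
  nil
end
```

Since the path groups use `\S+`, a path containing spaces would be captured only up to the first space; this sketch does nothing about that case.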
Instance Method Summary
- #compress(text) ⇒ Object
Returns a CompressionResult.
- #initialize(config) ⇒ DiffCompressor (constructor)
A new instance of DiffCompressor.
Constructor Details
#initialize(config) ⇒ DiffCompressor
Returns a new instance of DiffCompressor.
# File 'lib/rubino/compression/diff_compressor.rb', line 58
def initialize(config)
  @cfg = config.is_a?(Config) ? config : Config.from(config)
end
Instance Method Details
#compress(text) ⇒ Object
Returns a CompressionResult. When `applied?` is false, the caller should send the original text.
# File 'lib/rubino/compression/diff_compressor.rb', line 63
def compress(text)
  original_bytes = text.bytesize
  raw = text.split("\n", -1)
  had_trailing_nl = raw.last == "" && text.end_with?("\n")
  raw.pop if had_trailing_nl
  return CompressionResult.noop(strategy: :too_small) if raw.length < @cfg.min_lines

  files = parse(raw)
  return CompressionResult.noop(strategy: :parse_error) if files.empty?

  out_lines = files.flat_map { |f| render_file(f) }
  out = out_lines.join("\n")
  out += "\n" if had_trailing_nl
  build_result(out, original_bytes)
end
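The trailing-newline bookkeeping in `compress` is what lets an untouched diff rejoin to exactly the same bytes. The sketch below isolates that step so it can be tried on its own; `split_preserving_newline` and `rejoin` are hypothetical names that mirror the first and last lines of the method.

```ruby
# Split on "\n" keeping trailing empties (limit -1), then drop the single
# empty element produced by a final newline and remember that it was there.
def split_preserving_newline(text)
  raw = text.split("\n", -1)
  had_trailing_nl = raw.last == "" && text.end_with?("\n")
  raw.pop if had_trailing_nl
  [raw, had_trailing_nl]
end

# Inverse of the split: join the lines and restore the final newline.
def rejoin(lines, had_trailing_nl)
  out = lines.join("\n")
  out += "\n" if had_trailing_nl
  out
end

["a\nb\n", "a\nb", "a\n\n", "\n", ""].each do |text|
  lines, nl = split_preserving_newline(text)
  raise "round trip failed for #{text.inspect}" unless rejoin(lines, nl) == text
end
```

The `-1` limit matters: without it, `String#split` discards trailing empty strings, so a diff ending in blank lines would not round-trip.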