Class: Rubino::Compression::LogCompressor

Inherits:
Object
Defined in:
lib/rubino/compression/log_compressor.rb

Overview

Deterministic, ML-free compression of COMMAND OUTPUT (test runs, linters, build logs, long shell dumps) for the model-facing channel. Unlike the Ruby SKELETON path (whole-file reads), this is the high-ROI channel: the agent reads command output WHOLE, and the signal (failures plus the summary tally) is a tiny fraction of the bytes. We keep every error/failure and the final tally VERBATIM and drop the passing-test / progress noise.

The FIDELITY INVARIANT is the whole contract: a line that names an ERROR / FAIL / Failure, and the final `N examples, M failures` summary, MUST survive compression. We may drop a passing dot, an INFO line, or a green `describe` header, but NEVER a failure descriptor. The measurement (eval/) and the spec both assert this directly.

Pure regex + counting, no AST, no gem. Small outputs (fewer than min_lines lines) pass through unchanged, because the marker indirection isn't worth it at that size.
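
To make the fidelity invariant concrete, here is a minimal sketch of the kind of check a spec could run against a compressed log. The helper `fidelity_violations` and the two regexes are hypothetical names introduced for illustration; they are not taken from Rubino's spec or eval code.

```ruby
# Hypothetical checker (not part of Rubino): lists failure-bearing lines
# of the original output that are missing from the compressed output.
FAILURE_WORD_RE = /\b(?:error|fail|failed|failure|failures)\b/i
TALLY_RE = /\b\d+\s+examples?,\s+\d+\s+failures?\b/

def fidelity_violations(original, compressed)
  kept = compressed.lines.map(&:chomp)
  original.lines.map(&:chomp).select do |line|
    must_survive = FAILURE_WORD_RE.match?(line) || TALLY_RE.match?(line)
    must_survive && !kept.include?(line)
  end
end

original = <<~LOG
  ....
  INFO booting
  Failure/Error: expect(x).to eq(1)
  3 examples, 1 failure
LOG

good = "Failure/Error: expect(x).to eq(1)\n3 examples, 1 failure\n"
bad  = "3 examples, 1 failure\n"

p fidelity_violations(original, good) # => []
p fidelity_violations(original, bad)  # => ["Failure/Error: expect(x).to eq(1)"]
```

An empty result means every failure descriptor and the tally survived verbatim; dropped dots and INFO lines never count as violations.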

Defined Under Namespace

Classes: Config, Line

Constant Summary

ERROR_RE =

Severity lexicon (word-boundary anchored so `error_count` doesn't trip on a passing-context line and `passed` doesn't read as `pass`).

/\b(?:error|errors|fail|failed|failure|failures|fatal|exception|panic|assert(?:ion)?)\b/i
WARN_RE =
/\b(?:warn|warning|warnings|deprecat\w+|pending|skipped|todo)\b/i
INFO_RE =
/\b(?:info|debug|pass|passed|passing|ok|success|done|examples?)\b/i
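
The effect of the word-boundary anchors can be checked directly. The sketch below copies ERROR_RE and INFO_RE from the listing above; the sample lines are made up for illustration.

```ruby
ERROR_RE = /\b(?:error|errors|fail|failed|failure|failures|fatal|exception|panic|assert(?:ion)?)\b/i
INFO_RE  = /\b(?:info|debug|pass|passed|passing|ok|success|done|examples?)\b/i

# "_" is a word character, so there is no boundary after "error" here.
p ERROR_RE.match?("error_count=0")        # => false
p ERROR_RE.match?("ERROR: build failed")  # => true

# "pass" cannot end mid-word, so the longer alternative wins.
p INFO_RE.match("all checks passed")[0]   # => "passed"
# No boundary before "pass" inside "bypass".
p INFO_RE.match?("bypass enabled")        # => false
```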
FAILURE_SHAPE_RE =

In a STRUCTURED test runner the per-test progress section names tests (“handles an error”, “fails fast”) whose keywords are NOT failures — the real failures live in dedicated report SHAPES. These match those shapes so the 8k-line green progress section can’t masquerade as 364 failures.

rspec:   `Failure/Error:`, `  N) <desc>`, `rspec ./path:NN` rerun list
pytest:  `FAILED path::test`, `E   <assert>`, `>   assert ...`
jest:    `✕ test`, `● Component › test`
cargo:   `test name ... FAILED`, `---- name stdout ----`
rubocop: `path:line:col: C: Offense`
%r{
  \A\s*\d+\)\s                          # rspec/cargo numbered failure
  | \bFailure/Error:                    # rspec body anchor
  | \A\s*rspec\s+['"]?\.?/?\S+:\d+      # rspec rerun line
  | \A\s*FAILED\b                       # pytest / generic
  | \A\s*E\s{2,}\S                       # pytest assertion line
  | \A\s*[✕✗✘×]\s                       # jest/mocha fail mark
  | \A\s*[●•]\s.*›                  # jest failure header (› )
  | \.{3}\s*FAILED\s*\z                  # cargo `test x ... FAILED`
  | \A----\s.*\bstdout\b                 # cargo failure capture header
  | \A\S+:\d+:\d+:\s+[A-Z]:\s            # rubocop offense (path:l:c: C:)
  | \A\s*(?:error|panic)\[             # rust/compiler `error[E…]`
}x
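
As an illustration of shape matching, the sketch below uses a subset of the alternatives above (named `SHAPE_SUBSET_RE` here, a hypothetical name) and filters a few made-up runner lines. A green test name containing "error" is not selected, while the dedicated report shapes are.

```ruby
# Subset of FAILURE_SHAPE_RE, copied from the listing above.
SHAPE_SUBSET_RE = %r{
  \A\s*\d+\)\s                          # numbered failure
  | \A\s*rspec\s+['"]?\.?/?\S+:\d+      # rspec rerun line
  | \A\s*E\s{2,}\S                      # pytest assertion line
  | \.{3}\s*FAILED\s*\z                 # cargo test line
  | \A\S+:\d+:\d+:\s+[A-Z]:\s           # rubocop offense
}x

lines = [
  "  handles an error",                                   # green test name
  "..........F....",                                      # progress dots
  "  1) Parser handles an error",
  "rspec ./spec/parser_spec.rb:42 # Parser handles an error",
  "E   assert 1 == 2",
  "test parse::header ... FAILED",
  "lib/foo.rb:10:5: C: Style/StringLiterals: Prefer single quotes"
]

shaped = lines.select { |line| SHAPE_SUBSET_RE.match?(line) }
puts shaped
```

The first two lines are filtered out; the remaining five are reported as failure shapes.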
STACK_RE =

Stack-trace / backtrace frame: rspec `# ./spec/…:NN`, a bare `from path:line:in`, a `path:line:in` Ruby frame, or an `at File.fn` / pytest `File "x", line N`. These are the indented continuation lines of a trace.

%r{
  \A\s*(?:\#\s+)?(?:from\s+)?[^\s:]+\.\w+:\d+(?::in\b)?   # path:line[:in]
  | \A\s*at\s+\S+                                         # JS/Java at frame
  | \A\s*File\s+"[^"]+",\s+line\s+\d+                     # pytest frame
}x
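
A quick check of the frame shapes described above. The regex is copied from the listing (with shortened comments); the sample frames are invented for illustration.

```ruby
STACK_RE = %r{
  \A\s*(?:\#\s+)?(?:from\s+)?[^\s:]+\.\w+:\d+(?::in\b)?   # path:line, optional :in
  | \A\s*at\s+\S+                                         # JS/Java at frame
  | \A\s*File\s+"[^"]+",\s+line\s+\d+                     # pytest frame
}x

frames = [
  "     # ./spec/foo_spec.rb:12:in `block (2 levels)'",
  "        from /usr/lib/ruby/foo.rb:10:in `bar'",
  "    at Object.run (/app/x.js:3:1)",
  '  File "tests/test_x.py", line 42, in test_y'
]
non_frames = ["Finished in 0.5 seconds", "3 examples, 1 failure"]

p frames.all? { |line| STACK_RE.match?(line) }      # => true
p non_frames.none? { |line| STACK_RE.match?(line) } # => true
```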
SUMMARY_RE =

The final tally / framing lines that MUST survive: rspec `N examples, M failures`, `Finished in …`, the `Failures:` header, rubocop's `NN files inspected, MM offenses`, pytest's `=== N failed ===`.

/
  \b\d+\s+examples?\b
  | \bFinished\sin\b
  | \A\s*Failures:\s*\z
  | \b\d+\s+files?\s+inspected\b
  | \b\d+\s+offenses?\b
  | ^={3,}.*\b(?:failed|passed|error)\b
  | \b\d+\s+(?:passed|failed|error)\b
/xi
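
The sketch below exercises SUMMARY_RE (copied from the listing above) against made-up tally lines from each runner, plus two noise lines that should not match.

```ruby
SUMMARY_RE = /
  \b\d+\s+examples?\b
  | \bFinished\sin\b
  | \A\s*Failures:\s*\z
  | \b\d+\s+files?\s+inspected\b
  | \b\d+\s+offenses?\b
  | ^={3,}.*\b(?:failed|passed|error)\b
  | \b\d+\s+(?:passed|failed|error)\b
/xi

summaries = [
  "12 examples, 0 failures",
  "Finished in 1.2 seconds",
  "Failures:",
  "42 files inspected, 3 offenses detected",
  "=== 2 failed, 10 passed in 0.51s ==="
]
noise = ["..........", "  handles an error"]

p summaries.all? { |line| SUMMARY_RE.match?(line) } # => true
p noise.none? { |line| SUMMARY_RE.match?(line) }    # => true
```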
STRUCTURED =

STRUCTURED runners report failures in dedicated shapes; the green progress section’s keyword-bearing test names are NOT failures. Generic logs have no such structure, so keyword severity is all we have.

%i[rspec pytest jest cargo].freeze
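
One way to read the distinction above is as a gate on the detected format. The following is a minimal sketch under assumptions, not the class's actual classification code: `failure_line?` is a hypothetical helper, `:generic` is an assumed symbol for unstructured logs, and `SHAPE_SUBSET_RE` is a shortened stand-in for FAILURE_SHAPE_RE.

```ruby
STRUCTURED = %i[rspec pytest jest cargo].freeze
ERROR_RE = /\b(?:error|errors|fail|failed|failure|failures|fatal|exception|panic|assert(?:ion)?)\b/i

# Shortened stand-in for FAILURE_SHAPE_RE (three of its alternatives).
SHAPE_SUBSET_RE = %r{
  \A\s*\d+\)\s
  | \bFailure/Error:
  | \A\s*FAILED\b
}x

# Hypothetical helper: structured runners trust report shapes only;
# anything else falls back to keyword severity.
def failure_line?(line, format)
  if STRUCTURED.include?(format)
    SHAPE_SUBSET_RE.match?(line)
  else
    ERROR_RE.match?(line)
  end
end

p failure_line?("  handles an error", :rspec)            # => false
p failure_line?("  handles an error", :generic)          # => true
p failure_line?("  1) Parser handles an error", :rspec)  # => true
```

The same green test name is ignored under a structured format and flagged under a generic one, which is the behaviour the description asks for.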

Instance Method Summary

Constructor Details

#initialize(config) ⇒ LogCompressor

Returns a new instance of LogCompressor.



# File 'lib/rubino/compression/log_compressor.rb', line 99

def initialize(config)
  @cfg = config.is_a?(Config) ? config : Config.from(config)
end
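
The constructor accepts either a ready-made Config or something Config.from can convert. The sketch below shows the same coercion pattern with stand-in classes: `SketchConfig`, `SketchCompressor`, the Hash input, and the default of 40 are all assumptions for illustration. Only the `min_lines` field is taken from the source; the real Config may carry more settings and accept other inputs.

```ruby
# Stand-in for Config (hypothetical shape; only min_lines comes from the source).
SketchConfig = Struct.new(:min_lines, keyword_init: true) do
  # Assumed conversion from a plain Hash; the default of 40 is made up.
  def self.from(opts)
    new(min_lines: opts.fetch(:min_lines, 40))
  end
end

# Stand-in for LogCompressor, reduced to the constructor.
class SketchCompressor
  attr_reader :cfg

  def initialize(config)
    @cfg = config.is_a?(SketchConfig) ? config : SketchConfig.from(config)
  end
end

from_hash   = SketchCompressor.new({ min_lines: 10 })
from_config = SketchCompressor.new(SketchConfig.new(min_lines: 25))

p from_hash.cfg.min_lines    # => 10
p from_config.cfg.min_lines  # => 25
```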

Instance Method Details

#compress(text) ⇒ Object

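Returns a CompressionResult. `applied? == false` means “send the original”; this happens when the input has fewer than min_lines lines (strategy :too_small) or when selection keeps every line (strategy :insufficient_saving).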



# File 'lib/rubino/compression/log_compressor.rb', line 104

def compress(text)
  original_bytes = text.bytesize
  raw = text.split("\n", -1)
  # split("\n", -1) leaves a trailing "" for a final newline; drop it so
  # the line count and the rebuilt output match the input.
  raw.pop if raw.last == "" && text.end_with?("\n")

  return CompressionResult.noop(strategy: :too_small) if raw.length < @cfg.min_lines

  @format = detect_format(raw)
  lines = classify(raw)
  select!(lines)
  kept = lines.select(&:kept)

  return CompressionResult.noop(strategy: :insufficient_saving) if kept.length >= raw.length

  out = render(lines)
  build_result(out, original_bytes)
end
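
The line-splitting step at the top of `compress` can be tried in isolation. In the sketch below, `split_lines` is a hypothetical wrapper around the same two statements. It shows why the `-1` limit is used (without it, `String#split` drops all trailing empty strings, so blank lines at the end of the output would be lost) and why only the single trailing `""` is popped.

```ruby
# Hypothetical wrapper around the splitting logic used in #compress.
def split_lines(text)
  raw = text.split("\n", -1)
  # A final newline yields one trailing ""; drop it so the line count
  # matches the input and a rebuild by join reproduces it.
  raw.pop if raw.last == "" && text.end_with?("\n")
  raw
end

p split_lines("a\nb\n")   # => ["a", "b"]
p split_lines("a\nb")     # => ["a", "b"]
p split_lines("a\n\n")    # => ["a", ""]
p "a\n\n".split("\n")     # => ["a"]  (default split loses the blank line)
p split_lines("")         # => []
```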