Module: Rubino::Util::Output

Defined in:
lib/rubino/util/output.rb

Overview

Smart truncation of long tool output for the scrollback preview.

Rule shape (5 head + 10 tail + marker, threshold 30) follows the pattern that emerged from surveying Codex, Gemini CLI, Roo, and Aider: tail bias because errors, exit codes, and command summaries live at the end. A head-heavy split (which would be intuitive for “show me the start”) consistently hides the part the user actually needs when something failed.

The FULL output still goes to the model and the session DB — this is only what the user sees in the live scroll. The marker tells them so they don’t think they’re missing something irrecoverable.

Constant Summary

DEFAULT_MAX =
30
DEFAULT_HEAD =
5
DEFAULT_TAIL =
10
NUL =

The NUL byte (U+0000) is the one control char that is VALID UTF-8 yet still breaks the persistence layer: the SQLite3 driver treats it as a C-string terminator and raises “unrecognized token” (the tool row never persists), and JSON re-tags the value as BINARY. String#scrub leaves it alone (it only repairs INVALID bytes), so scrub-to-UTF-8 is necessary but not sufficient — NUL has to go too.

"\x00"
ESC =

ESC (0x1B): the introducer for ALL the dangerous sequences — CSI (cursor move, screen clear, scroll region), OSC (set window title, hyperlinks, clipboard write), DCS, etc.

"\e"
C1_RANGE =

U+009B is the single-byte CSI introducer: a terminal treats it exactly like `ESC [`, so stripping ESC alone would leave a working injection vector. It only exists AFTER UTF-8 decoding (the byte 0x9B on its own is invalid UTF-8 and is scrubbed; U+0080–U+009F, including U+0085, arrive via valid 2-byte forms), so we strip the C1 block on the decoded string.

"\u0080-\u009F"
SGR_RE =

SGR colour/style escapes (`\e[…m`): the ONE escape class that is SAFE to keep through the sanitizer, because it changes only colour/weight and cannot move the cursor, clear the screen, set the title, or write the clipboard. Matched so #sanitize_terminal_keep_sgr can preserve rubino’s OWN styling (e.g. the colored /agents status glyph) while still neutralizing every dangerous control byte.

/\e\[[0-9;]*m/
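
As a quick illustration of what the pattern does and does not match, here is a standalone sketch. The local name `sgr_re` and the sample strings are illustrative only; the pattern itself is copied from the constant above.

```ruby
sgr_re = /\e\[[0-9;]*m/ # same pattern as SGR_RE above

styled = "\e[33m●\e[0m approval"
puts styled.scan(sgr_re).inspect   # ["\e[33m", "\e[0m"]
puts styled.gsub(sgr_re, "")       # ● approval

# Clear-screen (CSI 2 J) and set-title (OSC 0) are not "digits/semicolons
# followed by m", so the pattern leaves them for the sanitizer to neutralize.
puts "\e[2J".match?(sgr_re)        # false
puts "\e]0;title\a".match?(sgr_re) # false
```

Stripping with `gsub(sgr_re, "")` is also one way a caller could measure display width, since SGR occupies zero columns.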

Class Method Summary

Class Method Details

.caret(byte) ⇒ Object

Visible, unambiguous stand-in for a stripped control byte: ESC → “^[”, NUL → “^@”, DEL → “^?”. This is the classic `cat -v` caret notation, so the user can tell exactly what the tool tried to emit.



# File 'lib/rubino/util/output.rb', line 145

def self.caret(byte)
  code = byte.ord
  return "^?" if code == 0x7F

  "^#{(code ^ 0x40).chr}"
end
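
A small usage sketch, with the method body copied into a top-level `caret` so it runs outside the module. The XOR with 0x40 maps each C0 control code onto the printable character 64 positions above it.

```ruby
def caret(byte)
  code = byte.ord
  return "^?" if code == 0x7F # DEL has no printable partner 64 above it

  "^#{(code ^ 0x40).chr}"
end

puts caret("\e")    # ^[
puts caret("\x00")  # ^@
puts caret("\x07")  # ^G
puts caret("\x7F")  # ^?
```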

.clean_slice(bytes, encoding) ⇒ Object

Encoding-scrub + NUL-strip a BOUNDED byteslice (#373). The head/tail byte path slices BEFORE scrubbing (so the 128MB buffer is never scrubbed whole); each kept slice still has to be cleaned exactly like scrub_utf8 (invalid bytes dropped, NUL deleted) so JSON/SQLite don’t choke.



# File 'lib/rubino/util/output.rb', line 313

def self.clean_slice(bytes, encoding)
  s = bytes.to_s.force_encoding(encoding).scrub("")
  s = s.encode(Encoding::UTF_8) unless s.encoding == Encoding::UTF_8
  s.include?(NUL) ? s.delete(NUL) : s
end
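
The reason a bounded slice still needs cleaning can be seen in a short standalone sketch. The method body is copied to the top level (with the NUL constant inlined), and the sample text is made up for illustration: a byteslice can cut a multi-byte character in half, leaving an invalid trailing byte.

```ruby
def clean_slice(bytes, encoding)
  s = bytes.to_s.force_encoding(encoding).scrub("")
  s = s.encode(Encoding::UTF_8) unless s.encoding == Encoding::UTF_8
  s.include?("\x00") ? s.delete("\x00") : s
end

text  = "héllo\x00 wörld"
slice = text.byteslice(0, 2)            # "h" plus the first byte of "é"

puts slice.valid_encoding?              # false
puts clean_slice(slice, text.encoding)  # h

# A slice that ends on a character boundary only loses its NUL.
puts clean_slice(text.byteslice(0, 8), text.encoding) == "héllo " # true
```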

.elide(text, max) ⇒ String

Single-line elision to max characters with a trailing ellipsis. Shared by the parent-note tools (AnswerChild/Task/Steer) that all carried a byte-identical private `truncate`. Pure function.

Parameters:

  • text (#to_s)

    the raw text (nil becomes “”)

  • max (Integer)

    character budget before eliding

Returns:

  • (String)

    the text, or its first max chars + “…”



# File 'lib/rubino/util/output.rb', line 241

def self.elide(text, max)
  s = text.to_s
  s.length > max ? "#{s[0, max]}…" : s
end

.first_line(text, max) ⇒ Object

First NON-BLANK line, elided to max chars (max-1 + “…”). The single source for the subagent card and view rows, which carried a byte-identical private copy. Distinct from #elide (which keeps max chars before the ellipsis) — this row shape budgets the ellipsis IN.



# File 'lib/rubino/util/output.rb', line 259

def self.first_line(text, max)
  first = first_nonblank_line(text)
  first.length > max ? "#{first[0, max - 1]}…" : first
end
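
The difference between the two budgets is easiest to see side by side. This is a standalone sketch: both methods are copied to the top level, the non-blank-line lookup is inlined, and the trailing “…” is assumed from the method descriptions.

```ruby
def elide(text, max)
  s = text.to_s
  s.length > max ? "#{s[0, max]}…" : s # max chars, THEN the ellipsis
end

def first_line(text, max)
  first = text.to_s.each_line.map(&:strip).find { |l| !l.empty? }.to_s
  first.length > max ? "#{first[0, max - 1]}…" : first # ellipsis counted in
end

puts elide("abcdefghij", 5)             # abcde…
puts first_line("\n  abcdefghij\n", 5)  # abcd…
puts elide(nil, 5).inspect              # ""
```

With the same `max`, `elide` can return `max + 1` characters while `first_line` never exceeds `max`, which matters for fixed-width rows.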

.first_nonblank_line(text) ⇒ Object

First NON-BLANK line of text, stripped (or “” when all-blank). A multi-line ruby/shell command often starts with a blank line, so a naive `.lines.first` rendered an empty approval/activity hint (#141). Pure function shared by the subagent card / view rows and the task tool’s approval preview, which each carried this extraction inline.



# File 'lib/rubino/util/output.rb', line 251

def self.first_nonblank_line(text)
  text.to_s.each_line.map(&:strip).find { |l| !l.empty? }.to_s
end

.head_lines(str, keep) ⇒ Object

First keep chomp’d lines of str, without materializing the whole buffer into a lines array (#373). Stops scanning after keep lines.



# File 'lib/rubino/util/output.rb', line 192

def self.head_lines(str, keep)
  out = []
  str.each_line do |line|
    out << line.chomp
    break if out.size >= keep
  end
  out
end

.line_count(str) ⇒ Object

Line count of str via a single allocation-free newline-BYTE count (#373): the number of newlines, +1 for a final line with no trailing newline. Used by both #preview and #truncate to decide over/under cap WITHOUT splitting a potentially huge buffer into a `.lines` array. Counts on the byte view (`.b`) so a raw, not-yet-scrubbed buffer (invalid UTF-8 / binary tool output) doesn’t raise “invalid byte sequence”: the `\n` byte (0x0A) is unambiguous regardless of encoding, and `.b` shares the buffer (no copy).



# File 'lib/rubino/util/output.rb', line 208

def self.line_count(str)
  return 0 if str.empty?

  bytes = str.b
  bytes.count("\n") + (bytes.end_with?("\n") ? 0 : 1)
end
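
A few sample inputs make the counting rule concrete. The method is copied to the top level so the sketch runs on its own; the inputs are illustrative.

```ruby
def line_count(str)
  return 0 if str.empty?

  bytes = str.b
  bytes.count("\n") + (bytes.end_with?("\n") ? 0 : 1)
end

puts line_count("")         # 0
puts line_count("a\nb\n")   # 2
puts line_count("a\nb")     # 2

# A buffer tagged UTF-8 but holding an invalid byte still counts cleanly.
binary = "x\xFF\ny".dup.force_encoding(Encoding::UTF_8)
puts binary.valid_encoding? # false
puts line_count(binary)     # 2
```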

.preview(text, max: DEFAULT_MAX, head: DEFAULT_HEAD, tail: DEFAULT_TAIL) ⇒ String

Returns either the full text (when total lines <= max) or a head + marker + tail preview. Pure function — no side effects, no IO. Caller decides where to render the result.

Parameters:

  • text (String)

    the raw output

  • max (Integer) (defaults to: DEFAULT_MAX)

    line count above which we trim

  • head (Integer) (defaults to: DEFAULT_HEAD)

    lines to keep from the top

  • tail (Integer) (defaults to: DEFAULT_TAIL)

    lines to keep from the bottom

Returns:

  • (String)

    the preview (always a String, never nil)



# File 'lib/rubino/util/output.rb', line 161

def self.preview(text, max: DEFAULT_MAX, head: DEFAULT_HEAD, tail: DEFAULT_TAIL)
  return "" if text.nil? || text.to_s.empty?

  s = text.to_s
  # Count newlines instead of materializing `s.lines` (#373): a ~1KB
  # value with a 2-million-element single-line buffer used to allocate a
  # 2M-element array (+ another 2M chomp'd copy via `.map(&:chomp)`) just
  # to learn it fits — ~hundreds of MB of churn for a preview the caller
  # may not even trim. `count("\n")` is O(n) bytes with zero allocation.
  # total line count = newline count (+1 unless the buffer ends in \n).
  total = line_count(s)
  if total <= max
    # Fits: only NOW materialize, and only to chomp the trailing newlines
    # of the (already small) line set.
    return s.lines.map(&:chomp).join("\n")
  end

  # Trimming: we only need the FIRST `head` and LAST `tail` lines, so
  # take them off the head/tail SLICES of the buffer rather than splitting
  # the whole thing into a (potentially huge) lines array. each_line with
  # a bounded take avoids walking past what we keep on the head side.
  head_pt = head_lines(s, head)
  tail_pt = tail_lines(s, tail)
  omitted = total - head_pt.size - tail_pt.size
  marker  = "… [#{omitted} more lines · full in DB] …"

  (head_pt + [marker] + tail_pt).join("\n")
end
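
To show the resulting shape, here is a deliberately simplified sketch. `preview_sketch` is a hypothetical stand-in that splits the whole buffer with `lines`, which is exactly what the real method avoids; it only mirrors the head + marker + tail layout and the default numbers.

```ruby
# Simplified stand-in: same output shape, none of the memory safeguards.
def preview_sketch(text, max: 30, head: 5, tail: 10)
  lines = text.to_s.lines.map(&:chomp)
  return lines.join("\n") if lines.size <= max

  omitted = lines.size - head - tail
  marker  = "… [#{omitted} more lines · full in DB] …"
  (lines.first(head) + [marker] + lines.last(tail)).join("\n")
end

output = (1..40).map { |i| "line #{i}" }.join("\n")
shown  = preview_sketch(output).lines.map(&:chomp)

puts shown.size  # 16
puts shown[5]    # … [25 more lines · full in DB] …
puts shown.last  # line 40
```

Forty lines exceed the threshold of 30, so the preview keeps 5 head lines, one marker, and 10 tail lines; 25 lines are reported as omitted.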

.sanitize_terminal(text) ⇒ Object

Neutralizes terminal-control bytes in UNTRUSTED tool output before it is printed to a real terminal.

Threat (CWE-150): raw `\e[2J` (clear screen), `\e[41m…\e[0m` (color), `\e]0;…\a` (set title), `\e]52;…` (clipboard write) embedded in shell/file/MCP output reach the emulator and EXECUTE; the live tool tail printed it verbatim. Following git’s `core.fsmonitor`-style and dgl.cx’s “sanitize at the render chokepoint” guidance, we strip every control byte that can move the cursor, repaint, or drive the terminal, and render what we removed as visible caret/<XX> notation so the user SEES that bytes were there (silent deletion hides the attack).

Kept: \t (0x09) and \n (0x0A), which are legitimate layout. \r is normalized to \n (a bare CR rewinds the line and lets later text overwrite what was already shown, which is another spoofing vector). Stripped: C0 0x00–0x1F (except \t/\n), DEL 0x7F, ESC 0x1B, and the C1 block U+0080–U+009F.

rubino’s OWN styling (the @pastel.dim/green wrapper applied AROUND this content) is a separate, trusted path and is never passed through here. Pure.



# File 'lib/rubino/util/output.rb', line 96

def self.sanitize_terminal(text)
  # Encoding-scrub ONLY (keep NUL et al.) so the C0 pass below can turn
  # every control byte into visible caret notation — silent deletion
  # would hide that the tool tried to emit them.
  s = scrub_encoding(text)
  # Bare CR (not part of CRLF) → newline, so overwrite-spoofing can't
  # rewind the rendered line. CRLF collapses to a single LF.
  s = s.gsub(/\r\n?/, "\n")
  s = s.gsub(/[\x00-\x08\x0B-\x1F\x7F]/) { |c| caret(c) }
  s.gsub(/[#{C1_RANGE}]/o) { |c| "<#{format("%02X", c.ord)}>" }
end
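
The three passes can be tried in isolation with the sketch below. It assumes the input is already valid UTF-8 (so the `scrub_encoding` step is skipped), writes the C1 range inline as `\u0080-\u009F`, and uses the hypothetical name `sanitize_sketch` to make clear it is not the module method.

```ruby
def caret(byte)
  code = byte.ord
  code == 0x7F ? "^?" : "^#{(code ^ 0x40).chr}"
end

def sanitize_sketch(text)
  s = text.gsub(/\r\n?/, "\n")                             # CR and CRLF become LF
  s = s.gsub(/[\x00-\x08\x0B-\x1F\x7F]/) { |c| caret(c) }  # C0 + DEL as carets
  s.gsub(/[\u0080-\u009F]/) { |c| "<#{format("%02X", c.ord)}>" } # C1 as <XX>
end

puts sanitize_sketch("\e[2Jcleared")               # ^[[2Jcleared
puts sanitize_sketch("\e]0;pwned\a")               # ^[]0;pwned^G
puts sanitize_sketch("safe\rEVIL").inspect         # "safe\nEVIL"
puts sanitize_sketch("a\tb\u009B2J").inspect       # "a\tb<9B>2J"
```

Every escape sequence ends up as inert printable text, while tabs and newlines pass through untouched.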

.sanitize_terminal_keep_sgr(text) ⇒ Object

Like #sanitize_terminal, but PRESERVES SGR colour escapes.

Some sinks interpolate TRUSTED rubino styling (a pastel-colored cell, e.g. the /agents table’s “● approval” status) THROUGH the same cell sanitizer that guards untrusted text. Plain #sanitize_terminal rendered those SGR bytes as visible caret notation (`^[[33m●^[[0m approval`): the FRICTION-3 leak. Keep the (inert) SGR sequences, neutralize everything else exactly as #sanitize_terminal does, so colour survives but `\e[2J` / `\e]0;…` / cursor moves still can’t reach the terminal. Callers that measure width must strip SGR first (see SGR_RE / the display-width helpers) since SGR occupies zero columns. Pure.



# File 'lib/rubino/util/output.rb', line 127

def self.sanitize_terminal_keep_sgr(text)
  s = scrub_encoding(text)
  # Carve out the SGR runs, sanitize the gaps, splice the SGR back in.
  parts = []
  last  = 0
  s.to_enum(:scan, SGR_RE).each do
    m = Regexp.last_match
    parts << sanitize_terminal(s[last...m.begin(0)])
    parts << m[0]
    last = m.end(0)
  end
  parts << sanitize_terminal(s[last..]) if last < s.length
  parts.join
end
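
The carve-out idea can also be sketched with `String#split` and a capture group, which returns the gaps and the SGR runs alternately. This is a different technique from the scan loop above, shown only for illustration; `SGR_PATTERN`, `neutralize` and `keep_sgr_sketch` are hypothetical names, and `neutralize` handles ESC only rather than the full control-byte set.

```ruby
SGR_PATTERN = /\e\[[0-9;]*m/ # same pattern as SGR_RE

# Stand-in for sanitize_terminal: only ESC is neutralized here.
def neutralize(text)
  text.gsub("\e", "^[")
end

def keep_sgr_sketch(text)
  # With a capture group, split keeps the separators (the SGR runs).
  text.split(/(#{SGR_PATTERN})/).map do |part|
    part.match?(/\A#{SGR_PATTERN}\z/) ? part : neutralize(part)
  end.join
end

result = keep_sgr_sketch("\e[33m●\e[0m approval \e[2J")
puts result == "\e[33m●\e[0m approval ^[[2J" # true
```

The colour codes around the glyph survive byte for byte, while the clear-screen sequence in the untrusted tail is rendered as visible text.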

.scrub_encoding(text) ⇒ Object

Encoding-only repair: returns a valid-UTF-8 string, leaving control bytes (incl. NUL) in place. Split out from #scrub_utf8 because the two consumers want different things downstream of “make it valid UTF-8”: the PERSIST seam (#scrub_utf8) deletes NUL outright (SQLite-fatal), but the TERMINAL render seam (#sanitize_terminal) wants every control byte turned into VISIBLE caret notation — so it scrubs encoding here, then does its own C0/C1 pass instead of pre-deleting NUL. Pure.



# File 'lib/rubino/util/output.rb', line 58

def self.scrub_encoding(text)
  s = text.to_s
  return s if s.encoding == Encoding::UTF_8 && s.valid_encoding?

  s.dup.force_encoding(Encoding::UTF_8).scrub
end

.scrub_utf8(text) ⇒ Object

Coerces text to a clean, persistable UTF-8 string: valid encoding AND free of NUL bytes.

Tool output is captured raw from a subprocess pipe / file read / MCP response and can be binary or latin-1 (`head -c 1500 /dev/urandom`, `cat some.png`). Such bytes are tagged UTF-8 (the pipe’s external encoding) but are NOT valid UTF-8, so the moment they reach JSON.generate (the LLM request, the run-event store) or the SQLite driver they raise “source sequence is illegal/malformed utf-8” / “UTF-8 passed as BINARY” / “unrecognized token” and the tool row never persists, so the model loses the record on --resume. Random binary ALSO carries NUL bytes, which survive String#scrub (NUL is valid UTF-8) yet still wedge SQLite, so we strip them here too. Cleaning at the CAPTURE seam (before the bytes are ever copied into the result) means every downstream consumer sees a safe string. Idempotent on already-clean input. Pure.



# File 'lib/rubino/util/output.rb', line 46

def self.scrub_utf8(text)
  s = scrub_encoding(text)
  s.include?(NUL) ? s.delete(NUL) : s
end
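
The split between the two seams can be checked with a standalone sketch. Both methods are copied to the top level with the NUL constant inlined, and the sample bytes (a latin-1 “é” followed by a NUL) are made up for illustration.

```ruby
def scrub_encoding(text)
  s = text.to_s
  return s if s.encoding == Encoding::UTF_8 && s.valid_encoding?

  s.dup.force_encoding(Encoding::UTF_8).scrub
end

def scrub_utf8(text)
  s = scrub_encoding(text)
  s.include?("\x00") ? s.delete("\x00") : s
end

raw = "caf\xE9\x00!".b # binary-tagged bytes, as captured from a pipe

render_side  = scrub_encoding(raw)
persist_side = scrub_utf8(raw)

puts render_side.valid_encoding?            # true
puts render_side.include?("\x00")           # true: NUL kept for caret rendering
puts persist_side == "caf\uFFFD!"           # true: NUL gone, bad byte replaced
puts scrub_utf8(persist_side).equal?(persist_side) # true: clean input is returned as-is
```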

.tail_bias_bytes(text, max_bytes, spill_path = nil) ⇒ Object



# File 'lib/rubino/util/output.rb', line 319

def self.tail_bias_bytes(text, max_bytes, spill_path = nil)
  encoding        = text.encoding
  recover         = spill_path ? " · full output saved to #{spill_path} — read it with offset/limit" : ""
  marker_template = "\n... [%d bytes elided#{recover} · use grep/head to narrow] ...\n"
  marker_max      = (marker_template % 999_999_999).bytesize
  head_budget     = (max_bytes * 0.1).to_i
  tail_budget     = max_bytes - head_budget - marker_max

  # Below ~200 bytes the marker eats the entire budget, so fall back
  # to a simple head truncation (old behavior). Realistic caps go
  # through the head+tail path.
  if tail_budget <= 0
    truncated = clean_slice(text.byteslice(0, max_bytes), encoding)
    tail_note = spill_path ? " · full output: #{spill_path}" : ""
    return "#{truncated}\n... [truncated at #{max_bytes} bytes#{tail_note}]"
  end

  head   = clean_slice(text.byteslice(0, head_budget), encoding)
  tail   = clean_slice(text.byteslice(-tail_budget, tail_budget), encoding)
  elided = text.bytesize - head.bytesize - tail.bytesize
  "#{head}#{format(marker_template, elided)}#{tail}"
end
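
The budget arithmetic at the top of the method can be isolated as follows. `byte_budget` is a hypothetical helper that covers only the split (no slicing or cleaning) and assumes no spill path, so the marker is at its shortest.

```ruby
# Budget split only, for the no-spill marker.
def byte_budget(max_bytes)
  template   = "\n... [%d bytes elided · use grep/head to narrow] ...\n"
  marker_max = (template % 999_999_999).bytesize # worst-case marker width
  head       = (max_bytes * 0.1).to_i
  tail       = max_bytes - head - marker_max
  { head: head, tail: tail, marker_max: marker_max, head_only: tail <= 0 }
end

large = byte_budget(50_000)
small = byte_budget(50)

puts large[:head]                                      # 5000
puts large[:head] + large[:tail] + large[:marker_max]  # 50000
puts large[:head_only]                                 # false
puts small[:head_only]                                 # true
```

Reserving the worst-case marker width up front keeps head + marker + tail within `max_bytes`; when the cap is too small to leave any tail, the method falls back to head-only truncation.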

.tail_bias_lines(text, max_lines, spill_path = nil) ⇒ Object



# File 'lib/rubino/util/output.rb', line 342

def self.tail_bias_lines(text, max_lines, spill_path = nil)
  lines = text.lines
  return text if lines.size <= max_lines

  recover    = spill_path ? " · full output saved to #{spill_path} — read it with offset/limit" : ""
  head_count = [max_lines / 10, 5].max
  tail_count = max_lines - head_count - 1
  # Vanishing budget falls back to head-only truncation.
  if tail_count <= 0
    tail_note = spill_path ? " · full output: #{spill_path}" : ""
    return "#{lines.first(max_lines).join}\n... [truncated at #{max_lines} lines#{tail_note}]"
  end

  elided = lines.size - head_count - tail_count
  head   = lines.first(head_count).join
  tail   = lines.last(tail_count).join
  "#{head}... [#{elided} lines elided#{recover} · use grep/head to narrow] ...\n#{tail}"
end
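
The line-count split works the same way, with one line reserved for the marker. `line_budget` below is a hypothetical helper that reproduces only the arithmetic.

```ruby
def line_budget(max_lines)
  head_count = [max_lines / 10, 5].max     # 10% of the cap, but at least 5
  tail_count = max_lines - head_count - 1  # one line is reserved for the marker
  [head_count, tail_count, tail_count <= 0]
end

[100, 20, 6].each do |cap|
  head, tail, head_only = line_budget(cap)
  puts "cap=#{cap} head=#{head} tail=#{tail} head_only=#{head_only}"
end
# cap=100 head=10 tail=89 head_only=false
# cap=20 head=5 tail=14 head_only=false
# cap=6 head=5 tail=0 head_only=true
```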

.tail_lines(str, keep) ⇒ Object

Last keep chomp’d lines of str, found by scanning backward from the end rather than splitting the whole buffer (#373). Slices a bounded tail of the string by locating the keep-th-from-last newline.



# File 'lib/rubino/util/output.rb', line 218

def self.tail_lines(str, keep)
  return [] if keep <= 0

  idx = str.length
  keep.times do
    nl = str.rindex("\n", idx - 1)
    break if nl.nil?

    idx = nl
  end
  # idx now sits ON the newline before the kept tail (or 0 if we ran out).
  slice = str[idx, str.length - idx]
  slice = slice[1..] if slice.start_with?("\n")
  slice.to_s.lines.map(&:chomp)
end

.truncate(text, max_bytes:, max_lines:, spill: nil) ⇒ Object

Truncates long tool output to stay within byte/line limits, with tail-bias because the part the agent (and a human reading the log) actually need is at the end: exit-code suffix, error message, backtrace, “X failures” line. Head-only truncation drops exactly the bytes that matter when something blows up at byte 49,999.

Shape: keep ~10% head + bulk of the budget in the tail + a marker in the middle saying how many bytes/lines were elided. Mirrors the pattern #preview already uses for the scrollback body.

When spill is supplied it is called with the full pre-truncation text and must return a path (or nil); the marker then points the model at it, so the elided middle isn’t lost: the model can `read` the file with offset/limit to recover any part. (Claude-Code-style spill.) Pure aside from that injected callback.



# File 'lib/rubino/util/output.rb', line 279

def self.truncate(text, max_bytes:, max_lines:, spill: nil)
  text = text.to_s
  # Bound PEAK cost BEFORE any whole-buffer work (#373). A 128MB tool
  # output used to be scrubbed in full (a 128MB copy), then walked twice
  # by `text.lines` (each a multi-million-element array) just to decide it
  # was over-cap. Decide over/under with allocation-free passes —
  # `bytesize` and `count("\n")` — and only ever scrub/slice a BOUNDED
  # head+tail, never the full buffer. The model-facing cap + spill below
  # are unchanged; this only stops the materialization blow-up.
  over_bytes = text.bytesize > max_bytes
  over_lines = line_count(text) > max_lines

  # Under both caps: scrub the (already small) buffer and return. A stray
  # non-UTF-8 byte (printf '\xe9') OR a NUL (random binary) in SUB-cap
  # output must still be cleaned, or it crashes JSON.generate / the SQLite
  # driver and the tool row never persists (lost on --resume).
  return scrub_utf8(text) unless over_bytes || over_lines

  # Over cap: spill the FULL (raw) output first so nothing is lost, then
  # shape from bounded head/tail slices. Each slice path scrubs only the
  # bytes it keeps, so the 128MB buffer is never scrubbed whole.
  spill_path = spill&.call(text)
  text = tail_bias_bytes(text, max_bytes, spill_path) if over_bytes
  # Re-derive the line check on whatever survived the byte pass (the byte
  # pass already cut to ~max_bytes, so this is now a bounded count).
  text = scrub_utf8(text) unless over_bytes
  text = tail_bias_lines(text, max_lines, spill_path) if line_count(text) > max_lines
  text
end
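
A spill callback only has to accept the full text and return a path (or nil). The sketch below is one possible shape, not the project's own implementation: the lambda, the temp-file naming, and the sample buffer are all assumptions for illustration.

```ruby
require "tempfile"

# Hypothetical spill callback: persists the full text and returns its path.
spill = lambda do |full_text|
  file = Tempfile.create(["tool-output", ".log"])
  file.write(full_text)
  file.close
  file.path
end

big  = "x" * 10_000
path = spill&.call(big)   # same safe-navigation call shape as in truncate

puts File.size(path)      # 10000
puts File.read(path) == big # true

no_spill = nil
puts no_spill&.call(big).inspect # nil: no path, so the marker has nothing to point at

File.delete(path) # Tempfile.create without a block leaves cleanup to the caller
```

Such a callable would be passed as `spill:` to `truncate`; because the full raw text is written before any slicing, nothing is lost even when the returned string is heavily elided.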