Class: Rubino::Compression::JsonCompressor
- Inherits: Object
  - Object
    - Rubino::Compression::JsonCompressor
- Defined in: lib/rubino/compression/json_compressor.rb
Overview
Deterministic, ML-free compression of a whole-document JSON tool output (a `curl | jq`, `kubectl get -o json`, `gh api`, `docker inspect`, or `aws --output json` dump, or an MCP/custom-tool JSON result). Modelled on headroom’s “SmartCrusher”: a LOSSLESS schema-fold first, a LOSSY row selection only as a fallback, and a sentinel marking what was dropped.
The high-ROI case (headroom’s ~90%) is an ARRAY OF UNIFORM OBJECTS: the same keys repeated on every element. JSON spends most of its bytes on those repeated key names; we emit the keys ONCE as a header line and one compact `val | val | …` row per item. That fold is LOSSLESS per item.
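The fold described above can be sketched in a few lines. This is an illustration only: `schema_fold` is a hypothetical helper, and the exact header and row format the class emits is not shown on this page. Values are JSON-encoded here so that strings stay quoted and a ` | ` inside a value cannot be mistaken for a column separator.

```ruby
require "json"

# Hypothetical helper: folds an array of uniform hashes into one header line
# plus one "val | val | ..." row per item. Returns nil when the items do not
# all share the same keys in the same order (nothing to fold).
def schema_fold(items)
  return nil if items.empty? || !items.all?(Hash)

  keys = items.first.keys
  return nil unless items.all? { |item| item.keys == keys }

  header = keys.join(" | ")
  rows = items.map do |item|
    keys.map { |key| item[key].to_json }.join(" | ")
  end
  ([header] + rows).join("\n")
end

pods = [
  { "name" => "api-1", "status" => "Running", "restarts" => 0 },
  { "name" => "api-2", "status" => "Running", "restarts" => 3 }
]

folded = schema_fold(pods)
puts folded
puts "#{JSON.generate(pods).bytesize} bytes as JSON, #{folded.bytesize} bytes folded"
```

Because every row carries all of its values in header order, each item can be rebuilt from the header plus its row, which is what makes the fold lossless per item.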
Stages, in order (mirrors crusher.rs):
1. parse: JSON.parse the (stripped) text. Not a JSON array/object ⇒ :not_json signal, the router falls through (this is NOT our content).
2. size gate: below `min_items` array elements / `min_lines` text lines there is nothing worth the pointer indirection ⇒ passthrough.
3. ARRAY of mostly-uniform objects:
   a. LOSSLESS schema-fold (header + compact rows). Ship if ≥ min_saving.
   b. LOSSY fallback (only if the fold didn't save enough AND the array is large): keep MUST-KEEP rows, namely error/exception-bearing items (fidelity: errors always survive), statistical outliers (a numeric field > outlier_sigma σ from the mean), and the first+last item (boundary). Dropped rows collapse to one `{"_elided": N}` sentinel. (headroom's query-anchors are skipped, since there is no query here.)
4. SINGLE large object: keep every key + the whole structure; elide only very large STRING values (> max_string_chars) behind a short `"<elided N chars>"` placeholder. Never drops keys.
5. saving guard: only apply when the result is ≥ min_saving smaller; else byte-identical passthrough. Small JSON the model wants verbatim stays untouched.
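The lossy fallback in stage 3b can be sketched as follows. Everything here is an assumption made for illustration: the helper names (`outlier_indexes`, `select_rows`), the use of a population standard deviation, checking a single named numeric field, detecting error rows by key presence only, and placing the sentinel at the end of the result. The real class may differ on each point.

```ruby
# Illustrative copy of the documented error keys, under a sketch-only name.
SKETCH_ERROR_KEYS = %w[error errors exception err fault failure failures].freeze

# Indexes whose numeric `field` lies more than `sigma` standard deviations
# from the mean. Returns [] when the field is missing or non-numeric.
def outlier_indexes(items, field, sigma)
  values = items.map { |item| item[field] }
  return [] unless values.all?(Numeric)

  mean = values.sum.to_f / values.size
  std = Math.sqrt(values.sum { |v| (v - mean)**2 } / values.size)
  return [] if std.zero?

  values.each_index.select { |i| (values[i] - mean).abs > sigma * std }
end

# Keeps boundary rows, error-bearing rows, and outliers; everything else
# collapses into one {"_elided" => N} sentinel (appended last in this sketch).
def select_rows(items, numeric_field:, outlier_sigma: 3.0)
  return items if items.size <= 2

  keep = [0, items.size - 1]
  items.each_with_index do |item, i|
    keep << i if (item.keys & SKETCH_ERROR_KEYS).any?
  end
  keep.concat(outlier_indexes(items, numeric_field, outlier_sigma))

  kept = keep.uniq.sort.map { |i| items[i] }
  dropped = items.size - kept.size
  dropped.positive? ? kept + [{ "_elided" => dropped }] : kept
end

rows = Array.new(20) { |i| { "id" => i, "latency_ms" => 10 } }
rows[7]["latency_ms"] = 500   # statistical outlier
rows[12]["error"] = "timeout" # error-bearing row

selected = select_rows(rows, numeric_field: "latency_ms")
puts selected.map { |row| row["id"] || row }.inspect
```

In this example the first and last rows, the slow row, and the failing row survive, and the sixteen unremarkable rows are summarised by the sentinel.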
Any parse/strategy error ⇒ noop. Compression must never break a tool call.
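Stage 4, the single-object path, might look like the sketch below. `elide_long_strings` is a hypothetical name; recursing into nested hashes and arrays, and reporting the full original string length as N, are assumptions rather than documented behavior.

```ruby
# Hypothetical helper: returns a copy of `value` with every key preserved and
# only strings longer than `max_string_chars` replaced by a placeholder.
def elide_long_strings(value, max_string_chars)
  case value
  when Hash
    value.transform_values { |v| elide_long_strings(v, max_string_chars) }
  when Array
    value.map { |v| elide_long_strings(v, max_string_chars) }
  when String
    value.length > max_string_chars ? "<elided #{value.length} chars>" : value
  else
    value
  end
end

container = {
  "Id" => "abc123",
  "Logs" => "x" * 5000,
  "Config" => { "Image" => "nginx:latest", "Env" => ["A=1", "y" * 3000] }
}

slim = elide_long_strings(container, 200)
puts slim["Logs"]
puts slim["Config"]["Env"].last
```

The key set and nesting of `slim` match the input exactly; only the two oversized strings change.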
Defined Under Namespace
Classes: Config
Constant Summary
- ERROR_KEYS = %w[error errors exception err fault failure failures].freeze
  Keys whose presence (or whose value, when stringy) marks an item as error-bearing. Such items always survive the lossy fallback.
- ERROR_MARKERS = /\b(?:error|exception|fail(?:ed|ure)?|fatal|panic|traceback)\b/i
  Value substrings (case-insensitive) that also mark an item as error-bearing, scanned across the item’s stringified scalar fields.
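One way the two constants could combine into an "error-bearing" check is sketched below. The `error_bearing?` helper is hypothetical, and treating an error key with a nil, false, or empty value as "not an error" is an assumption of this sketch; the class's actual rule is not shown on this page.

```ruby
ERROR_KEYS = %w[error errors exception err fault failure failures].freeze
ERROR_MARKERS = /\b(?:error|exception|fail(?:ed|ure)?|fatal|panic|traceback)\b/i

# Hypothetical predicate: true when the item has a non-blank error key, or
# when any scalar field, stringified, matches ERROR_MARKERS.
def error_bearing?(item)
  return false unless item.is_a?(Hash)

  keyed = ERROR_KEYS.any? do |key|
    value = item[key]
    !(value.nil? || value == false || (value.respond_to?(:empty?) && value.empty?))
  end
  return true if keyed

  item.each_value.any? do |value|
    scalar = value.is_a?(String) || value.is_a?(Numeric) || value == true || value == false
    scalar && value.to_s.match?(ERROR_MARKERS)
  end
end

puts error_bearing?({ "id" => 1, "error" => "connection refused" })
puts error_bearing?({ "id" => 2, "message" => "Job FAILED after 3 retries" })
puts error_bearing?({ "id" => 3, "status" => "failover complete" })
```

The word boundaries in `ERROR_MARKERS` matter: "FAILED" matches, while "failover" does not, because `fail` is followed by further word characters.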
Instance Method Summary
- #compress(text) ⇒ Object
  Returns a CompressionResult.
- #initialize(config) ⇒ JsonCompressor (constructor)
  A new instance of JsonCompressor.
Constructor Details
#initialize(config) ⇒ JsonCompressor
Returns a new instance of JsonCompressor.
# File 'lib/rubino/compression/json_compressor.rb', line 59
def initialize(config)
  @cfg = config.is_a?(Config) ? config : Config.from(config)
end
Instance Method Details
#compress(text) ⇒ Object
Returns a CompressionResult. When applied? is false, the caller should send the original text. On non-JSON content the strategy is :not_json, so the router can distinguish “this isn’t mine, fall through” from a “JSON but not worth it” noop.
# File 'lib/rubino/compression/json_compressor.rb', line 66
def compress(text)
  original_bytes = text.bytesize
  data = parse(text)
  return CompressionResult.noop(strategy: :not_json) if data == :not_json

  out =
    case data
    when Array then compress_array(text, data)
    when Hash then compress_object(text, data)
    end
  return CompressionResult.noop(strategy: :too_small) if out.nil?

  build_result(out, original_bytes)
rescue StandardError
  CompressionResult.noop(strategy: :parse_error)
end
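How a router might act on these strategies is sketched below. None of this is the library's code: `SketchResult` is a stand-in for the real CompressionResult (whose definition is not shown here), `route` and the callable-compressor interface are hypothetical, and the `:schema_fold` and `:logs` strategy names are invented for the example.

```ruby
# Stand-in for CompressionResult, just enough for the sketch.
SketchResult = Struct.new(:applied, :strategy, :text, keyword_init: true) do
  def applied?
    applied
  end
end

# Tries each compressor in order. A :not_json result means "not my content",
# so the next compressor gets a turn; any other noop means "send the original".
def route(original, compressors)
  compressors.each do |compressor|
    result = compressor.call(original)
    next if result.strategy == :not_json

    return result.applied? ? result.text : original
  end
  original # nobody claimed the content
end

json_like = lambda do |text|
  if !text.lstrip.start_with?("[", "{")
    SketchResult.new(applied: false, strategy: :not_json)
  elsif text.bytesize < 40
    SketchResult.new(applied: false, strategy: :too_small)
  else
    SketchResult.new(applied: true, strategy: :schema_fold, text: "<folded>")
  end
end
upcase_logs = ->(text) { SketchResult.new(applied: true, strategy: :logs, text: text.upcase) }

puts route("plain log line", [json_like, upcase_logs])
puts route('{"a":1}', [json_like, upcase_logs])
```

The first call falls through to the second compressor because the JSON stand-in reports :not_json; the second call returns the input unchanged because the JSON stand-in claims it but declines to compress.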