Class: Rubino::Compression::JsonCompressor

Inherits:
Object
Defined in:
lib/rubino/compression/json_compressor.rb

Overview

Deterministic, ML-free compression of a whole-document JSON tool output (a `curl | jq`, `kubectl get -o json`, `gh api`, `docker inspect`, or `aws --output json` dump, or an MCP/custom-tool JSON result). Modelled on headroom’s “SmartCrusher”: a LOSSLESS schema-fold first, a LOSSY row selection only as a fallback, and a sentinel marking what was dropped.

The high-ROI case (headroom’s ~90%) is an ARRAY OF UNIFORM OBJECTS: the same keys repeated on every element. JSON spends most of its bytes on those repeated key names; we emit the keys ONCE as a header line and one compact `val | val | …` row per item. That fold is LOSSLESS per item.
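As an illustration of the fold described above, here is a minimal sketch. The helper name `schema_fold` is hypothetical, and encoding each cell with `to_json` is an assumption made here so that values stay unambiguous; the library's actual row format may differ.

```ruby
require "json"

# Hypothetical helper (not the library's code): fold an array of uniform
# hashes into one header line plus one compact row per item.
# Returns nil when the items do not all share the same keys in the same order.
def schema_fold(items)
  return nil if items.empty? || !items.first.is_a?(Hash)

  keys = items.first.keys
  return nil unless items.all? { |item| item.is_a?(Hash) && item.keys == keys }

  header = keys.join(" | ")
  # Assumption: each cell is JSON-encoded so strings, numbers and nulls
  # remain distinguishable in the folded output.
  rows = items.map { |item| keys.map { |k| item[k].to_json }.join(" | ") }
  ([header] + rows).join("\n")
end

pods = [
  { "name" => "api-1", "status" => "Running", "restarts" => 0 },
  { "name" => "api-2", "status" => "Running", "restarts" => 3 }
]

folded = schema_fold(pods)
puts folded
puts "#{JSON.generate(pods).bytesize} bytes as JSON, #{folded.bytesize} bytes folded"
```

The key names appear once in the header instead of once per element, which is where the saving comes from; it grows with the number of rows.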

Stages, in order (mirrors crusher.rs):

1. parse — JSON.parse the (stripped) text. Not a JSON array/object ⇒
   :not_json signal, the router falls through (this is NOT our content).
2. size gate — below `min_items` array elements / `min_lines` text lines
   there is nothing worth the pointer indirection ⇒ passthrough.
3. ARRAY of mostly-uniform objects:
     a. LOSSLESS schema-fold (header + compact rows). Ship if ≥ min_saving.
     b. LOSSY fallback (only if the fold didn't save enough AND the array
        is large): keep MUST-KEEP rows — error/exception-bearing items
        (fidelity: errors always survive), statistical outliers (a numeric
        field > outlier_sigma σ from the mean), and the first+last item
        (boundary). Dropped rows collapse to one `{"_elided": N}` sentinel.
        (headroom's query-anchors are skipped — there is no query here.)
4. SINGLE large object: keep every key + the whole structure; elide only
   very large STRING values (> max_string_chars) behind a short
   `"<elided N chars>"` placeholder. Never drops keys.
5. saving guard — only apply when the result is ≥ min_saving smaller;
   else byte-identical passthrough. Small JSON the model wants verbatim
   stays untouched.
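
The lossy fallback in stage 3b can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: the names `outlier_indices` and `select_rows` are hypothetical, the outlier test uses the population standard deviation of a single named field, error detection is reduced to a key check, and the sentinel is appended at the end of the kept rows.

```ruby
# Hypothetical helper: indices whose numeric `field` lies more than `sigma`
# population standard deviations from the mean of that field.
def outlier_indices(items, field, sigma)
  values = items.map { |item| item[field] }
  return [] unless values.all? { |v| v.is_a?(Numeric) }

  mean = values.sum.to_f / values.size
  std = Math.sqrt(values.sum { |v| (v - mean)**2 } / values.size)
  return [] if std.zero?

  values.each_index.select { |i| (values[i] - mean).abs > sigma * std }
end

# Hypothetical sketch of must-keep selection: boundary rows, rows carrying an
# error key, and numeric outliers survive; the rest collapse to one sentinel.
def select_rows(items, numeric_field:, outlier_sigma: 2.0,
                error_keys: %w[error errors exception err fault failure failures])
  return items if items.size <= 2

  keep = [0, items.size - 1] # first and last item (boundary)
  items.each_with_index do |item, i|
    keep << i if (item.keys & error_keys).any?
  end
  keep.concat(outlier_indices(items, numeric_field, outlier_sigma))
  keep = keep.uniq.sort

  kept = keep.map { |i| items[i] }
  dropped = items.size - kept.size
  # Assumption: the sentinel goes last; the real placement is not documented here.
  dropped.zero? ? kept : kept + [{ "_elided" => dropped }]
end

items = (0...10).map { |i| { "id" => i, "latency_ms" => 10 } }
items[4]["latency_ms"] = 900    # statistical outlier
items[6]["error"] = "timeout"   # error-bearing row

selected = select_rows(items, numeric_field: "latency_ms")
selected.each { |row| puts row.inspect }
```

In this example rows 0 and 9 survive as boundaries, row 4 as an outlier and row 6 as an error, and the remaining six rows are represented by the sentinel.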

Any parse/strategy error ⇒ noop. Compression must never break a tool call.
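
Stages 4 and 5 can be illustrated with a short sketch. The helper names `elide_long_strings` and `guard` are hypothetical, and treating `min_saving` as a fraction of the original byte size is an assumption; the library may define it differently.

```ruby
require "json"

# Hypothetical sketch of stage 4: walk a parsed JSON value, keep every key
# and the whole structure, and replace only oversized strings.
def elide_long_strings(value, max_string_chars)
  case value
  when Hash   then value.transform_values { |v| elide_long_strings(v, max_string_chars) }
  when Array  then value.map { |v| elide_long_strings(v, max_string_chars) }
  when String then value.length > max_string_chars ? "<elided #{value.length} chars>" : value
  else value
  end
end

# Hypothetical sketch of stage 5: use the compressed text only when it is at
# least `min_saving` smaller (assumed here to be a fraction of the original
# byte size); otherwise return the original unchanged.
def guard(original, compressed, min_saving)
  saved = 1.0 - compressed.bytesize.to_f / original.bytesize
  saved >= min_saving ? compressed : original
end

doc = { "id" => "abc", "log" => "x" * 5000, "meta" => { "note" => "short" } }
elided = elide_long_strings(doc, 200)

original = JSON.generate(doc)
compressed = JSON.generate(elided)
puts guard(original, compressed, 0.2)
```

Because the guard returns the original string object itself when the saving is too small, the passthrough is byte-identical by construction.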

Defined Under Namespace

Classes: Config

Constant Summary

ERROR_KEYS =

Keys whose presence (or whose value, when stringy) marks an item as error-bearing — such items always survive the lossy fallback.

%w[error errors exception err fault failure failures].freeze
ERROR_MARKERS =

Whole-word patterns (case-insensitive, anchored on word boundaries) that also mark an item as error-bearing, scanned across the item’s stringified scalar fields.

/\b(?:error|exception|fail(?:ed|ure)?|fatal|panic|traceback)\b/i
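
A sketch of how the two constants might combine into a single predicate. The method name `error_bearing?` is hypothetical, and two details are assumptions: an error key only counts when its value is non-empty, and only top-level scalar values are scanned.

```ruby
ERROR_KEYS = %w[error errors exception err fault failure failures].freeze
ERROR_MARKERS = /\b(?:error|exception|fail(?:ed|ure)?|fatal|panic|traceback)\b/i

# Hypothetical predicate (not the library's code).
def error_bearing?(item)
  # Assumption: a nil, false or empty value under an error key does not count,
  # since uniform rows often carry "error": null on every element.
  keyed = item.any? do |key, value|
    empty = value.nil? || value == false || (value.respond_to?(:empty?) && value.empty?)
    ERROR_KEYS.include?(key.to_s.downcase) && !empty
  end
  return true if keyed

  # Assumption: only top-level scalar values are stringified and scanned.
  item.values.any? do |value|
    scalar = value.is_a?(String) || value.is_a?(Numeric) || value == true || value == false
    scalar && ERROR_MARKERS.match?(value.to_s)
  end
end

rows = [
  { "id" => 1, "error" => "boom" },
  { "id" => 2, "error" => nil },
  { "id" => 3, "msg" => "Build FAILED at step 2" },
  { "id" => 4, "msg" => "all good" }
]
rows.each { |row| puts "#{row['id']}: #{error_bearing?(row)}" }
```

Because of the `\b` anchors, the pattern matches whole words such as `failed` or `failure`, but not longer words that merely contain them, such as `failures`.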

Instance Method Summary

Constructor Details

#initialize(config) ⇒ JsonCompressor

Returns a new instance of JsonCompressor.



# File 'lib/rubino/compression/json_compressor.rb', line 59

def initialize(config)
  @cfg = config.is_a?(Config) ? config : Config.from(config)
end

Instance Method Details

#compress(text) ⇒ Object

Returns a CompressionResult. applied? == false ⇒ “send the original”. On non-JSON content the strategy is :not_json so the router can tell a “this isn’t mine, fall through” from a “JSON but not worth it” noop.



# File 'lib/rubino/compression/json_compressor.rb', line 66

def compress(text)
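  original_bytes = text.bytesize
  data = parse(text)
  # Not a JSON array/object: signal the router that this content is not ours.
  return CompressionResult.noop(strategy: :not_json) if data == :not_json

  out =
    case data
    when Array then compress_array(text, data)
    when Hash  then compress_object(text, data)
    end
  # A nil result from the strategy is reported as :too_small.
  return CompressionResult.noop(strategy: :too_small) if out.nil?

  build_result(out, original_bytes)
rescue StandardError
  # Compression must never break a tool call: any failure becomes a noop.
  CompressionResult.noop(strategy: :parse_error)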
end
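
A caller might distinguish the noop flavours along these lines. This is a self-contained sketch with stand-ins: `Result` is a hypothetical substitute for `CompressionResult` (only `applied?` and the strategy are documented above; the `text` field is assumed), and `route` is a hypothetical router, not the library's.

```ruby
# Hypothetical stand-in for CompressionResult; `text` is an assumed field.
Result = Struct.new(:applied, :strategy, :text) do
  def applied?
    applied
  end
end

# Hypothetical router: fall through on :not_json, otherwise use the
# compressed text only when compression was applied.
def route(text, json_compressor, fallback)
  result = json_compressor.call(text)
  return fallback.call(text) if result.strategy == :not_json

  result.applied? ? result.text : text
end

# Stub compressors for the demonstration only.
json_stub = lambda do |text|
  if text.start_with?("[", "{")
    text.bytesize > 20 ? Result.new(true, :schema_fold, "<folded>") : Result.new(false, :too_small, nil)
  else
    Result.new(false, :not_json, nil)
  end
end
fallback_stub = ->(text) { "fallback:#{text}" }

puts route('[{"a":1},{"a":2},{"a":3}]', json_stub, fallback_stub)
puts route("[1]", json_stub, fallback_stub)
puts route("plain log line", json_stub, fallback_stub)
```

The strategy names other than `:not_json` and `:too_small` in the stub are placeholders chosen for the example.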