Module: Mcpeye::Redaction

Defined in:
lib/mcpeye/redaction.rb

Overview

v1 redaction: regex-based secret/PII scrubbing applied client-side in the SDK BEFORE anything is sent to the ingest API, and again defensively in the worker before any LLM call. Self-hosting is the real privacy mitigation; this reduces the blast radius of obvious secrets/PII in free-text arguments and intent.

Deliberately conservative: it prefers to over-redact rather than leak. Smarter, structure-aware redaction is a documented future improvement.

Ported from @mcpeye/core (redaction.ts) — keep the patterns, replacements, the depth cap, and the cycle guard in sync across SDKs (TS, Python, Ruby).

Constant Summary

PATTERNS =

Each entry: [regex, replacement]. Order matches the TS source (redaction.ts) so that the most-specific key patterns (sk-ant-) run before the generic ones where relevant.

[
  # email
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/, "[REDACTED_EMAIL]"],
  # Anthropic key (more specific than the generic sk- key — run first)
  [/\bsk-ant-[A-Za-z0-9_-]{16,}\b/, "[REDACTED_KEY]"],
  # OpenAI-style key
  [/\bsk-[A-Za-z0-9_-]{16,}\b/, "[REDACTED_KEY]"],
  # GitHub tokens (ghp_, gho_, ghu_, ghs_, ghr_)
  [/\bgh[pousr]_[A-Za-z0-9]{20,}\b/, "[REDACTED_KEY]"],
  # AWS access key id
  [/\bAKIA[0-9A-Z]{16}\b/, "[REDACTED_KEY]"],
  # Bearer tokens
  [/\bBearer\s+[A-Za-z0-9._-]{12,}\b/i, "Bearer [REDACTED_KEY]"],
  # JWT
  [/\beyJ[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}\b/, "[REDACTED_JWT]"],
  # Credit-card-ish (13-16 digit groups, optional spaces/dashes)
  [/\b(?:\d[ -]?){13,16}\b/, "[REDACTED_CARD]"],
  # Phone numbers (loose international)
  [/\b\+?\d{1,3}[\s.-]?\(?\d{2,4}\)?[\s.-]?\d{3,4}[\s.-]?\d{3,4}\b/, "[REDACTED_PHONE]"]
].freeze
DEFAULT_DENYLIST =

Exact field names whose values are always dropped, regardless of content.

%w[
  password passwd secret token apiKey api_key authorization
].freeze
REDACTED_FIELD =
"[REDACTED_FIELD]"
REDACTED_TOO_DEEP =
"[REDACTED_TOO_DEEP]"
REDACTED_CYCLE =
"[REDACTED_CYCLE]"
MAX_REDACT_DEPTH =

Maximum nesting depth that walk descends to. Past this depth, a marker (REDACTED_TOO_DEEP) is substituted instead of recursing further. Without a cap, a deeply nested value (which the ingest schema does NOT bound) overflows the call stack, and on the server that exception aborts the whole ingest transaction, returning a 500 for the entire batch and discarding every other valid event. 64 is far deeper than any real tool payload. Matches @mcpeye/core's MAX_REDACT_DEPTH.

64
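The depth cap described above can be illustrated with a minimal sketch. This is not the module's actual walk method, which is not shown on this page; the names sketch_cap_depth, SKETCH_MAX_DEPTH, and SKETCH_TOO_DEEP are hypothetical, and the exact boundary (whether a container at depth 64 is walked or replaced) is an assumption.

```ruby
# Hypothetical stand-ins for MAX_REDACT_DEPTH and REDACTED_TOO_DEEP.
SKETCH_MAX_DEPTH = 64
SKETCH_TOO_DEEP = "[REDACTED_TOO_DEEP]"

# Returns a copy of value in which any container nested at or past the cap
# is replaced by a marker string, so recursion depth stays bounded.
# Assumption: only Hash and Array count as containers.
def sketch_cap_depth(value, depth = 0)
  case value
  when Array
    return SKETCH_TOO_DEEP if depth >= SKETCH_MAX_DEPTH
    value.map { |v| sketch_cap_depth(v, depth + 1) }
  when Hash
    return SKETCH_TOO_DEEP if depth >= SKETCH_MAX_DEPTH
    value.transform_values { |v| sketch_cap_depth(v, depth + 1) }
  else
    value # strings and primitives pass through unchanged in this sketch
  end
end
```

The point of the design is that a hostile or accidental deeply nested payload yields a marker in one field rather than an exception that takes the whole batch down.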

Class Method Summary

Class Method Details

.redact_string(input) ⇒ Object

Scrub a single string through every pattern.



# File 'lib/mcpeye/redaction.rb', line 58

def self.redact_string(input)
  return input unless input.is_a?(String)

  out = input.dup
  PATTERNS.each { |(re, replacement)| out = out.gsub(re, replacement) }
  out
end
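As a rough illustration of how the pattern chain behaves, the sketch below applies three of the patterns listed in PATTERNS to sample strings. It is self-contained so it runs without the gem; SAMPLE_PATTERNS and sample_redact_string are hypothetical names, and the real method applies the full PATTERNS list.

```ruby
# Three entries copied from PATTERNS above, in the same relative order.
SAMPLE_PATTERNS = [
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/, "[REDACTED_EMAIL]"],
  [/\bsk-ant-[A-Za-z0-9_-]{16,}\b/, "[REDACTED_KEY]"],
  [/\bsk-[A-Za-z0-9_-]{16,}\b/, "[REDACTED_KEY]"]
].freeze

# Hypothetical stand-in for redact_string: non-strings are returned untouched,
# strings are passed through each pattern in turn.
def sample_redact_string(input)
  return input unless input.is_a?(String)

  SAMPLE_PATTERNS.reduce(input) { |out, (re, replacement)| out.gsub(re, replacement) }
end

puts sample_redact_string("mail bob@example.com")
puts sample_redact_string("key sk-ant-abcdefghijklmnop1234")
```

Because gsub returns a new string, the caller's input is never mutated.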

.redact_value(value, opts = {}) ⇒ Object

Recursively redact a JSON-ish value (Hash / Array / String / primitive).

opts[:denylist_fields] (or the string key "denylist_fields"): extra exact field names to drop (case-insensitive), merged with DEFAULT_DENYLIST.



# File 'lib/mcpeye/redaction.rb', line 70

def self.redact_value(value, opts = {})
  extra = opts[:denylist_fields] || opts["denylist_fields"] || []
  denylist = (DEFAULT_DENYLIST + extra).map { |f| f.to_s.downcase }.to_set

  # Identity set of containers on the current recursion path, so a
  # self-referential structure is short-circuited instead of looping forever
  # (added on enter, removed on exit — so sibling/diamond references are still
  # walked, only true cycles cut). Mirrors the TS `onPath` WeakSet / Python id().
  on_path = {}.compare_by_identity
  walk(value, denylist, 0, on_path)
end
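The walk helper called above is not shown on this page. The following is a minimal sketch of how an identity-based cycle guard and a denylist check could fit together, under stated assumptions: sketch_walk and the SKETCH_ constants are hypothetical names, denylisted values are assumed to be replaced with a field marker, and the depth cap and string pattern scrubbing are omitted for brevity.

```ruby
require "set"

# Hypothetical stand-ins for REDACTED_FIELD and REDACTED_CYCLE.
SKETCH_FIELD = "[REDACTED_FIELD]"
SKETCH_CYCLE = "[REDACTED_CYCLE]"

# on_path is a Hash with compare_by_identity, so containers are tracked by
# object identity rather than by content.
def sketch_walk(value, denylist, on_path)
  return value unless value.is_a?(Hash) || value.is_a?(Array)
  # A container already on the current path means a true cycle.
  return SKETCH_CYCLE if on_path.key?(value)

  on_path[value] = true
  begin
    if value.is_a?(Hash)
      value.each_with_object({}) do |(k, v), out|
        denied = denylist.include?(k.to_s.downcase)
        out[k] = denied ? SKETCH_FIELD : sketch_walk(v, denylist, on_path)
      end
    else
      value.map { |v| sketch_walk(v, denylist, on_path) }
    end
  ensure
    # Removed on exit, so a container referenced twice (diamond) is walked twice.
    on_path.delete(value)
  end
end
```

Tracking only the current path, rather than every container ever visited, is what lets shared but non-cyclic references be redacted normally while a self-referential structure is still cut off.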