Module: Mcpeye::Redaction
Defined in: lib/mcpeye/redaction.rb
Overview
v1 redaction: regex-based secret/PII scrubbing applied client-side in the SDK BEFORE anything is sent to the ingest API, and again defensively in the worker before any LLM call. Self-hosting is the real privacy mitigation; this reduces the blast radius of obvious secrets/PII in free-text arguments and intent.
Deliberately conservative: it would rather over-redact than leak. Smarter, structure-aware redaction is a documented future improvement.
Ported from @mcpeye/core (redaction.ts) — keep the patterns, replacements, the depth cap, and the cycle guard in sync across SDKs (TS, Python, Ruby).
Constant Summary
- PATTERNS =
Each entry is a [regex, replacement] pair. The order matches the TS port so that, where it matters, the more specific patterns (such as sk-ant-) run before the generic ones.
[
  # email
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/, "[REDACTED_EMAIL]"],
  # Anthropic key (more specific than the generic sk- key, so it runs first)
  [/\bsk-ant-[A-Za-z0-9_-]{16,}\b/, "[REDACTED_KEY]"],
  # OpenAI-style key
  [/\bsk-[A-Za-z0-9_-]{16,}\b/, "[REDACTED_KEY]"],
  # GitHub tokens (ghp_, gho_, ghu_, ghs_, ghr_)
  [/\bgh[pousr]_[A-Za-z0-9]{20,}\b/, "[REDACTED_KEY]"],
  # AWS access key id
  [/\bAKIA[0-9A-Z]{16}\b/, "[REDACTED_KEY]"],
  # Bearer tokens
  [/\bBearer\s+[A-Za-z0-9._-]{12,}\b/i, "Bearer [REDACTED_KEY]"],
  # JWT
  [/\beyJ[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}\b/, "[REDACTED_JWT]"],
  # Credit-card-ish (13-16 digits, optional spaces/dashes between them)
  [/\b(?:\d[ -]?){13,16}\b/, "[REDACTED_CARD]"],
  # Phone numbers (loose international)
  [/\b\+?\d{1,3}[\s.-]?\(?\d{2,4}\)?[\s.-]?\d{3,4}[\s.-]?\d{3,4}\b/, "[REDACTED_PHONE]"]
].freeze
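Order matters most where two patterns can claim overlapping text. The sketch below copies the card and phone regexes verbatim from PATTERNS and applies them in both orders to a space-separated card number. The `scrub` helper is hypothetical; it only mirrors the sequential `gsub` loop that `redact_string` runs.

```ruby
# Verbatim copies of two PATTERNS entries.
CARD  = [/\b(?:\d[ -]?){13,16}\b/, "[REDACTED_CARD]"].freeze
PHONE = [/\b\+?\d{1,3}[\s.-]?\(?\d{2,4}\)?[\s.-]?\d{3,4}[\s.-]?\d{3,4}\b/, "[REDACTED_PHONE]"].freeze

# Hypothetical helper: apply each [regex, replacement] pair in order,
# feeding the output of one gsub into the next.
def scrub(input, patterns)
  patterns.reduce(input) { |out, (re, replacement)| out.gsub(re, replacement) }
end

sample = "4111 1111 1111 1111"
card_first  = scrub(sample, [CARD, PHONE])
phone_first = scrub(sample, [PHONE, CARD])

puts card_first   # [REDACTED_CARD]
puts phone_first  # [REDACTED_PHONE] 1111
```

With the card pattern first, the whole number is replaced. With the phone pattern first, it consumes the first twelve digits, the remaining four are too short for the card pattern, and they survive in the output. This is one reason the pattern order has to stay identical across the TS, Python, and Ruby ports.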
- DEFAULT_DENYLIST =
Exact field names whose values are always dropped, regardless of content.
%w[ password passwd secret token apiKey api_key authorization ].freeze
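Matching is on the whole field name, lowercased on both sides (as `redact_value` does when it builds its lookup set), so a key such as `user_password` is not dropped by name. A minimal sketch of the lookup, where `denylisted?` is a hypothetical helper and `:ssn` stands in for a caller-supplied extra field:

```ruby
require "set"

DEFAULT_DENYLIST = %w[password passwd secret token apiKey api_key authorization].freeze

# Build the lowercased lookup set the same way redact_value does,
# merging caller-supplied extras into the defaults.
def build_denylist(extra = [])
  (DEFAULT_DENYLIST + extra).map { |f| f.to_s.downcase }.to_set
end

# Hypothetical helper: exact, case-insensitive field-name check.
def denylisted?(denylist, key)
  denylist.include?(key.to_s.downcase)
end

denylist = build_denylist([:ssn])

denylisted?(denylist, "ApiKey")         # => true  ("apikey" is in the set)
denylisted?(denylist, "AUTHORIZATION")  # => true
denylisted?(denylist, "SSN")            # => true  (extra field)
denylisted?(denylist, "user_password")  # => false (exact names only)
denylisted?(denylist, "api-key")        # => false
```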
- REDACTED_FIELD =
"[REDACTED_FIELD]"
- REDACTED_TOO_DEEP =
"[REDACTED_TOO_DEEP]"
- REDACTED_CYCLE =
"[REDACTED_CYCLE]"
- MAX_REDACT_DEPTH =
Maximum nesting depth that walk descends. Past this depth it substitutes a marker instead of recursing further. Without a cap, a deeply nested value (which the ingest schema does NOT bound) overflows the call stack; on the server, that error aborts the whole ingest transaction, failing the entire batch with a 500 and discarding every other valid event. 64 is far deeper than any real tool payload. Matches @mcpeye/core’s MAX_REDACT_DEPTH.
64
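A depth-capped walk can be sketched as follows. `capped_walk` is hypothetical (the real `walk` also applies the denylist and the cycle guard), and whether the cap triggers at `depth > 64` or `depth >= 64` is an assumption here, not taken from the source. Because recursion stops at the cap, even a very deep input is processed with bounded stack depth.

```ruby
MAX_REDACT_DEPTH  = 64
REDACTED_TOO_DEEP = "[REDACTED_TOO_DEEP]"

# Hypothetical depth-capped walk: past MAX_REDACT_DEPTH the subtree is
# replaced by a marker instead of being recursed into.
def capped_walk(value, depth = 0)
  return REDACTED_TOO_DEEP if depth > MAX_REDACT_DEPTH

  case value
  when Array then value.map { |v| capped_walk(v, depth + 1) }
  when Hash  then value.transform_values { |v| capped_walk(v, depth + 1) }
  else value
  end
end

# Build [[[ ... "leaf" ... ]]] nested 100_000 levels deep, iteratively.
deep = "leaf"
100_000.times { deep = [deep] }

result = capped_walk(deep)

# Count how many array levels survived before the marker.
levels = 0
node = result
while node.is_a?(Array)
  node = node.first
  levels += 1
end
puts levels  # 65 (arrays at depths 0..64)
puts node    # [REDACTED_TOO_DEEP]
```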
Class Method Summary
- .redact_string(input) ⇒ Object
Scrub a single string through every pattern.
- .redact_value(value, opts = {}) ⇒ Object
Recursively redact a JSON-ish value (Hash / Array / String / primitive).
Class Method Details
.redact_string(input) ⇒ Object
Scrub a single string through every pattern.
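# File 'lib/mcpeye/redaction.rb', line 58
def self.redact_string(input)
  # Non-strings (numbers, nil, booleans) pass through untouched.
  return input unless input.is_a?(String)

  out = input.dup
  # Apply every pattern in order; each gsub returns a new string.
  PATTERNS.each { |(re, replacement)| out = out.gsub(re, replacement) }
  out
end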
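A self-contained usage sketch. To keep it short it copies only four PATTERNS entries (email, the two sk- keys, Bearer), verbatim and in the same relative order; the method body is the same as `redact_string` above.

```ruby
# Subset of PATTERNS, copied verbatim and kept in the same relative order.
PATTERNS = [
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/, "[REDACTED_EMAIL]"],
  [/\bsk-ant-[A-Za-z0-9_-]{16,}\b/, "[REDACTED_KEY]"],
  [/\bsk-[A-Za-z0-9_-]{16,}\b/, "[REDACTED_KEY]"],
  [/\bBearer\s+[A-Za-z0-9._-]{12,}\b/i, "Bearer [REDACTED_KEY]"]
].freeze

# Same logic as redact_string above.
def redact_string(input)
  return input unless input.is_a?(String)

  out = input.dup
  PATTERNS.each { |(re, replacement)| out = out.gsub(re, replacement) }
  out
end

msg = "mail alice@example.com, auth: bearer abcdef1234567890, key sk-ant-" + "x" * 20
puts redact_string(msg)
# mail [REDACTED_EMAIL], auth: Bearer [REDACTED_KEY], key [REDACTED_KEY]
p redact_string(42)  # 42 (non-strings pass through)
```

Note that the Bearer pattern is case-insensitive but its replacement is a fixed string, so a lowercase `bearer` prefix comes back as `Bearer`.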
.redact_value(value, opts = {}) ⇒ Object
Recursively redact a JSON-ish value (Hash / Array / String / primitive).
opts[:denylist_fields] (or opts["denylist_fields"]): extra exact field names to drop, matched case-insensitively and merged with DEFAULT_DENYLIST.
# File 'lib/mcpeye/redaction.rb', line 70
def self.redact_value(value, opts = {})
  extra = opts[:denylist_fields] || opts["denylist_fields"] || []
  denylist = (DEFAULT_DENYLIST + extra).map { |f| f.to_s.downcase }.to_set
  # Identity set of containers on the current recursion path, so a
  # self-referential structure is short-circuited instead of looping forever
  # (added on enter, removed on exit, so sibling/diamond references are still
  # walked and only true cycles are cut). Mirrors the TS `onPath` WeakSet / Python id().
  on_path = {}.compare_by_identity
  walk(value, denylist, 0, on_path)
end
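The private `walk` helper is not shown on this page. The sketch below is a hypothetical reconstruction, written only to show how the identity-keyed `on_path` hash separates true cycles from shared (diamond) references. The denylist handling and depth check are assumptions modelled on the constants above, and the sketch leaves strings unredacted, whereas the real helper presumably routes them through `redact_string`.

```ruby
require "set"

REDACTED_FIELD    = "[REDACTED_FIELD]"
REDACTED_TOO_DEEP = "[REDACTED_TOO_DEEP]"
REDACTED_CYCLE    = "[REDACTED_CYCLE]"
MAX_REDACT_DEPTH  = 64

# Hypothetical stand-in for the private walk helper.
def walk(value, denylist, depth, on_path)
  return REDACTED_TOO_DEEP if depth > MAX_REDACT_DEPTH
  # Strings and other scalars are returned as-is in this sketch.
  return value unless value.is_a?(Hash) || value.is_a?(Array)
  # Already on the current path: a true cycle, so cut it.
  return REDACTED_CYCLE if on_path.key?(value)

  on_path[value] = true # enter
  begin
    if value.is_a?(Hash)
      value.each_with_object({}) do |(key, child), out|
        out[key] =
          if denylist.include?(key.to_s.downcase)
            REDACTED_FIELD
          else
            walk(child, denylist, depth + 1, on_path)
          end
      end
    else
      value.map { |child| walk(child, denylist, depth + 1, on_path) }
    end
  ensure
    on_path.delete(value) # exit, so siblings may revisit the same object
  end
end

denylist = %w[password].to_set

# True cycle: an array that contains itself.
cyclic = []
cyclic << cyclic
p walk(cyclic, denylist, 0, {}.compare_by_identity)  # ["[REDACTED_CYCLE]"]

# Diamond: the same hash referenced twice, but never inside itself.
shared = { "x" => 1 }
p walk([shared, shared], denylist, 0, {}.compare_by_identity)

# Denylisted key is matched case-insensitively.
p walk({ "Password" => "hunter2", "user" => "bob" }, denylist, 0, {}.compare_by_identity)
```

Because entries are removed on exit, `shared` is walked and copied twice; only `cyclic`, which is still on the path when it is reached again, gets the cycle marker. Using `compare_by_identity` also means lookups never call `#hash` or `#eql?` on the (possibly recursive) containers.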