Module: Rubino::Security::SecretDetector
- Defined in:
- lib/rubino/security/secret_detector.rb
Overview
Shared secret/credential detection used by two seams:
1. Output redaction (Redactor) — the PREFIXLESS_PATTERNS below are
folded into `redact_sensitive_text` so prefix-less credential SHAPES
(an AWS secret-access-key near `aws_secret`, etc.) get masked in tool
output. PRECISE patterns only — NO entropy sweep on tool output, which
would over-redact hashes / UUIDs / base64 blobs in normal output
(the #67 over-redaction class).
2. The memory WRITE path (ThreatScanner) — `present?(content)` is the
gate. A memory save is long-lived (spliced into every future system
prompt) and a false positive is cheap (a fact just isn't saved), so
here we ALSO run a conservative high-entropy heuristic on top of the
known shapes. A secret-bearing write is refused.
The known-prefix shapes (sk-, ghp_, AKIA, AIza, xox*, JWT, PEM, …) are reused from Redactor so there is a SINGLE source of truth for them.
Constant Summary
- AWS_SECRET_KEY_RE =
Prefix-less credential SHAPES that the prefixed PREFIX_RE misses. These are precise (anchored / context-gated) so they are safe to run on tool OUTPUT as well as on the memory-write path.
AWS secret access key: a 40-char base64 token has no prefix of its own, so we only treat it as a secret when it appears NEAR an `aws_secret_access_key` / `aws_secret` cue (assignment, JSON field, CLI flag). Anchored to a non-token boundary so a longer blob can't lend a 40-char window.
%r{
  (?:aws.{0,4}secret.{0,4}(?:access.{0,4})?key|secret.{0,4}access.{0,4}key)
  ['"]?\s*[=:]\s*   # optional closing quote of the key, then = or :
  (['"]?)
  ([A-Za-z0-9/+]{40})
  \1
}xi
- PREFIXLESS_PATTERNS =
Standalone shapes that are specific enough to flag without a context cue.
[ AWS_SECRET_KEY_RE ].freeze
- MIN_ENTROPY_LEN =
High-entropy heuristic (memory-write path ONLY)
A conservative generic-secret detector for the write path, where a false positive only costs an un-saved fact. We require ALL of:
* a long contiguous token (>= MIN_ENTROPY_LEN chars),
* a "rich" charset: BOTH mixed-case letters AND digits (this alone excludes hex git SHAs, lowercase hex, and UUIDs, which are hex+dashes only), and
* Shannon entropy >= MIN_ENTROPY_BITS bits/char.
The combination keeps git SHAs (40 hex, ~4.0 bits/char but no mixed case), UUIDs (dashed hex), and ordinary words below the bar, while real 40-char API secrets (mixed case + digits, ~5.2 bits/char) trip it.
25
- MIN_ENTROPY_BITS =
4.0
- TOKEN_RE =
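Token = a contiguous run of base64 / base64-url chars. Dotted identifiers are split at the dots into short pieces and never reach the length bar as a single token. Note that the character class as shown includes `-` and `_`, so a dashed UUID still matches as one 36-char token; a single-case UUID is excluded by the rich-charset check rather than by the length bar.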
%r{[A-Za-z0-9+/_=-]{#{MIN_ENTROPY_LEN},}}
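The context gating of AWS_SECRET_KEY_RE can be tried in isolation. The sketch below copies the pattern from the listing above into a standalone script (the line breaks inside the literal are a reconstruction) and runs it against three inputs. The sample value is the placeholder secret key from AWS's public documentation, and the variable names are illustrative, not part of the module.

```ruby
# Pattern copied from the AWS_SECRET_KEY_RE listing. The line breaks matter:
# under /x a "#" comment runs to the end of its line.
AWS_SECRET_KEY_RE = %r{
  (?:aws.{0,4}secret.{0,4}(?:access.{0,4})?key|secret.{0,4}access.{0,4}key)
  ['"]?\s*[=:]\s*   # optional closing quote of the key, then = or :
  (['"]?)
  ([A-Za-z0-9/+]{40})
  \1
}xi

# 40-char placeholder key from AWS documentation (not a real credential).
SAMPLE = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

env_line  = "aws_secret_access_key = #{SAMPLE}"          # assignment form
json_line = %({"aws_secret_access_key": "#{SAMPLE}"})    # JSON field form
bare_line = "checksum #{SAMPLE}"                         # same token, no cue

[env_line, json_line, bare_line].each do |line|
  puts "#{AWS_SECRET_KEY_RE.match?(line)}  #{line}"
end

# Capture group 2 holds the 40-char token itself.
captured = env_line[AWS_SECRET_KEY_RE, 2]
```

The same 40-char token is flagged only when the key-name cue precedes it, which is why this pattern is considered safe to run on tool output.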
Class Method Summary
-
.high_entropy_secret?(text) ⇒ Boolean
Scan each contiguous long token; flag if any clears both the charset and the Shannon-entropy bar.
-
.present?(text, entropy: false) ⇒ Boolean
True when text carries a credential.
-
.rich_charset?(tok) ⇒ Boolean
Rich charset = has lowercase AND uppercase letters AND a digit.
-
.shannon_entropy(str) ⇒ Object
Shannon entropy in bits per character.
Class Method Details
.high_entropy_secret?(text) ⇒ Boolean
Scan each contiguous long token; flag if any clears both the charset and the Shannon-entropy bar.
# File 'lib/rubino/security/secret_detector.rb', line 86
def high_entropy_secret?(text)
  text.scan(TOKEN_RE).any? do |tok|
    rich_charset?(tok) && shannon_entropy(tok) >= MIN_ENTROPY_BITS
  end
end
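As an illustration of how the three conditions interact, the following standalone sketch re-declares the constants and helpers from this page at top level and runs the heuristic on a git SHA, a UUID, and a mixed-case token. The sample strings are made up for the demo (the token is not a real credential), and the top-level layout is an assumption; in the module these are class methods.

```ruby
# Constants and helpers re-declared from this page so the script runs alone.
MIN_ENTROPY_LEN  = 25
MIN_ENTROPY_BITS = 4.0
TOKEN_RE = %r{[A-Za-z0-9+/_=-]{#{MIN_ENTROPY_LEN},}}

def shannon_entropy(str)
  len = str.length.to_f
  return 0.0 if len.zero?

  str.each_char.tally.values.sum(0.0) do |count|
    p = count / len
    -p * Math.log2(p)
  end
end

def rich_charset?(tok)
  tok.match?(/[a-z]/) && tok.match?(/[A-Z]/) && tok.match?(/[0-9]/)
end

def high_entropy_secret?(text)
  text.scan(TOKEN_RE).any? do |tok|
    rich_charset?(tok) && shannon_entropy(tok) >= MIN_ENTROPY_BITS
  end
end

# Illustrative inputs (all hypothetical).
GIT_SHA       = "da39a3ee5e6b4b0d3255bfef95601890afd80709" # 40 lowercase hex
UUID          = "123e4567-e89b-12d3-a456-426614174000"     # dashed hex
MADE_UP_TOKEN = "Zx9Qm2Lk8Vb4Np7Rt1Hs6Jd3Gf5Wc0Ya"         # 32 distinct chars
SHORT_MIXED   = "Abc123"                                   # rich but too short

sha_hit   = high_entropy_secret?("commit #{GIT_SHA}")    # fails charset check
uuid_hit  = high_entropy_secret?("request id #{UUID}")   # fails charset check
token_hit = high_entropy_secret?("key is #{MADE_UP_TOKEN}")
short_hit = high_entropy_secret?(SHORT_MIXED)            # never becomes a token

puts [sha_hit, uuid_hit, token_hit, short_hit].inspect
```

Only the long mixed-case token clears all three bars; the SHA and UUID are rejected by the charset check and the short string never reaches the length bar.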
.present?(text, entropy: false) ⇒ Boolean
True when text carries a credential. entropy: enables the generic high-entropy heuristic (memory-write path); leave it false for tool output (precise shapes only).
# File 'lib/rubino/security/secret_detector.rb', line 69
def present?(text, entropy: false)
  return false if text.nil?

  s = text.to_s
  return false if s.empty?

  return true if Redactor::PREFIX_RE.match?(s)
  return true if s.include?("eyJ") && Redactor::JWT_RE.match?(s)
  return true if s.include?("PRIVATE KEY") && Redactor::PRIVATE_KEY_RE.match?(s)
  return true if PREFIXLESS_PATTERNS.any? { |re| re.match?(s) }
  return true if entropy && high_entropy_secret?(s)

  false
end
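The effect of the entropy: flag on the two call sites can be sketched with a stand-in module. Everything here is hypothetical: DemoDetector is not part of Rubino, and because the Redactor patterns are not shown on this page, KNOWN_SHAPES is a placeholder assumption standing in for them. Only the control flow mirrors the listing above.

```ruby
# Hypothetical stand-in; mirrors the control flow of present? only.
module DemoDetector
  # Placeholder for the Redactor regexes, which this page does not show.
  KNOWN_SHAPES = [/-----BEGIN [A-Z ]*PRIVATE KEY-----/].freeze
  TOKEN_RE = %r{[A-Za-z0-9+/_=-]{25,}}

  module_function

  def present?(text, entropy: false)
    s = text.to_s
    return false if s.empty?
    return true if KNOWN_SHAPES.any? { |re| re.match?(s) } # precise shapes
    return true if entropy && high_entropy_secret?(s)      # write path only

    false
  end

  def high_entropy_secret?(text)
    text.scan(TOKEN_RE).any? do |tok|
      tok.match?(/[a-z]/) && tok.match?(/[A-Z]/) && tok.match?(/[0-9]/) &&
        entropy_bits(tok) >= 4.0
    end
  end

  def entropy_bits(str)
    len = str.length.to_f
    str.each_char.tally.values.sum(0.0) { |c| -(c / len) * Math.log2(c / len) }
  end
end

# Made-up sample text; the token is not a real credential.
text = "uploaded artifact Zx9Qm2Lk8Vb4Np7Rt1Hs6Jd3Gf5Wc0Ya"
pem  = "-----BEGIN RSA PRIVATE KEY-----"

on_tool_output  = DemoDetector.present?(text)                 # precise only
on_memory_write = DemoDetector.present?(text, entropy: true)  # plus heuristic
pem_either_way  = DemoDetector.present?(pem)

puts [on_tool_output, on_memory_write, pem_either_way].inspect
```

The same text passes through on the tool-output path and is refused on the memory-write path, while a known shape is caught regardless of the flag.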
.rich_charset?(tok) ⇒ Boolean
Rich charset = has lowercase AND uppercase letters AND a digit. Hex SHAs / UUIDs (single-case hex) and all-lower / all-upper words fail this.
# File 'lib/rubino/security/secret_detector.rb', line 94
def rich_charset?(tok)
  tok.match?(/[a-z]/) && tok.match?(/[A-Z]/) && tok.match?(/[0-9]/)
end
.shannon_entropy(str) ⇒ Object
Shannon entropy in bits per character.
# File 'lib/rubino/security/secret_detector.rb', line 99
def shannon_entropy(str)
  len = str.length.to_f
  return 0.0 if len.zero?

  str.each_char.tally.values.sum(0.0) do |count|
    p = count / len
    -p * Math.log2(p)
  end
end
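The method computes H = -sum(p_i * log2(p_i)) over the relative frequency p_i of each distinct character. A few hand-checkable inputs help calibrate the 4.0 bits/char threshold; the sketch below re-declares the method at top level (an assumption for standalone use) and evaluates strings whose entropy is easy to work out by hand. Enumerable#tally requires Ruby 2.7 or later.

```ruby
# Same body as the listing above, declared at top level so it runs alone.
def shannon_entropy(str)
  len = str.length.to_f
  return 0.0 if len.zero?

  str.each_char.tally.values.sum(0.0) do |count|
    p = count / len
    -p * Math.log2(p)
  end
end

# One repeated symbol: no uncertainty, so 0 bits/char.
single = shannon_entropy("aaaa")

# Four equally likely symbols: log2(4) = 2 bits/char.
four = shannon_entropy("abcd")

# Sixteen equally likely hex digits: log2(16) = 4 bits/char. This is the
# ceiling for any hex string, which is why the charset check, not the
# entropy bar alone, is what rules out SHAs.
hex = shannon_entropy("0123456789abcdef")

# Thirty-two distinct characters: log2(32) = 5 bits/char.
rich = shannon_entropy("Zx9Qm2Lk8Vb4Np7Rt1Hs6Jd3Gf5Wc0Ya")

puts [single, four, hex, rich].inspect
```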