Class: Rubino::Memory::ThreatScanner
- Inherits:
-
Object
- Object
- Rubino::Memory::ThreatScanner
- Defined in:
- lib/rubino/memory/threat_scanner.rb
Overview
Scans content destined for the memories table for adversarial patterns.
Memory is a long-lived, cross-session channel that gets *spliced into every future system prompt*, so a single tainted write can persistently bias the agent across runs. We inspect every write at the boundary and refuse anything that smells like a known injection / exfiltration vector. We deliberately err on the side of false-positives — the agent can rephrase, but a planted directive in memory has no antidote.
‘.scan(content)` returns nil when safe, otherwise a short string describing the threat (used as both error_code label and audit log payload).
Constant Summary collapse
- PROMPT_INJECTION_PATTERNS =
Prompt-injection markers. These are the cliches that show up in documented jailbreak attempts; any one match is enough to refuse —legitimate user-profile content has no reason to embed them.
[ /ignore (?:all |the )?previous/i, /disregard (?:all |the )?(?:above|previous)/i, /you are now/i, /new instructions:/i, /^\s*system\s*:/i, /^\s*assistant\s*:/i, /<\|im_start\|>/i, /<\|im_end\|>/i, /\[INST\]/i ].freeze
- URL_CREDENTIAL_PATTERN =
Credentials embedded in a URL — classic data-exfil channel (scheme://user:pass@host).
%r{\b[a-z][a-z0-9+\-.]*://[^/\s:@]+:[^/\s@]+@}i- BASE64_BLOB_PATTERN =
Contiguous base64 of 200+ chars. Reasonable prose never has this; encoded payloads (binaries, encrypted blobs) do.
%r{[A-Za-z0-9+/]{200,}={0,2}}- PIPE_TO_SHELL_PATTERN =
curl/wget piped to a shell — remote code execution recipe.
/\b(?:curl|wget)\b[^\n]*\|\s*(?:sudo\s+)?(?:bash|sh|zsh)\b/i- INVISIBLE_UNICODE_PATTERN =
Zero-width characters and BIDI override / isolate codepoints. Used to hide instructions or swap visible text direction — see the “Trojan Source” class of attacks (CVE-2021-42574).
/[-]/
Class Method Summary collapse
-
.scan(content) ⇒ Object
Returns nil when the content is safe, otherwise a short string naming the detected threat class (e.g. “prompt_injection”).
Class Method Details
.scan(content) ⇒ Object
Returns nil when the content is safe, otherwise a short string naming the detected threat class (e.g. “prompt_injection”).
52 53 54 55 56 57 58 59 60 61 62 63 64 |
# File 'lib/rubino/memory/threat_scanner.rb', line 52 def scan(content) return nil if content.nil? || content.empty? text = content.to_s return "prompt_injection" if PROMPT_INJECTION_PATTERNS.any? { |p| text.match?(p) } return "exfiltration_url_credentials" if text.match?(URL_CREDENTIAL_PATTERN) return "exfiltration_pipe_to_shell" if text.match?(PIPE_TO_SHELL_PATTERN) return "exfiltration_base64_blob" if text.match?(BASE64_BLOB_PATTERN) return "invisible_unicode" if text.match?(INVISIBLE_UNICODE_PATTERN) nil end |