Class: Phronomy::Guardrail::Builtin::PromptInjectionDetector
- Inherits:
-
InputGuardrail
- Object
- Phronomy::Guardrail::Base
- InputGuardrail
- Phronomy::Guardrail::Builtin::PromptInjectionDetector
- Defined in:
- lib/phronomy/guardrail/builtin/prompt_injection_detector.rb
Overview
Input guardrail that detects common prompt injection attempts.
Matches a built-in list of injection patterns (case-insensitive) and raises Phronomy::GuardrailError when any pattern is found in the input string. Additional patterns can be supplied via the +additional_patterns:+ argument.
Limitations: the built-in patterns cover well-known English and Japanese phrasings. Obfuscated, Base64-encoded, or novel injection phrasing may not be detected. For higher-assurance use cases, combine this guardrail with an LLM-based classifier.
Constant Summary collapse
- DEFAULT_PATTERNS =
Default patterns that signal a prompt injection attempt.
[ # --- English patterns --- /ignore\s+(all\s+)?(previous|prior|above)\s+(instructions?|rules?|prompts?)/i, /disregard\s+(all\s+)?(previous|prior|above)\s+(instructions?|rules?|prompts?)/i, /forget\s+(all\s+)?(previous|prior|above)\s+(instructions?|rules?|prompts?)/i, /\bsystem\s*prompt\s*:/i, /\byou\s+are\s+now\s+(?:a|an)\b/i, /\bact\s+as\s+(?:a|an)\b/i, /\bpretend\s+(?:you\s+are|to\s+be)\b/i, /\bjailbreak\b/i, /\bdan\s*mode\b/i, /\bdev(?:eloper)?\s*mode\b/i, # --- Japanese patterns --- /以前の(指示|ルール|プロンプト)を無視/, /指示を無視して/, /ルールを無視して/, /あなたは今(から)?(?!助けて)/, /システムプロンプト/, /制約(を|から)無視/, /制限(を|から)解除/ ].freeze
Instance Method Summary collapse
- #check(value) ⇒ Object
-
#initialize(additional_patterns: []) ⇒ PromptInjectionDetector
constructor
A new instance of PromptInjectionDetector.
Methods inherited from Phronomy::Guardrail::Base
Constructor Details
#initialize(additional_patterns: []) ⇒ PromptInjectionDetector
Returns a new instance of PromptInjectionDetector.
52 53 54 |
# File 'lib/phronomy/guardrail/builtin/prompt_injection_detector.rb', line 52 def initialize(additional_patterns: []) @patterns = DEFAULT_PATTERNS + Array(additional_patterns) end |
Instance Method Details
#check(value) ⇒ Object
58 59 60 61 62 63 |
# File 'lib/phronomy/guardrail/builtin/prompt_injection_detector.rb', line 58 def check(value) text = value.to_s @patterns.each do |pattern| fail!("Potential prompt injection detected") if text.match?(pattern) end end |