Class: Phronomy::Guardrail::PromptInjectionGuardrail
- Inherits:
-
InputGuardrail
- Object
- Base
- InputGuardrail
- Phronomy::Guardrail::PromptInjectionGuardrail
- Defined in:
- lib/phronomy/guardrail/prompt_injection_guardrail.rb
Overview
Detects potential prompt injection attempts in the agent input.
Prompt injection is an attack where an adversary embeds LLM instructions inside data sources (e.g. RAG chunks, tool results, user input) to override the agent's intended behaviour.
This guardrail scans the input string for common injection patterns and calls Base#fail! when a match is found. It is intended to be registered as an input guardrail on agents that consume untrusted external content.
Constant Summary collapse
- DEFAULT_PATTERNS =
Common prompt injection / jailbreak patterns.
[ /ignore\s+(previous|prior|all)\s+instructions?/i, /disregard\s+(previous|prior|all)\s+instructions?/i, /forget\s+(previous|prior|all)\s+instructions?/i, /override\s+(previous|prior|all)\s+instructions?/i, /new\s+instructions?:\s/i, /\byour\s+new\s+(role|instructions?|task)\b/i, /you\s+are\s+now\s+(a|an)\b/i, /\bact\s+as\s+(a|an)\b/i, /\bpretend\s+(you\s+are|to\s+be)\b/i, /\bdo\s+not\s+follow\s+(your|the)\s+instructions?\b/i ].freeze
Instance Method Summary collapse
-
#check(input) ⇒ Object
private
Scans the input string for injection patterns.
-
#initialize(extra_patterns: []) ⇒ PromptInjectionGuardrail
constructor
private
A new instance of PromptInjectionGuardrail.
Methods inherited from Base
Constructor Details
#initialize(extra_patterns: []) ⇒ PromptInjectionGuardrail
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Returns a new instance of PromptInjectionGuardrail.
42 43 44 45 |
# File 'lib/phronomy/guardrail/prompt_injection_guardrail.rb', line 42 def initialize(extra_patterns: []) super() @patterns = DEFAULT_PATTERNS + extra_patterns end |
Instance Method Details
#check(input) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Scans the input string for injection patterns.
50 51 52 53 54 55 |
# File 'lib/phronomy/guardrail/prompt_injection_guardrail.rb', line 50 def check(input) text = input.is_a?(Hash) ? input.values.join(" ") : input.to_s @patterns.each do |pattern| fail!("Potential prompt injection detected") if text.match?(pattern) end end |