Class: Phronomy::Filter::PromptInjectionFilter
- Defined in:
- lib/phronomy/filter/prompt_injection_filter.rb
Overview
Detects potential prompt injection attempts in the agent input.
Prompt injection is an attack where an adversary embeds LLM instructions inside data sources (e.g. RAG chunks, tool results, user input) to override the agent's intended behaviour.
This filter scans the input string for common injection patterns and calls Base#block! when a match is found. It is intended to be registered as an input filter on agents that consume untrusted external content.
Constant Summary collapse
- DEFAULT_PATTERNS =
Common prompt injection / jailbreak patterns.
[ /ignore\s+(previous|prior|all)\s+instructions?/i, /disregard\s+(previous|prior|all)\s+instructions?/i, /forget\s+(previous|prior|all)\s+instructions?/i, /override\s+(previous|prior|all)\s+instructions?/i, /new\s+instructions?:\s/i, /\byour\s+new\s+(role|instructions?|task)\b/i, /you\s+are\s+now\s+(a|an)\b/i, /\bact\s+as\s+(a|an)\b/i, /\bpretend\s+(you\s+are|to\s+be)\b/i, /\bdo\s+not\s+follow\s+(your|the)\s+instructions?\b/i ].freeze
Instance Method Summary collapse
-
#call(value, **_context) ⇒ String, Hash
Scans the input string for injection patterns.
-
#initialize(extra_patterns: []) ⇒ PromptInjectionFilter
constructor
A new instance of PromptInjectionFilter.
Constructor Details
#initialize(extra_patterns: []) ⇒ PromptInjectionFilter
Returns a new instance of PromptInjectionFilter.
45 46 47 48 |
# File 'lib/phronomy/filter/prompt_injection_filter.rb', line 45 def initialize(extra_patterns: []) super() @patterns = DEFAULT_PATTERNS + extra_patterns end |
Instance Method Details
#call(value, **_context) ⇒ String, Hash
Scans the input string for injection patterns.
56 57 58 59 60 61 62 |
# File 'lib/phronomy/filter/prompt_injection_filter.rb', line 56 def call(value, **_context) text = value.is_a?(Hash) ? value.values.join(" ") : value.to_s @patterns.each do |pattern| block!("Potential prompt injection detected") if text.match?(pattern) end value end |