Class: Phronomy::Guardrail::Builtin::PromptInjectionDetector

Inherits:
InputGuardrail show all
Defined in:
lib/phronomy/guardrail/builtin/prompt_injection_detector.rb

Overview

Input guardrail that detects common prompt injection attempts.

Matches a built-in list of injection patterns (case-insensitive) and raises Phronomy::GuardrailError when any pattern is found in the input string. Additional patterns can be supplied via the +additional_patterns:+ argument.

Examples:

agent.add_input_guardrail(
  Phronomy::Guardrail::Builtin::PromptInjectionDetector.new
)

# With extra patterns:
detector = Phronomy::Guardrail::Builtin::PromptInjectionDetector.new(
  additional_patterns: [/do anything now/i]
)

Constant Summary collapse

DEFAULT_PATTERNS =

Default patterns that signal a prompt injection attempt.

[
  /ignore\s+(all\s+)?(previous|prior|above)\s+(instructions?|rules?|prompts?)/i,
  /disregard\s+(all\s+)?(previous|prior|above)\s+(instructions?|rules?|prompts?)/i,
  /forget\s+(all\s+)?(previous|prior|above)\s+(instructions?|rules?|prompts?)/i,
  /\bsystem\s*prompt\s*:/i,
  /\byou\s+are\s+now\s+(?:a|an)\b/i,
  /\bact\s+as\s+(?:a|an)\b/i,
  /\bpretend\s+(?:you\s+are|to\s+be)\b/i,
  /\bjailbreak\b/i,
  /\bdan\s*mode\b/i,
  /\bdev(?:eloper)?\s*mode\b/i
].freeze

Instance Method Summary collapse

Methods inherited from Phronomy::Guardrail::Base

#run!

Constructor Details

#initialize(additional_patterns: []) ⇒ PromptInjectionDetector

Returns a new instance of PromptInjectionDetector.

Parameters:

  • additional_patterns (Array<Regexp>) (defaults to: [])

    extra patterns to check in addition to the built-in list.



38
39
40
# File 'lib/phronomy/guardrail/builtin/prompt_injection_detector.rb', line 38

def initialize(additional_patterns: [])
  @patterns = DEFAULT_PATTERNS + Array(additional_patterns)
end

Instance Method Details

#check(value) ⇒ Object

Parameters:

  • value (Object)

    the input to check

Raises:



44
45
46
47
48
49
# File 'lib/phronomy/guardrail/builtin/prompt_injection_detector.rb', line 44

def check(value)
  text = value.to_s
  @patterns.each do |pattern|
    fail!("Potential prompt injection detected") if text.match?(pattern)
  end
end