Class: Phronomy::Guardrail::Builtin::PromptInjectionDetector

Inherits:
InputGuardrail show all
Defined in:
lib/phronomy/guardrail/builtin/prompt_injection_detector.rb

Overview

Input guardrail that detects common prompt injection attempts.

Matches a built-in list of injection patterns (case-insensitive) and raises Phronomy::GuardrailError when any pattern is found in the input string. Additional patterns can be supplied via the +additional_patterns:+ argument.

Limitations: the built-in patterns cover well-known English and Japanese phrasings. Obfuscated, Base64-encoded, or novel injection phrasing may not be detected. For higher-assurance use cases, combine this guardrail with an LLM-based classifier.

Examples:

agent.add_input_guardrail(
  Phronomy::Guardrail::Builtin::PromptInjectionDetector.new
)

# With extra patterns:
detector = Phronomy::Guardrail::Builtin::PromptInjectionDetector.new(
  additional_patterns: [/do anything now/i]
)

Constant Summary collapse

DEFAULT_PATTERNS =

Default patterns that signal a prompt injection attempt.

[
  # --- English patterns ---
  /ignore\s+(all\s+)?(previous|prior|above)\s+(instructions?|rules?|prompts?)/i,
  /disregard\s+(all\s+)?(previous|prior|above)\s+(instructions?|rules?|prompts?)/i,
  /forget\s+(all\s+)?(previous|prior|above)\s+(instructions?|rules?|prompts?)/i,
  /\bsystem\s*prompt\s*:/i,
  /\byou\s+are\s+now\s+(?:a|an)\b/i,
  /\bact\s+as\s+(?:a|an)\b/i,
  /\bpretend\s+(?:you\s+are|to\s+be)\b/i,
  /\bjailbreak\b/i,
  /\bdan\s*mode\b/i,
  /\bdev(?:eloper)?\s*mode\b/i,
  # --- Japanese patterns ---
  /以前の(指示|ルール|プロンプト)を無視/,
  /指示を無視して/,
  /ルールを無視して/,
  /あなたは今(から)?(?!助けて)/,
  /システムプロンプト/,
  /制約(を|から)無視/,
  /制限(を|から)解除/
].freeze

Instance Method Summary collapse

Methods inherited from Phronomy::Guardrail::Base

#run!

Constructor Details

#initialize(additional_patterns: []) ⇒ PromptInjectionDetector

Returns a new instance of PromptInjectionDetector.

Parameters:

  • additional_patterns (Array<Regexp>) (defaults to: [])

    extra patterns to check in addition to the built-in list.



52
53
54
# File 'lib/phronomy/guardrail/builtin/prompt_injection_detector.rb', line 52

def initialize(additional_patterns: [])
  @patterns = DEFAULT_PATTERNS + Array(additional_patterns)
end

Instance Method Details

#check(value) ⇒ Object

Parameters:

  • value (Object)

    the input to check

Raises:



58
59
60
61
62
63
# File 'lib/phronomy/guardrail/builtin/prompt_injection_detector.rb', line 58

def check(value)
  text = value.to_s
  @patterns.each do |pattern|
    fail!("Potential prompt injection detected") if text.match?(pattern)
  end
end