Module: Parse::Agent::PromptHardening
- Extended by:
- PromptHardening
- Included in:
- PromptHardening
- Defined in:
- lib/parse/agent/prompt_hardening.rb
Overview
Sanitization primitives for prompt-injection hardening (NEW-PROMPT-6). A single home for the transforms applied to data that flows toward an LLM: schema descriptions surfaced by the schema tools, untrusted tool result content, and canary scanning of tool results.
All functions are pure (module_function via extend self) and have no
dependency on a live client.
Constant Summary collapse
- FIELD_NAME_RE =
Identifier shape for LLM-surfaced field names: ASCII letter/underscore start, then up to 127 more identifier chars. NOT the secret-field boundary — it permits a leading underscore;
_rperm/_hashed_passwordare stopped by field_allowlist / validate_keys!, untouched here. This only drops non-identifier names (spaces, punctuation, >128 chars, leading digit) that could carry injection payloads in a field name. The length is an injection-safety cap, not a Parse limit — it is set well above any realistic field name so valid identifiers aren't silently dropped from the schema surfaced to the LLM. /\A[a-zA-Z_][a-zA-Z0-9_]{0,127}\z/- DESCRIPTION_CAP =
Max characters retained from any LLM-surfaced description.
200- SCHEMA_DESC_OPEN =
"<schema_description>"- SCHEMA_DESC_CLOSE =
"</schema_description>"- CONTROL_CHARS_RE =
C0 (0x00-0x1F except \t\n) + DEL + C1 (0x7F-0x9F) + zero-width (200B-200D, 2060, FEFF). Stripped from descriptions so invisible control/format characters can't smuggle instructions past a human reviewer or confuse the model.
/[\u0000-\u0008\u000B-\u001F\u007F-\u009F\u200B-\u200D\u2060\uFEFF]/
Instance Method Summary collapse
-
#sanitize_description(str) ⇒ String
Scrub control chars, cap length, and wrap a description in
markers. -
#sanitize_schema_for_llm(schema) ⇒ Hash
Sub-part 1 — sanitize an enriched schema hash before it is serialized toward the LLM.
-
#scan_for_canaries(text) ⇒ String?
Sub-part 3 — scan text for any operator-registered canary phrase.
-
#scrub_marker_injection(content) ⇒ String
Sub-part 2 — neutralize wrapper/marker strings embedded in untrusted content so a stored value cannot impersonate or close the tool-result wrapper.
-
#valid_field_name?(name) ⇒ Boolean
Whether
nameis a safe LLM-surfaceable identifier.
Instance Method Details
#sanitize_description(str) ⇒ String
Scrub control chars, cap length, and wrap a description in
</schema_description> can't close the wrapper).
111 112 113 114 115 116 117 |
# File 'lib/parse/agent/prompt_hardening.rb', line 111 def sanitize_description(str) return str unless str.is_a?(String) cleaned = scrub_marker_injection(str) cleaned = cleaned.gsub(CONTROL_CHARS_RE, "") cleaned = cleaned[0, DESCRIPTION_CAP] if cleaned.length > DESCRIPTION_CAP "#{SCHEMA_DESC_OPEN}#{cleaned}#{SCHEMA_DESC_CLOSE}" end |
#sanitize_schema_for_llm(schema) ⇒ Hash
Sub-part 1 — sanitize an enriched schema hash before it is
serialized toward the LLM. Returns a sanitized deep copy (input is
not mutated). Drops fields whose names fail FIELD_NAME_RE (with a
[Parse::Agent:PROMPT] warning), and scrubs + caps + marker-wraps
every description / usage string (class-level, per-field, and enum
value descriptions).
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 |
# File 'lib/parse/agent/prompt_hardening.rb', line 48 def sanitize_schema_for_llm(schema) return schema unless schema.is_a?(Hash) out = deep_dup(schema) class_name = out["className"] || out[:className] %w[description usage].each do |k| out[k] = sanitize_description(out[k]) if out[k].is_a?(String) end fields = out["fields"] || out[:fields] if fields.is_a?(Hash) fields.keys.each do |fname| unless valid_field_name?(fname) fields.delete(fname) warn "[Parse::Agent:PROMPT] dropped field #{fname.inspect} on " \ "#{class_name.inspect}: invalid identifier" next end cfg = fields[fname] next unless cfg.is_a?(Hash) %w[description usage].each do |k| cfg[k] = sanitize_description(cfg[k]) if cfg[k].is_a?(String) end allowed = cfg["allowed_values"] || cfg[:allowed_values] if allowed.is_a?(Array) allowed.each do |v| next unless v.is_a?(Hash) v["description"] = sanitize_description(v["description"]) if v["description"].is_a?(String) v[:description] = sanitize_description(v[:description]) if v[:description].is_a?(String) end end end end # agent_methods entries are surfaced to the LLM by format_schema exactly # like field descriptions, and their :description / per-parameter # description strings come from the same developer-authored DSL — so they # get the same marker-neutralization / control-char strip / length cap. # (format_methods emits symbol-keyed hashes; tolerate both forms.) methods = out["agent_methods"] || out[:agent_methods] if methods.is_a?(Array) methods.each do |m| next unless m.is_a?(Hash) m["description"] = sanitize_description(m["description"]) if m["description"].is_a?(String) m[:description] = sanitize_description(m[:description]) if m[:description].is_a?(String) sanitize_nested_descriptions!(m["parameters"] || m[:parameters]) end end out end |
#scan_for_canaries(text) ⇒ String?
Sub-part 3 — scan text for any operator-registered canary phrase.
147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 |
# File 'lib/parse/agent/prompt_hardening.rb', line 147 def scan_for_canaries(text) canaries = Parse::Agent.prompt_injection_canaries return nil if canaries.nil? || canaries.empty? s = text.to_s return nil if s.empty? down = s.downcase canaries.each do |c| case c when Regexp return c.source if c.match?(s) else phrase = c.to_s return phrase if !phrase.empty? && down.include?(phrase.downcase) end end nil end |
#scrub_marker_injection(content) ⇒ String
Sub-part 2 — neutralize wrapper/marker strings embedded in untrusted content so a stored value cannot impersonate or close the tool-result wrapper. Idempotent: the escaped form no longer contains the original literal, so re-application is a no-op (content is re-serialized into history every turn).
When Parse::Agent.prompt_marker_strict is true, raises instead of
escaping (fail-closed for high-assurance deployments).
130 131 132 133 134 135 136 137 138 139 140 141 142 |
# File 'lib/parse/agent/prompt_hardening.rb', line 130 def scrub_marker_injection(content) s = content.to_s strict = Parse::Agent.prompt_marker_strict injection_markers.each do |marker| next unless s.include?(marker) if strict raise Parse::Agent::SecurityError, "prompt_marker_strict: untrusted content contains a reserved marker" end s = s.gsub(marker, escape_marker(marker)) end s end |
#valid_field_name?(name) ⇒ Boolean
Returns whether name is a safe LLM-surfaceable identifier.
101 102 103 |
# File 'lib/parse/agent/prompt_hardening.rb', line 101 def valid_field_name?(name) FIELD_NAME_RE.match?(name.to_s) end |