Class: Pikuri::Mcp::Verifier
- Inherits:
-
Object
- Object
- Pikuri::Mcp::Verifier
- Defined in:
- lib/pikuri/mcp/verifier.rb
Overview
Pre-flight check for an MCP server’s textual surface — looking for prompt-injection / exfiltration / tool-chain-hijacking patterns that a malicious or compromised server could embed in its initialize handshake, tool descriptions, or parameter schemas. Wired into Servers#start_one after client.tools but before Servers#resolve_description, so a flagged server never gets its description synthesized and never enters Servers#live_ids.
Two passes
-
**Mechanical Unicode pre-pass.** Walks every text field the server emits and raises InjectionDetected if it finds a code point in SUSPICIOUS_UNICODE — zero-width characters, RTL/bidi overrides, Unicode tag characters in the UE0000+-+U+E007F+ range, BOMs, interlinear annotation markers. Real MCP server authors have no legitimate reason to include any of these; if one shows up, it’s almost certainly an attempt to hide text from a human reader (and sometimes from the model itself, depending on tokenizer). This pass costs nothing — it runs before any LLM round-trip.
-
**LLM verification.** When the Unicode pass is clean, the server’s structured surface is handed to @thinker with a prompt describing the threat model and the patterns to flag (instruction-override, pretend-system messages, exfiltration instructions, tool-chain hijacking, authority impersonation, misleading capability claims). A clean response is the literal string “OK” — anything else is treated as a flag and raises InjectionDetected carrying the model’s reasoning.
Cache semantics
Only the LLM-pass result is cached; the Unicode pre-pass runs on every boot (it’s fast). The cache stores “OK” keyed on the full server surface, so an unchanged surface skips the second round-trip entirely on subsequent boots. Negative results (InjectionDetected) are NOT cached — a server that gets rejected today re-runs verification on the next boot, in case the operator legitimately updated it (any update changes the cache key anyway).
False-positive philosophy
The verifier checks for injection, not for capability. A tool whose description honestly says “executes arbitrary JavaScript” or “runs bash commands” is not flagged — the user has chosen to wire that server into their agent, and the capability is the tool’s stated purpose. The verifier’s job is to spot text that tries to manipulate the calling agent into doing things other than what the tool claims to do.
See the prompt in #build_prompt for the precise patterns.
Defined Under Namespace
Classes: InjectionDetected
Constant Summary collapse
- PROMPT_VERSION =
Bump when #build_prompt changes meaningfully. Cache folds this into its key fingerprint so a prompt edit invalidates every cached “OK” without anyone having to
rmthe cache directory. 1- SUSPICIOUS_UNICODE =
Code points with no legitimate place in human-readable tool documentation. Each range catches one category of “hide something from a reader”:
-
U200B+–+U+200F+ — zero-width space / non-joiner / joiner, LRM/RLM directionality marks.
-
U202A+–+U+202E+ — bidi overrides (LRE, RLE, PDF, LRO, RLO).
-
U2060+–+U+2064+ — word joiner, invisible math operators.
-
U2066+–+U+2069+ — isolate markers (LRI, RLI, FSI, PDI).
-
UFEFF+ — ZWNBSP / byte-order mark.
-
UFFF9+–+U+FFFB+ — interlinear annotation markers.
-
UE0000+–+U+E007F+ — Unicode tag characters (the “ASCII as PUA” channel made famous by recent prompt-injection PoCs).
-
/[\u{200B}-\u{200F}\u{202A}-\u{202E}\u{2060}-\u{2064}\u{2066}-\u{2069}\u{FEFF}\u{FFF9}-\u{FFFB}\u{E0000}-\u{E007F}]/.freeze
Instance Method Summary collapse
-
#call(entry:, client:, tools:) ⇒ void
Verify the server’s surface.
-
#initialize(thinker:, cache: Cache::NULL) ⇒ Verifier
constructor
A new instance of Verifier.
Constructor Details
Instance Method Details
#call(entry:, client:, tools:) ⇒ void
This method returns an undefined value.
Verify the server’s surface. Returns nothing on success; raises InjectionDetected on failure.
110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 |
# File 'lib/pikuri/mcp/verifier.rb', line 110 def call(entry:, client:, tools:) check_unicode!(entry, client, tools) # The cache stores ONLY the literal string "OK". A rejection # raises {InjectionDetected} from inside the cache block, and # {UrlCache#fetch} skips persistence when the block raises — # so a rejected server gets re-verified on the next boot. @cache.fetch(entry: entry, client: client, tools: tools) do # Cache miss → real LLM round-trip. The same heads-up # rationale as {Synthesizer#call} applies — verifying # silently for tens of seconds confuses the user. LOGGER.info("Verifying MCP server #{entry.id.inspect} for prompt-injection patterns, please wait...") response = @thinker.call(build_prompt(entry, client, tools)) next 'OK' if response.to_s.strip.upcase == 'OK' raise InjectionDetected, "MCP server #{entry.id.inspect} rejected by verifier: #{response.to_s.strip}" end nil end |