Class: Pikuri::Mcp::Verifier
- Inherits:
-
Object
- Object
- Pikuri::Mcp::Verifier
- Defined in:
- lib/pikuri/mcp/verifier.rb
Overview
Pre-flight check for an MCP server’s textual surface — looking for prompt-injection / exfiltration / tool-chain-hijacking patterns that a malicious or compromised server could embed in its initialize handshake, tool descriptions, or parameter schemas. Wired into Servers#start_one after client.tools but before Servers#resolve_description, so a flagged server never gets its description synthesized and never enters Servers#live_ids.
Two passes
-
**Mechanical Unicode pre-pass.** Walks every text field the server emits and raises InjectionDetected if it finds a code point in SUSPICIOUS_UNICODE — zero-width characters, RTL/bidi overrides, Unicode tag characters in the UE0000+-+U+E007F+ range, BOMs, interlinear annotation markers. Real MCP server authors have no legitimate reason to include any of these; if one shows up, it’s almost certainly an attempt to hide text from a human reader (and sometimes from the model itself, depending on tokenizer). This pass costs nothing — it runs before any LLM round-trip.
-
**LLM verification.** When the Unicode pass is clean, the server’s structured surface is handed to @thinker with a prompt describing the threat model and the patterns to flag (instruction-override, pretend-system messages, exfiltration instructions, tool-chain hijacking, authority impersonation, misleading capability claims). A clean response is the literal string “OK” — anything else is treated as a flag and raises InjectionDetected carrying the model’s reasoning.
Cache semantics
Only the LLM-pass result is cached; the Unicode pre-pass runs on every boot (it’s fast). The cache stores “OK” keyed on the full server surface, so an unchanged surface skips the second round-trip entirely on subsequent boots. Negative results (InjectionDetected) are NOT cached — a server that gets rejected today re-runs verification on the next boot, in case the operator legitimately updated it (any update changes the cache key anyway).
False-positive philosophy
The verifier checks for injection, not for capability. A tool whose description honestly says “executes arbitrary JavaScript” or “runs bash commands” is not flagged — the user has chosen to wire that server into their agent, and the capability is the tool’s stated purpose. The verifier’s job is to spot text that tries to manipulate the calling agent into doing things other than what the tool claims to do.
See the prompt in #build_prompt for the precise patterns.
Defined Under Namespace
Classes: InjectionDetected
Constant Summary collapse
- PROMPT_VERSION =
Bump when #build_prompt changes meaningfully. Cache folds this into its key fingerprint so a prompt edit invalidates every cached “OK” without anyone having to
rmthe cache directory. Version 2: the thinker gained a generic system prompt (Thinker::SYSTEM_PROMPT), changing the effective prompt for every entry. 2- CACHE_DIR =
Where the production cache lives — a sibling of Cache::DIR so verification verdicts never collide with synthesized descriptions (same key fingerprint, different meaning).
File.join(File.dirname(Cache::DIR), 'mcp_verifications')
- SUSPICIOUS_UNICODE =
Code points with no legitimate place in human-readable tool documentation. Each range catches one category of “hide something from a reader”:
-
U200B+–+U+200F+ — zero-width space / non-joiner / joiner, LRM/RLM directionality marks.
-
U202A+–+U+202E+ — bidi overrides (LRE, RLE, PDF, LRO, RLO).
-
U2060+–+U+2064+ — word joiner, invisible math operators.
-
U2066+–+U+2069+ — isolate markers (LRI, RLI, FSI, PDI).
-
UFEFF+ — ZWNBSP / byte-order mark.
-
UFFF9+–+U+FFFB+ — interlinear annotation markers.
-
UE0000+–+U+E007F+ — Unicode tag characters (the “ASCII as PUA” channel made famous by recent prompt-injection PoCs).
-
/[\u{200B}-\u{200F}\u{202A}-\u{202E}\u{2060}-\u{2064}\u{2066}-\u{2069}\u{FEFF}\u{FFF9}-\u{FFFB}\u{E0000}-\u{E007F}]/.freeze
Instance Method Summary collapse
-
#call(entry:, client:, tools:) ⇒ void
Verify the server’s surface.
- #initialize(transport: nil, cancellable: nil, thinker: nil, cache: nil) ⇒ Verifier constructor
Constructor Details
#initialize(transport: nil, cancellable: nil, thinker: nil, cache: nil) ⇒ Verifier
The easy path is Verifier.new(transport: …) — the Thinker and the production on-disk Cache (under CACHE_DIR) are built right here. thinker: exists as the explicit override (a custom callable, a test fake) and is mutually exclusive with the transport path. Same constructor shape as Synthesizer.
123 124 125 126 127 128 129 130 131 132 133 134 |
# File 'lib/pikuri/mcp/verifier.rb', line 123 def initialize(transport: nil, cancellable: nil, thinker: nil, cache: nil) raise ArgumentError, 'pass exactly one of transport: or thinker:' if transport.nil? == thinker.nil? raise ArgumentError, 'cancellable: only applies to the transport: path' if thinker && cancellable @thinker = thinker || Thinker.new(transport: transport, cancellable: cancellable) @cache = cache || if transport Cache.new(model_id: transport.model, prompt_version: PROMPT_VERSION, dir: CACHE_DIR) else Cache::NULL end end |
Instance Method Details
#call(entry:, client:, tools:) ⇒ void
This method returns an undefined value.
Verify the server’s surface. Returns nothing on success; raises InjectionDetected on failure.
145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 |
# File 'lib/pikuri/mcp/verifier.rb', line 145 def call(entry:, client:, tools:) check_unicode!(entry, client, tools) # The cache stores ONLY the literal string "OK". A rejection # raises {InjectionDetected} from inside the cache block, and # {UrlCache#fetch} skips persistence when the block raises — # so a rejected server gets re-verified on the next boot. @cache.fetch(entry: entry, client: client, tools: tools) do # Cache miss → real LLM round-trip. The same heads-up # rationale as {Synthesizer#call} applies — verifying # silently for tens of seconds confuses the user. LOGGER.info("Verifying MCP server #{entry.id.inspect} for prompt-injection patterns, please wait...") response = @thinker.call(build_prompt(entry, client, tools)) next 'OK' if response.to_s.strip.upcase == 'OK' raise InjectionDetected, "MCP server #{entry.id.inspect} rejected by verifier: #{response.to_s.strip}" end nil end |