Class: Pikuri::Mcp::Verifier

Inherits:

Object

Object
Pikuri::Mcp::Verifier

show all

Defined in:: lib/pikuri/mcp/verifier.rb

Overview

Pre-flight check for an MCP server’s textual surface — looking for prompt-injection / exfiltration / tool-chain-hijacking patterns that a malicious or compromised server could embed in its initialize handshake, tool descriptions, or parameter schemas. Wired into Servers#start_one after client.tools but before Servers#resolve_description, so a flagged server never gets its description synthesized and never enters Servers#live_ids.

Two passes

**Mechanical Unicode pre-pass.** Walks every text field the server emits and raises InjectionDetected if it finds a code point in SUSPICIOUS_UNICODE — zero-width characters, RTL/bidi overrides, Unicode tag characters in the UE0000+-+U+E007F+ range, BOMs, interlinear annotation markers. Real MCP server authors have no legitimate reason to include any of these; if one shows up, it’s almost certainly an attempt to hide text from a human reader (and sometimes from the model itself, depending on tokenizer). This pass costs nothing — it runs before any LLM round-trip.
**LLM verification.** When the Unicode pass is clean, the server’s structured surface is handed to @thinker with a prompt describing the threat model and the patterns to flag (instruction-override, pretend-system messages, exfiltration instructions, tool-chain hijacking, authority impersonation, misleading capability claims). A clean response is the literal string “OK” — anything else is treated as a flag and raises InjectionDetected carrying the model’s reasoning.

Cache semantics

Only the LLM-pass result is cached; the Unicode pre-pass runs on every boot (it’s fast). The cache stores “OK” keyed on the full server surface, so an unchanged surface skips the second round-trip entirely on subsequent boots. Negative results (InjectionDetected) are NOT cached — a server that gets rejected today re-runs verification on the next boot, in case the operator legitimately updated it (any update changes the cache key anyway).

False-positive philosophy

The verifier checks for injection, not for capability. A tool whose description honestly says “executes arbitrary JavaScript” or “runs bash commands” is not flagged — the user has chosen to wire that server into their agent, and the capability is the tool’s stated purpose. The verifier’s job is to spot text that tries to manipulate the calling agent into doing things other than what the tool claims to do.

See the prompt in #build_prompt for the precise patterns.

Defined Under Namespace

Classes: InjectionDetected

Constant Summary collapse

PROMPT_VERSION = Bump when #build_prompt changes meaningfully. Cache folds this into its key fingerprint so a prompt edit invalidates every cached “OK” without anyone having to rm the cache directory.

SUSPICIOUS_UNICODE = Code points with no legitimate place in human-readable tool documentation. Each range catches one category of “hide something from a reader”: U200B+–+U+200F+ — zero-width space / non-joiner / joiner, LRM/RLM directionality marks. U202A+–+U+202E+ — bidi overrides (LRE, RLE, PDF, LRO, RLO). U2060+–+U+2064+ — word joiner, invisible math operators. U2066+–+U+2069+ — isolate markers (LRI, RLI, FSI, PDI). UFEFF+ — ZWNBSP / byte-order mark. UFFF9+–+U+FFFB+ — interlinear annotation markers. UE0000+–+U+E007F+ — Unicode tag characters (the “ASCII as PUA” channel made famous by recent prompt-injection PoCs).

/[\u{200B}-\u{200F}\u{202A}-\u{202E}\u{2060}-\u{2064}\u{2066}-\u{2069}\u{FEFF}\u{FFF9}-\u{FFFB}\u{E0000}-\u{E007F}]/.freeze

Instance Method Summary collapse

#call(entry:, client:, tools:) ⇒ void

Verify the server’s surface.
#initialize(thinker:, cache: Cache::NULL) ⇒ Verifier constructor

A new instance of Verifier.

Constructor Details

#initialize(thinker:, cache: Cache::NULL) ⇒ `Verifier`

Returns a new instance of Verifier.

Parameters:

thinker (Proc) —

one-arg callable invoked as thinker.call(prompt); same closure shape Synthesizer uses.
cache (Cache, Cache::NULL) (defaults to: Cache::NULL) —

cache for the LLM-pass result. Defaults to Cache::NULL so tests can skip disk persistence; production wiring in Agent uses a real Cache pointing at a mcp_verifications dir.

# File 'lib/pikuri/mcp/verifier.rb', line 96

def initialize(thinker:, cache: Cache::NULL)
  @thinker = thinker
  @cache = cache
end

Instance Method Details

#call(entry:, client:, tools:) ⇒ `void`

This method returns an undefined value.

Verify the server’s surface. Returns nothing on success; raises InjectionDetected on failure.