Class: Pikuri::Mcp::Verifier

Inherits:
Object
  • Object
show all
Defined in:
lib/pikuri/mcp/verifier.rb

Overview

Pre-flight check for an MCP server’s textual surface — looking for prompt-injection / exfiltration / tool-chain-hijacking patterns that a malicious or compromised server could embed in its initialize handshake, tool descriptions, or parameter schemas. Wired into Servers#start_one after client.tools but before Servers#resolve_description, so a flagged server never gets its description synthesized and never enters Servers#live_ids.

Two passes

  1. **Mechanical Unicode pre-pass.** Walks every text field the server emits and raises InjectionDetected if it finds a code point in SUSPICIOUS_UNICODE — zero-width characters, RTL/bidi overrides, Unicode tag characters in the UE0000+-+U+E007F+ range, BOMs, interlinear annotation markers. Real MCP server authors have no legitimate reason to include any of these; if one shows up, it’s almost certainly an attempt to hide text from a human reader (and sometimes from the model itself, depending on tokenizer). This pass costs nothing — it runs before any LLM round-trip.

  2. **LLM verification.** When the Unicode pass is clean, the server’s structured surface is handed to @thinker with a prompt describing the threat model and the patterns to flag (instruction-override, pretend-system messages, exfiltration instructions, tool-chain hijacking, authority impersonation, misleading capability claims). A clean response is the literal string “OK” — anything else is treated as a flag and raises InjectionDetected carrying the model’s reasoning.

Cache semantics

Only the LLM-pass result is cached; the Unicode pre-pass runs on every boot (it’s fast). The cache stores “OK” keyed on the full server surface, so an unchanged surface skips the second round-trip entirely on subsequent boots. Negative results (InjectionDetected) are NOT cached — a server that gets rejected today re-runs verification on the next boot, in case the operator legitimately updated it (any update changes the cache key anyway).

False-positive philosophy

The verifier checks for injection, not for capability. A tool whose description honestly says “executes arbitrary JavaScript” or “runs bash commands” is not flagged — the user has chosen to wire that server into their agent, and the capability is the tool’s stated purpose. The verifier’s job is to spot text that tries to manipulate the calling agent into doing things other than what the tool claims to do.

See the prompt in #build_prompt for the precise patterns.

Defined Under Namespace

Classes: InjectionDetected

Constant Summary collapse

PROMPT_VERSION =

Bump when #build_prompt changes meaningfully. Cache folds this into its key fingerprint so a prompt edit invalidates every cached “OK” without anyone having to rm the cache directory.

1
SUSPICIOUS_UNICODE =

Code points with no legitimate place in human-readable tool documentation. Each range catches one category of “hide something from a reader”:

  • U200B+–+U+200F+ — zero-width space / non-joiner / joiner, LRM/RLM directionality marks.

  • U202A+–+U+202E+ — bidi overrides (LRE, RLE, PDF, LRO, RLO).

  • U2060+–+U+2064+ — word joiner, invisible math operators.

  • U2066+–+U+2069+ — isolate markers (LRI, RLI, FSI, PDI).

  • UFEFF+ — ZWNBSP / byte-order mark.

  • UFFF9+–+U+FFFB+ — interlinear annotation markers.

  • UE0000+–+U+E007F+ — Unicode tag characters (the “ASCII as PUA” channel made famous by recent prompt-injection PoCs).

/[\u{200B}-\u{200F}\u{202A}-\u{202E}\u{2060}-\u{2064}\u{2066}-\u{2069}\u{FEFF}\u{FFF9}-\u{FFFB}\u{E0000}-\u{E007F}]/.freeze

Instance Method Summary collapse

Constructor Details

#initialize(thinker:, cache: Cache::NULL) ⇒ Verifier

Returns a new instance of Verifier.

Parameters:

  • thinker (Proc)

    one-arg callable invoked as thinker.call(prompt); same closure shape Synthesizer uses.

  • cache (Cache, Cache::NULL) (defaults to: Cache::NULL)

    cache for the LLM-pass result. Defaults to Cache::NULL so tests can skip disk persistence; production wiring in Agent uses a real Cache pointing at a mcp_verifications dir.



96
97
98
99
# File 'lib/pikuri/mcp/verifier.rb', line 96

def initialize(thinker:, cache: Cache::NULL)
  @thinker = thinker
  @cache = cache
end

Instance Method Details

#call(entry:, client:, tools:) ⇒ void

This method returns an undefined value.

Verify the server’s surface. Returns nothing on success; raises InjectionDetected on failure.

Parameters:

Raises:



110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
# File 'lib/pikuri/mcp/verifier.rb', line 110

def call(entry:, client:, tools:)
  check_unicode!(entry, client, tools)

  # The cache stores ONLY the literal string "OK". A rejection
  # raises {InjectionDetected} from inside the cache block, and
  # {UrlCache#fetch} skips persistence when the block raises —
  # so a rejected server gets re-verified on the next boot.
  @cache.fetch(entry: entry, client: client, tools: tools) do
    # Cache miss → real LLM round-trip. The same heads-up
    # rationale as {Synthesizer#call} applies — verifying
    # silently for tens of seconds confuses the user.
    LOGGER.info("Verifying MCP server #{entry.id.inspect} for prompt-injection patterns, please wait...")
    response = @thinker.call(build_prompt(entry, client, tools))
    next 'OK' if response.to_s.strip.upcase == 'OK'

    raise InjectionDetected,
          "MCP server #{entry.id.inspect} rejected by verifier: #{response.to_s.strip}"
  end
  nil
end