Class: Pikuri::Mcp::Verifier

Inherits:
Object
  • Object
show all
Defined in:
lib/pikuri/mcp/verifier.rb

Overview

Pre-flight check for an MCP server’s textual surface — looking for prompt-injection / exfiltration / tool-chain-hijacking patterns that a malicious or compromised server could embed in its initialize handshake, tool descriptions, or parameter schemas. Wired into Servers#start_one after client.tools but before Servers#resolve_description, so a flagged server never gets its description synthesized and never enters Servers#live_ids.

Two passes

  1. **Mechanical Unicode pre-pass.** Walks every text field the server emits and raises InjectionDetected if it finds a code point in SUSPICIOUS_UNICODE — zero-width characters, RTL/bidi overrides, Unicode tag characters in the UE0000+-+U+E007F+ range, BOMs, interlinear annotation markers. Real MCP server authors have no legitimate reason to include any of these; if one shows up, it’s almost certainly an attempt to hide text from a human reader (and sometimes from the model itself, depending on tokenizer). This pass costs nothing — it runs before any LLM round-trip.

  2. **LLM verification.** When the Unicode pass is clean, the server’s structured surface is handed to @thinker with a prompt describing the threat model and the patterns to flag (instruction-override, pretend-system messages, exfiltration instructions, tool-chain hijacking, authority impersonation, misleading capability claims). A clean response is the literal string “OK” — anything else is treated as a flag and raises InjectionDetected carrying the model’s reasoning.

Cache semantics

Only the LLM-pass result is cached; the Unicode pre-pass runs on every boot (it’s fast). The cache stores “OK” keyed on the full server surface, so an unchanged surface skips the second round-trip entirely on subsequent boots. Negative results (InjectionDetected) are NOT cached — a server that gets rejected today re-runs verification on the next boot, in case the operator legitimately updated it (any update changes the cache key anyway).

False-positive philosophy

The verifier checks for injection, not for capability. A tool whose description honestly says “executes arbitrary JavaScript” or “runs bash commands” is not flagged — the user has chosen to wire that server into their agent, and the capability is the tool’s stated purpose. The verifier’s job is to spot text that tries to manipulate the calling agent into doing things other than what the tool claims to do.

See the prompt in #build_prompt for the precise patterns.

Defined Under Namespace

Classes: InjectionDetected

Constant Summary collapse

PROMPT_VERSION =

Bump when #build_prompt changes meaningfully. Cache folds this into its key fingerprint so a prompt edit invalidates every cached “OK” without anyone having to rm the cache directory. Version 2: the thinker gained a generic system prompt (Thinker::SYSTEM_PROMPT), changing the effective prompt for every entry.

2
CACHE_DIR =

Where the production cache lives — a sibling of Cache::DIR so verification verdicts never collide with synthesized descriptions (same key fingerprint, different meaning).

File.join(File.dirname(Cache::DIR), 'mcp_verifications')
SUSPICIOUS_UNICODE =

Code points with no legitimate place in human-readable tool documentation. Each range catches one category of “hide something from a reader”:

  • U200B+–+U+200F+ — zero-width space / non-joiner / joiner, LRM/RLM directionality marks.

  • U202A+–+U+202E+ — bidi overrides (LRE, RLE, PDF, LRO, RLO).

  • U2060+–+U+2064+ — word joiner, invisible math operators.

  • U2066+–+U+2069+ — isolate markers (LRI, RLI, FSI, PDI).

  • UFEFF+ — ZWNBSP / byte-order mark.

  • UFFF9+–+U+FFFB+ — interlinear annotation markers.

  • UE0000+–+U+E007F+ — Unicode tag characters (the “ASCII as PUA” channel made famous by recent prompt-injection PoCs).

/[\u{200B}-\u{200F}\u{202A}-\u{202E}\u{2060}-\u{2064}\u{2066}-\u{2069}\u{FEFF}\u{FFF9}-\u{FFFB}\u{E0000}-\u{E007F}]/.freeze

Instance Method Summary collapse

Constructor Details

#initialize(transport: nil, cancellable: nil, thinker: nil, cache: nil) ⇒ Verifier

The easy path is Verifier.new(transport: …) — the Thinker and the production on-disk Cache (under CACHE_DIR) are built right here. thinker: exists as the explicit override (a custom callable, a test fake) and is mutually exclusive with the transport path. Same constructor shape as Synthesizer.

Parameters:

  • transport (Pikuri::Agent::ChatTransport, nil) (defaults to: nil)

    when set, a Thinker is constructed from it (and cancellable:); the verification passes run against this model.

  • cancellable (Pikuri::Agent::Control::Cancellable, nil) (defaults to: nil)

    forwarded to the constructed Thinker so a boot-time Ctrl+C aborts the pass. Only meaningful on the transport: path.

  • thinker (#call, nil) (defaults to: nil)

    one-arg callable invoked as thinker.call(prompt), replacing the built-in Thinker.

  • cache (Cache, Cache::NULL, nil) (defaults to: nil)

    cache for the LLM-pass result. When nil (the default): the transport: path builds the production on-disk Cache under CACHE_DIR; the thinker: path falls back to Cache::NULL (no persistence — the test default).

Raises:

  • (ArgumentError)

    when neither or both of transport: and thinker: are given, or when cancellable: is combined with thinker:



123
124
125
126
127
128
129
130
131
132
133
134
# File 'lib/pikuri/mcp/verifier.rb', line 123

def initialize(transport: nil, cancellable: nil, thinker: nil, cache: nil)
  raise ArgumentError, 'pass exactly one of transport: or thinker:' if transport.nil? == thinker.nil?
  raise ArgumentError, 'cancellable: only applies to the transport: path' if thinker && cancellable

  @thinker = thinker || Thinker.new(transport: transport, cancellable: cancellable)
  @cache = cache ||
           if transport
             Cache.new(model_id: transport.model, prompt_version: PROMPT_VERSION, dir: CACHE_DIR)
           else
             Cache::NULL
           end
end

Instance Method Details

#call(entry:, client:, tools:) ⇒ void

This method returns an undefined value.

Verify the server’s surface. Returns nothing on success; raises InjectionDetected on failure.

Parameters:

Raises:



145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
# File 'lib/pikuri/mcp/verifier.rb', line 145

def call(entry:, client:, tools:)
  check_unicode!(entry, client, tools)

  # The cache stores ONLY the literal string "OK". A rejection
  # raises {InjectionDetected} from inside the cache block, and
  # {UrlCache#fetch} skips persistence when the block raises —
  # so a rejected server gets re-verified on the next boot.
  @cache.fetch(entry: entry, client: client, tools: tools) do
    # Cache miss → real LLM round-trip. The same heads-up
    # rationale as {Synthesizer#call} applies — verifying
    # silently for tens of seconds confuses the user.
    LOGGER.info("Verifying MCP server #{entry.id.inspect} for prompt-injection patterns, please wait...")
    response = @thinker.call(build_prompt(entry, client, tools))
    next 'OK' if response.to_s.strip.upcase == 'OK'

    raise InjectionDetected,
          "MCP server #{entry.id.inspect} rejected by verifier: #{response.to_s.strip}"
  end
  nil
end