← Back to ThoughtProof

How to Audit AI Agent MCP Servers

Security MCP Audit Framework March 4, 2026 · Raul Jäger

Classical security tools find nothing. Here's a 5-phase framework for auditing MCP servers that handle financial operations — and why the attack surface is semantic, not syntactic.

The Problem with Auditing AI Agents

When you audit a smart contract, you know what you're looking for: reentrancy, integer overflow, access control, price manipulation. The vulnerability classes are well-understood. Slither, semgrep, and Mythril catch a large portion of the low-hanging fruit automatically.

When you audit an AI agent MCP server, the tools give you nothing.

I've run semgrep with 200+ rules on multiple MCP server codebases in the past month. The results: zero findings, every time. Not because the code is perfect — but because the vulnerabilities don't look like code bugs. They look like design decisions.

This article explains what to look for.

What is an MCP Server?

Model Context Protocol (MCP) is Anthropic's standard for giving language models structured access to external tools. An MCP server exposes a set of typed functions — tools — that an AI model can call during a conversation.

{
  name: "execute_transfer",
  description: "Send tokens to a recipient address",
  inputSchema: {
    amount: { type: "string" },
    recipient: { type: "string" },  // <-- This is where it gets interesting
    chain: { type: "string" }
  }
}

The AI model reads the tool descriptions and decides when and how to call them based on user instructions. The model is the decision-maker. The MCP server is the executor.

This creates a fundamentally different trust model from traditional software.

The Trust Model Inversion

In classical software security, we assume:

In AI agent security, the model is partially:

The attack surface shifts from what the code does to what the model decides to do.

Phase 1: Map the Tool Surface

Start by enumerating every tool the MCP server exposes and categorize each parameter by who controls it:

Specifically look for: recipient addresses, authority addresses, amount fields, URL parameters embedded in links users click, fee recipients.

Phase 2: Trace the Injection Paths

Prompt injection is the core attack vector. An attacker embeds instructions in any content the agent reads, and those instructions manipulate the agent's subsequent tool calls.

Common injection sources in DeFi/crypto MCP servers:

  1. Token names and symbols — returned from token search APIs, displayed to the model
  2. Transaction metadata — descriptions, memos, on-chain data
  3. Price feed labels — token descriptions from external APIs
  4. NFT metadata — names, descriptions
  5. Smart contract error messages — revert reasons returned to the model
  6. ENS names and social handles — resolved and shown in context

For each injection source, trace the path:

Injection Source → Model Context → Tool Call Parameter → Financial Action

If you can draw a complete path, you have a finding.

Phase 3: Check the Safety Guarantees

For each tool that has financial impact, ask:

1. Can the recipient be changed without the user knowing?
If an injected instruction changes the recipient to an attacker's address, will the user notice before signing?

2. Is the pre-flight confirmation UI adequate?
If the recipient address is buried in a query parameter, users often don't check it. This is especially true on mobile.

3. Does the server validate parameters against a trusted source?
The server knows the user's wallet (from initialization or session). It should validate recipient addresses or at minimum display a prominent warning when they differ from the user's wallet.

4. Are API responses sanitized before being returned to the model?
If an external API returns data that gets passed directly to the model, it can be a secondary injection source.

Phase 4: Check the Session Initialization

MCP servers typically have a session setup phase where the client sends configuration. Look for initialization code that does NOT:

Phase 5: The Instruction Injection via get_instructions

Many MCP servers expose a get_instructions tool that returns system-level guidance injected into the model's context. If this instruction file is loaded from disk, fetched from a remote URL, or configurable by the user — it's a potential attack vector for persistent instruction manipulation.

A supply chain attack on the npm package could inject arbitrary instructions into every AI agent that uses the MCP server.

What Classical Tools Miss

The vulnerability classes map like this:

Classical AI Agent Equivalent Detection
SQL injection Prompt injection Manual trace
Missing auth check Missing parameter validation Manual audit
Unchecked return value Unsanitized API response Manual trace
Privilege escalation Trust model exploitation Architecture review
CSRF Cross-context manipulation Threat modeling

The tools were built for a world where the attack surface is syntactic. In AI agent security, the attack surface is semantic — it's about what the model understands and decides, not what the code explicitly allows.

Checklist: MCP Servers with Financial Operations

The Bigger Picture

MCP servers for financial operations are six months old. The security tooling doesn't exist yet. The vulnerability patterns are just being discovered. Bug bounty programs haven't caught up.

This is an opportunity for security researchers who can think in terms of trust models and agent behavior rather than just code patterns.

The next frontier isn't finding buffer overflows — it's understanding when an AI agent can be made to act against its user's interests. That's the attack surface for the next decade.