How to Audit AI Agent MCP Servers

Security MCP Audit Framework March 4, 2026 · Raul Jäger

Classical security tools find nothing. Here's a 5-phase framework for auditing MCP servers that handle financial operations — and why the attack surface is semantic, not syntactic.

The Problem with Auditing AI Agents

When you audit a smart contract, you know what you're looking for: reentrancy, integer overflow, access control, price manipulation. The vulnerability classes are well-understood. Slither, semgrep, and Mythril catch a large portion of the low-hanging fruit automatically.

When you audit an AI agent MCP server, the tools give you nothing.

I've run semgrep with 200+ rules on multiple MCP server codebases in the past month. The results: zero findings, every time. Not because the code is perfect — but because the vulnerabilities don't look like code bugs. They look like design decisions.

This article explains what to look for.

What is an MCP Server?

Model Context Protocol (MCP) is Anthropic's standard for giving language models structured access to external tools. An MCP server exposes a set of typed functions — tools — that an AI model can call during a conversation.

{
  name: "execute_transfer",
  description: "Send tokens to a recipient address",
  inputSchema: {
    amount: { type: "string" },
    recipient: { type: "string" },  // <-- This is where it gets interesting
    chain: { type: "string" }
  }
}

The AI model reads the tool descriptions and decides when and how to call them based on user instructions. The model is the decision-maker. The MCP server is the executor.

This creates a fundamentally different trust model from traditional software.

The Trust Model Inversion

In classical software security, we assume:

The user is potentially malicious
The code enforces invariants
Input validation is the primary defense

In AI agent security, the model is partially:

A trusted intermediary between user and system
A decision-maker that constructs tool calls
A target for manipulation (prompt injection)

The attack surface shifts from what the code does to what the model decides to do.

Phase 1: Map the Tool Surface

Start by enumerating every tool the MCP server exposes and categorize each parameter by who controls it:

Agent-constructed (from context) → AI model → HIGH — can be injected
User-provided (passed through) → Human user → MEDIUM
Server-enforced (hardcoded) → Server → LOW
Runtime-derived (session, wallet) → Protocol → LOW — if properly bound

Specifically look for: recipient addresses, authority addresses, amount fields, URL parameters embedded in links users click, fee recipients.

Phase 2: Trace the Injection Paths

Prompt injection is the core attack vector. An attacker embeds instructions in any content the agent reads, and those instructions manipulate the agent's subsequent tool calls.

Common injection sources in DeFi/crypto MCP servers:

Token names and symbols — returned from token search APIs, displayed to the model
Transaction metadata — descriptions, memos, on-chain data
Price feed labels — token descriptions from external APIs
NFT metadata — names, descriptions
Smart contract error messages — revert reasons returned to the model
ENS names and social handles — resolved and shown in context

For each injection source, trace the path:

Injection Source → Model Context → Tool Call Parameter → Financial Action

If you can draw a complete path, you have a finding.

Phase 3: Check the Safety Guarantees

For each tool that has financial impact, ask:

1. Can the recipient be changed without the user knowing?
If an injected instruction changes the recipient to an attacker's address, will the user notice before signing?

2. Is the pre-flight confirmation UI adequate?
If the recipient address is buried in a query parameter, users often don't check it. This is especially true on mobile.

3. Does the server validate parameters against a trusted source?
The server knows the user's wallet (from initialization or session). It should validate recipient addresses or at minimum display a prominent warning when they differ from the user's wallet.

4. Are API responses sanitized before being returned to the model?
If an external API returns data that gets passed directly to the model, it can be a secondary injection source.

Phase 4: Check the Session Initialization

MCP servers typically have a session setup phase where the client sends configuration. Look for initialization code that does NOT:

Bind user identity
Validate caller permissions
Set explicit limits on transaction sizes or frequencies

Phase 5: The Instruction Injection via get_instructions

Many MCP servers expose a get_instructions tool that returns system-level guidance injected into the model's context. If this instruction file is loaded from disk, fetched from a remote URL, or configurable by the user — it's a potential attack vector for persistent instruction manipulation.

A supply chain attack on the npm package could inject arbitrary instructions into every AI agent that uses the MCP server.

What Classical Tools Miss

The vulnerability classes map like this:

Classical	AI Agent Equivalent	Detection
SQL injection	Prompt injection	Manual trace
Missing auth check	Missing parameter validation	Manual audit
Unchecked return value	Unsanitized API response	Manual trace
Privilege escalation	Trust model exploitation	Architecture review
CSRF	Cross-context manipulation	Threat modeling

The tools were built for a world where the attack surface is syntactic. In AI agent security, the attack surface is semantic — it's about what the model understands and decides, not what the code explicitly allows.

Checklist: MCP Servers with Financial Operations

All recipient/authority addresses validated against session-bound user wallet
No agent-constructed parameters flow into financial transactions without validation
API responses sanitized before being returned to model context
Pre-flight UI shows recipient prominently, not buried in URL params
get_instructions content is integrity-protected (hash-checked or bundled)
Session initialization binds user identity and cannot be changed mid-session
No tool description explicitly encourages the model to use attacker-controlled values
Rate limiting on high-impact tools (not just gas costs)

The Bigger Picture

MCP servers for financial operations are six months old. The security tooling doesn't exist yet. The vulnerability patterns are just being discovered. Bug bounty programs haven't caught up.

This is an opportunity for security researchers who can think in terms of trust models and agent behavior rather than just code patterns.

The next frontier isn't finding buffer overflows — it's understanding when an AI agent can be made to act against its user's interests. That's the attack surface for the next decade.