← Back to ThoughtProof

The 4 Layers of AI Agent Security

Framework Agent Security March 3, 2026 · ThoughtProof Research

Everyone's talking about AI agent security. But most conversations collapse everything into one bucket: "is it safe?"

After auditing 20+ agent frameworks and producing 50+ confirmed findings — including multiple CVSS 9.0+ vulnerabilities in production systems — we've found that agent security breaks cleanly into four distinct layers. Each answers a different question, requires different tools, and protects against different threats.

The Stack

Layer Question What It Does
L0 Discovery What agents do you have? Maps the attack surface — agents, tools, connections, data sources
L1 Orchestration Do they work efficiently? Routes tasks, selects models, manages costs and latency
L2 Verification Do they work correctly? Validates reasoning, detects manipulation, catches corrupted outputs
L3 Trust Can others trust that they do? Cryptographic attestation, cross-org verification, agent reputation

Layer 0: Discovery — What Exists?

Before you can secure agents, you need to know they exist. Layer 0 maps the attack surface: which agents are deployed, what tools they access, what data they can reach, and how they connect to each other.

This is where most enterprise security teams start — and where most stop. It answers the inventory question, not the security question.

The tools here are scanners and surface mappers. They tell you "you have 47 agents with access to your CRM, email, and financial systems." Important information. But it says nothing about whether those agents are doing the right thing.

Layer 1: Orchestration — Efficiency

Layer 1 is the plumbing. It's the frameworks that route tasks to agents, select the right model for each job, manage context windows, and optimize for cost and latency.

This is LangChain, CrewAI, AutoGen, AWS Strands — the layer most developers interact with daily. It's also where we've found the most vulnerabilities: tool output injection, unguarded code execution, memory poisoning.

The irony: the layer responsible for making agents work efficiently is also the layer most likely to introduce security holes. Orchestration frameworks prioritize developer experience over security boundaries. That's not a criticism — it's a design tradeoff. But someone needs to verify the output.

Layer 2: Verification — Correctness

This is where the real risk lives.

Layer 2 asks: did the agent's reasoning lead to the right conclusion? Was the chain of thought manipulated? Did a tool injection corrupt the output? Is the confidence score earned or hallucinated?

Layer 0 tells you the agent exists. Layer 1 tells you it ran. Layer 2 tells you whether you should believe the result.

From our audits, the pattern is consistent: agents that look correct on the surface can be silently manipulated at the reasoning level. A tool returns poisoned data. The agent incorporates it. The output looks plausible. No alarm fires.

Verification catches this. Multiple perspectives evaluate the same reasoning chain. Adversarial critics search for weaknesses. Confidence scores are calibrated against actual accuracy, not model self-assessment.

This is what ThoughtProof builds: the verification layer for AI agent reasoning.

Layer 3: Trust — Proof Across Boundaries

Layer 3 extends verification beyond a single organization. When Agent A (yours) interacts with Agent B (theirs), how do you know Agent B's outputs were verified?

This is the "Know Your Agent" problem. It requires cryptographic attestation — proof that verification happened, what standards were applied, and what confidence level was achieved. Not "trust me," but "here's the receipt."

Layer 3 is early. But as AI agents start transacting autonomously — making purchases, signing contracts, executing trades — the question shifts from "is my agent correct?" to "can I prove to a counterparty that my agent was correct?"

Where the Industry Is Today

Most organizations are somewhere between Layer 0 and Layer 1. They're deploying agents, discovering them after the fact, and trusting the orchestration framework to handle security.

Almost nobody is doing Layer 2 systematically. And Layer 3 doesn't exist yet outside of research.

That's the gap. And it's the gap that matters most — because an agent that works efficiently but incorrectly is worse than one that doesn't work at all.

ThoughtProof operates at Layer 2 and Layer 3.

We verify agent reasoning in real-time and provide cryptographic attestation of verification results. Our SDK is open source (MIT). Our protocol is designed to become infrastructure.

pot-sdk on GitHub →