Protocol Specification

ThoughtProof is an epistemic consensus protocol that orchestrates multiple AI agents across different model providers to deliberate, critique, and synthesize knowledge into verifiable reasoning artifacts.

Core Principle

No single AI model can meaningfully verify its own outputs. Epistemic verification requires structural independence — multiple models from different providers proving their reasoning to each other through mandatory adversarial evaluation.

Patent Status: Core protocol mechanisms are patent-pending (USPTO #63/984,669, DPMA registered). This document describes the protocol conceptually. Implementation details are reserved for the formal specification.

The Verification Problem

Why Self-Verification Fails

Asking an AI model to check its own output is like asking Deloitte to audit its own financial statements. The same architecture that generates an answer cannot independently verify that answer's epistemic validity.

  • Shared blind spots: A model trained on a given data distribution cannot identify errors systematic to that distribution.
  • Confidence-accuracy decoupling: Models express high confidence in fabricated claims. Self-reported confidence is unreliable.
  • No adversarial pressure: Without structured critique from independent systems, reasoning errors compound rather than cancel.

The Majority Vote Fallacy

A common multi-model approach is majority voting: query several models, accept what most agree on. This is dangerously flawed.

Models trained on overlapping data with similar architectures share systematic biases. In our benchmark runs, we repeatedly observed three of four models independently producing the same hallucinated statistic. Majority voting would have accepted this as verified fact. Only a dedicated critic agent — tasked with examining reasoning quality rather than voting on conclusions — caught the error.
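The failure mode is easy to see in code. Below is a minimal sketch with made-up answers (the statistic and model count are illustrative, mirroring the benchmark anecdote above): a plain mode-of-answers vote happily returns a hallucination shared by correlated models.

```typescript
// Majority voting: return the most frequent answer.
// Under shared training bias, the mode can be a common hallucination.
function majorityVote(answers: string[]): string {
  const counts = new Map<string, number>();
  for (const a of answers) counts.set(a, (counts.get(a) ?? 0) + 1);
  // Sort answer/count pairs by count, descending; take the winner.
  return [...counts.entries()].sort((x, y) => y[1] - x[1])[0][0];
}

// Three of four correlated models produce the same fabricated statistic.
const answers = ["42% (fabricated)", "42% (fabricated)", "42% (fabricated)", "37%"];
// majorityVote(answers) returns "42% (fabricated)" -- the shared
// hallucination wins the vote; no step ever examined the reasoning.
```

Nothing in this procedure inspects *why* the answers agree, which is exactly the gap the critic stage fills.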

Five-Stage Pipeline

The protocol orchestrates multi-agent deliberation through five sequential stages, each with defined inputs, outputs, and quality constraints:

Stage 1: Normalize

Convert the input query into a structured specification with explicit success criteria, domain classification, and falsifiability conditions. Prevents ambiguity from propagating through the pipeline.

Stage 2: Generate

Multiple agents backed by different foundation models independently produce solution proposals. No cross-agent communication during generation. The protocol enforces model diversity to prevent monoculture.

Stage 3: Critique

The structural core of the protocol. Each proposal receives adversarial evaluation from agents using different model providers than the original generator. Critics identify specific flaws: logical errors, unsupported assumptions, missing edge cases. This stage is mandatory and cannot be bypassed.
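One way to picture the cross-provider constraint is as an assignment step. The types and field names below are illustrative assumptions, not the protocol's published data model; the point is only that a critic from the generator's own provider is never an acceptable candidate.

```typescript
interface Proposal {
  id: string;       // proposal identifier (hypothetical field)
  provider: string; // foundation-model provider that generated it
}

// Assign each proposal a critic from a *different* provider than the
// one that generated it. If no independent critic exists, fail loudly
// rather than fall back to self-review.
function assignCritics(
  proposals: Proposal[],
  criticProviders: string[]
): Map<string, string> {
  const assignments = new Map<string, string>();
  for (const p of proposals) {
    const candidate = criticProviders.find((c) => c !== p.provider);
    if (!candidate) throw new Error(`no independent critic for ${p.id}`);
    assignments.set(p.id, candidate);
  }
  return assignments;
}
```

Failing instead of degrading is the design choice that makes the stage "mandatory" in a structural sense: there is no code path where a proposal skips independent critique.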

Stage 4: Evaluate

Proposals are scored across multiple dimensions including consistency, depth, originality, and resistance to critique. Synthesis opportunities — ways to combine strengths from multiple proposals — are identified.

Stage 5: Synthesize

A meta-agent integrates all proposals, critiques, and evaluations into an Epistemic Block. The output is not a single answer but a structured reasoning map showing where models agreed, disagreed, and why.
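The five stages can be sketched as a pipeline of typed functions. Everything here is a stub under stated assumptions (the real stage interfaces are reserved for the formal specification); what it shows is the sequencing: generation is independent per model, critique always runs, and the output is a structured block rather than a single answer.

```typescript
// Illustrative types -- not the protocol's published schema.
type Spec = { query: string; criteria: string[] };
type Draft = { model: string; text: string };
type Review = { target: string; flaws: string[] };
type Score = { model: string; total: number };
type EpistemicBlockSketch = {
  spec: Spec; proposals: Draft[]; critiques: Review[];
  scores: Score[]; synthesis: string;
};

// Stage 1: turn the raw query into an explicit specification.
const normalize = (query: string): Spec =>
  ({ query, criteria: ["answer must be falsifiable"] });

// Stage 2: each model proposes independently -- no cross-talk.
const generate = (spec: Spec, models: string[]): Draft[] =>
  models.map((m) => ({ model: m, text: `${m} proposal for: ${spec.query}` }));

// Stage 3: every proposal receives at least one adversarial review.
const critique = (proposals: Draft[]): Review[] =>
  proposals.map((p) => ({ target: p.model, flaws: ["unsupported assumption"] }));

// Stage 4: score proposals, penalized by surviving critiques.
const evaluate = (proposals: Draft[], critiques: Review[]): Score[] =>
  proposals.map((p) => ({
    model: p.model,
    total: 10 - critiques.filter((c) => c.target === p.model).length,
  }));

// Stage 5: assemble the reasoning map. Critique cannot be skipped.
function deliberate(query: string, models: string[]): EpistemicBlockSketch {
  const spec = normalize(query);
  const proposals = generate(spec, models);
  const critiques = critique(proposals);
  const scores = evaluate(proposals, critiques);
  return { spec, proposals, critiques, scores,
           synthesis: `agreement map over ${models.length} models` };
}
```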

The Critic Difference: In 23% of benchmark cases where majority vote accepted a hallucination, the dedicated critic identified the error through structural analysis — checking reasoning chains, not counting votes.

Epistemic Blocks

Each deliberation produces an Epistemic Block: a self-contained, versioned, cryptographically signed record of the complete reasoning process.

What a Block Contains

  • The normalized query with success criteria and falsifiability conditions
  • All generator proposals with full provenance (which model, which version)
  • All critiques organized by target proposal and attack vector
  • Multi-dimensional evaluation scores with justifications
  • Final synthesis with explicit confidence assessments and uncertainty quantification
  • Metadata including model diversity metrics, cost, duration, and cryptographic signatures

Key Properties

  • Immutable: Once signed, blocks cannot be altered without invalidating the signature.
  • Chainable: Blocks can reference prior blocks, enabling iterative refinement of reasoning over time.
  • Reproducible: Any agent can re-evaluate the reasoning chain given the same inputs.
  • Local-first: Stored locally by default. Sharing is opt-in and high-friction by design.

Dissent as Signal: When models disagree significantly, the block preserves minority opinions rather than forcing false consensus. Divergent blocks are often more valuable than convergent ones — they reveal genuine uncertainty.
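The immutability and chaining properties follow from standard content-hashing. The sketch below uses a bare SHA-256 digest as a stand-in; the protocol's actual signature scheme is not described in this document, so treat the field names and hashing layout as assumptions.

```typescript
import { createHash } from "node:crypto";

type SealedBlock = {
  payload: string;       // serialized block content (placeholder)
  prev: string | null;   // hash of the prior block, enabling chaining
  hash: string;          // digest binding payload + prev together
};

// Digest covers both the payload and the previous block's hash, so
// rewriting history anywhere invalidates everything downstream.
const digest = (payload: string, prev: string | null): string =>
  createHash("sha256").update(`${prev ?? ""}|${payload}`).digest("hex");

function seal(payload: string, prev: SealedBlock | null = null): SealedBlock {
  const prevHash = prev ? prev.hash : null;
  return { payload, prev: prevHash, hash: digest(payload, prevHash) };
}

// Any alteration after sealing breaks verification.
const verify = (b: SealedBlock): boolean => b.hash === digest(b.payload, b.prev);
```

A real deployment would sign the digest with a private key rather than rely on the hash alone; the hash-only version is just the smallest demonstration of "altered after signing implies invalid".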

Provider Neutrality

The protocol's most fundamental design constraint is structural independence from any single AI provider. This is not a feature — it is the architectural foundation.

Why No Provider Can Build This

  • Interest Alignment: OpenAI's business depends on commercializing GPT. A verification layer that frequently flags GPT outputs as unreliable undermines that business. Even with good intentions, internal incentive structures bias toward favorable self-assessment.
  • Closed Ecosystems: Google's verification validates only Gemini. Anthropic's only Claude. Fragmented, provider-specific audit trails defeat the purpose of verification.
  • Regulatory Credibility: The EU AI Act and emerging regulations are skeptical of self-certification. An audit trail from the same company that built the model carries minimal weight in compliance proceedings.

Model Diversity Enforcement

The protocol enforces diversity through a quantitative index measuring the heterogeneity of participating models. If all agents use the same foundation model, they share identical biases and blind spots. Deliberations with insufficient diversity are flagged, and contributions from over-represented models receive diminishing weight.
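One plausible construction for such an index (an assumption on my part; the protocol's exact metric is unpublished) is normalized Shannon entropy over the distribution of foundation models among participating agents: 0 for a monoculture, approaching 1 for an even spread.

```typescript
// Diversity index sketch: normalized Shannon entropy over model counts.
// Returns 0 when every agent uses the same model, 1 when the agents
// are spread evenly across the distinct models present.
function diversityIndex(models: string[]): number {
  const counts = new Map<string, number>();
  for (const m of models) counts.set(m, (counts.get(m) ?? 0) + 1);
  if (counts.size <= 1) return 0; // monoculture: nothing to normalize
  const n = models.length;
  let entropy = 0;
  for (const c of counts.values()) {
    const p = c / n;
    entropy -= p * Math.log2(p);
  }
  return entropy / Math.log2(counts.size);
}

// A deliberation could be flagged below some tuned threshold:
const sufficientlyDiverse = (models: string[]) => diversityIndex(models) >= 0.5;
```

Down-weighting over-represented models then falls out naturally: each contribution can be scaled by the inverse of its model's share.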

The Structural Moat

No major AI company will use competing models to red-team their own offerings. This creates a permanent, non-replicable advantage for independent verification protocols. The conflict of interest is structural, not resolvable through good intentions.

Security Applications

Beyond epistemic verification, the protocol's multi-agent architecture provides structural advantages for AI security — particularly prompt injection detection.

Multi-Model Divergence as Detection Signal

Different foundation models have different vulnerability profiles. An injection crafted to exploit one model's instruction-following behavior may fail against another's safety training. When the same input produces divergent responses across models, that divergence is a reliable anomaly signal.

  • Normal queries produce convergent responses (models agree on substance)
  • Injection-compromised queries produce divergent responses (some models follow the injection, others respond normally)
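A crude version of this signal can be computed without any injection-specific model: mean pairwise token-set similarity across responses. This is a hedged sketch, not the protocol's actual detector (which, per the stage above, reasons over logical structure rather than token overlap), and the 0.3 threshold is an arbitrary placeholder.

```typescript
// Tokenize a response into a lowercase word set.
const tokens = (s: string): Set<string> =>
  new Set(s.toLowerCase().split(/\W+/).filter(Boolean));

// Jaccard similarity between two token sets: |A∩B| / |A∪B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  let inter = 0;
  for (const t of a) if (b.has(t)) inter++;
  const union = a.size + b.size - inter;
  return union === 0 ? 1 : inter / union;
}

// Mean pairwise agreement across all model responses. Convergent
// answers score high; an injection that hijacks only some models
// drags the mean down.
function meanAgreement(responses: string[]): number {
  const sets = responses.map(tokens);
  let sum = 0, pairs = 0;
  for (let i = 0; i < sets.length; i++)
    for (let j = i + 1; j < sets.length; j++) {
      sum += jaccard(sets[i], sets[j]);
      pairs++;
    }
  return pairs ? sum / pairs : 1;
}

// Flag when cross-model agreement falls below a tuned threshold.
const looksInjected = (responses: string[]) => meanAgreement(responses) < 0.3;
```

In practice embedding-based similarity would be more robust than raw token overlap, but the shape of the detector is the same: divergence, not content, is the anomaly.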

The Critic as Anomaly Detector

The dedicated critic is already optimized for detecting the exact patterns that prompt injections produce: logical breaks, topic deviation, and inconsistent command structures. It does not need injection-specific training — its mandate to identify reasoning failures naturally surfaces injection artifacts.

Real-World Validation: Chatwoot Audit

In a security audit of Chatwoot Captain AI (26.9k GitHub stars), the ThoughtProof (PoT) pipeline identified five vulnerabilities, one rated Critical, across 50 adversarial prompts that single-model review missed:

  1. Prompt Injection: Format Control Bypass (Critical)
  2. Compliance Liability in GDPR/HIPAA contexts (High)
  3. Architecture Information Leak (Medium)
  4. Inconsistent Guardrails (Medium)
  5. Multi-Turn Context Manipulation (Medium)

The dedicated critic identified structural patterns across findings that no individual generator surfaced — demonstrating that multi-agent critique catches what single-model review misses.

Detection, Not Prevention: The protocol identifies injections post-hoc through divergence analysis. In security practice, detection is often more valuable than prevention — intrusion detection systems complement firewalls precisely because prevention alone is insufficient.

BYOK Architecture

Bring Your Own Keys (BYOK) is the architectural decision that makes everything else possible. Users provide their own API keys for model providers.

Implications

  • Data Sovereignty: Data flows directly from user to model provider to user. ThoughtProof never sees the content.
  • Privacy by Architecture: Not a data processor under GDPR. Compliance is structural, not procedural.
  • No Platform Risk: Direct relationship with providers. No centralized dependency.
  • Cost Transparency: Users pay providers directly at their rates. No markup.

ThoughtProof = Tool, Not Service. We provide orchestration software. Users run it with their credentials. This is structurally complementary to every model provider — every Epistemic Block requires multiple providers, increasing their utilization.

Getting Started

Installation

npm install -g pot-cli

Basic Usage

# Set your API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Run a deliberation
pot run "Your question here"

# View the resulting Epistemic Block
pot blocks list
pot blocks show <block-id>

Patent Notice

Core protocol mechanisms are patent-pending. The pot-cli reference implementation is available under MIT license for evaluation and non-commercial use. Detailed protocol specification will be published following patent proceedings.