
pot-sdk v0.1: Verify Any AI Output With One Function Call

SDK Release · TypeScript · PoT-187 · February 23, 2026

The pot-cli has been public for a while. Today, the SDK ships.

The problem it solves: your agent calls another agent's tool. Output comes back. You have no independent basis for judging whether it's correct. Your options: trust it blindly (dangerous), regenerate it yourself (expensive, and subject to the same biases), or ask a neutral verification layer.

pot-sdk is the third option. One function call, four independent AI models, structured adversarial critique.

import { verify } from 'pot-sdk';

const params = {
  question: 'Is this output factually correct and free of hallucinations?',
  apiKeys: {
    anthropic: process.env.ANTHROPIC_API_KEY,
    xai:       process.env.XAI_API_KEY,
    deepseek:  process.env.DEEPSEEK_API_KEY,
  },
};

// tier: 'basic' (<1s) | 'pro' (3-5s)
const result = await verify(agentOutput, { ...params, tier: 'basic' });

if (result.confidence < 0.6 || result.flags.includes('unverified-claims')) {
  // escalate to the adversarial multi-model tier
  const deep = await verify(agentOutput, { ...params, tier: 'pro' });
}

Three tiers, risk-proportional

Not every output needs the same level of scrutiny. pot-sdk exposes three verification tiers designed for different latency and stakes requirements:

| Tier  | Latency | Architecture                                     | Use case                                   |
|-------|---------|--------------------------------------------------|--------------------------------------------|
| basic | <1s     | Single model sanity check                        | Real-time A2A, routine outputs             |
| pro   | 3–5s    | Generator + Critic + Synthesizer                 | High-stakes actions, async workflows       |
| deep  | 30s+    | Rotated synthesizers, full dissent documentation | Strategic decisions, compliance attestation |
The pattern for latency-sensitive systems: basic inline on every output, pro triggered only when basic flags something or the action is irreversible.
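That pattern fits in a small wrapper. A sketch under stated assumptions: `verifyWithEscalation`, the injected `verifyFn`, and the 0.6 threshold are illustrative and not part of the published API; only `verify` and its `tier` option appear in this release.

```typescript
// Hypothetical wrapper around the SDK's verify(): run the cheap tier on
// every output and escalate only when the fast pass looks suspicious.
type Tier = 'basic' | 'pro' | 'deep';

interface VerifyResult {
  confidence: number;  // 0..1
  flags: string[];     // e.g. 'unverified-claims'
}

type VerifyFn = (output: string, opts: { tier: Tier }) => Promise<VerifyResult>;

async function verifyWithEscalation(
  verifyFn: VerifyFn,  // injected so the policy is testable without API keys
  output: string,
  threshold = 0.6,     // illustrative cut-off
): Promise<VerifyResult> {
  const basic = await verifyFn(output, { tier: 'basic' });
  const suspicious =
    basic.confidence < threshold || basic.flags.includes('unverified-claims');
  // Only pay the 3-5s 'pro' latency when the fast pass flags something
  // (or, in a real system, when the action is irreversible).
  return suspicious ? verifyFn(output, { tier: 'pro' }) : basic;
}
```

Injecting the verify function keeps the escalation policy itself testable without live provider keys.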

What makes it different from "just ask another model"

Passing an output to a second model for validation has a structural problem: if both models share training distribution or provider, you're not getting a second opinion — you're getting a correlated opinion with extra steps.

pot-sdk enforces structural adversarial roles: generators produce candidate answers independently, without seeing each other's output; a critic is structurally tasked with finding flaws in each of them; a synthesizer merges the positions, with minority views documented rather than discarded.

Two metrics track whether this worked:

MDI — Model Diversity Index

Measures input-side diversity. Did the generators actually bring different perspectives, or did they converge prematurely? Low MDI flags groupthink before synthesis.

SAS — Synthesis Audit Score

Measures output fidelity. Did the synthesizer faithfully represent minority positions, or did it just follow the majority? Flags synthesizer dominance when one generator's framing drives >60% of the final synthesis.

Together: MDI measures whether the inputs were diverse enough. SAS measures whether that diversity survived synthesis. Both matter.
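As a rough illustration of the MDI side: diversity over a set of generator outputs can be scored as one minus the mean pairwise Jaccard similarity of their token sets (the audit notes later in this post mention the SDK moved to Jaccard after the original metric was found broken). The function names and the exact scaling here are illustrative, not the SDK's internal formula.

```typescript
// Illustrative MDI: 1 - mean pairwise Jaccard similarity over tokenized outputs.
// 1.0 = generators fully disjoint; 0.0 = every generator said the same thing.
function jaccard(a: Set<string>, b: Set<string>): number {
  const inter = [...a].filter(t => b.has(t)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 1 : inter / union;
}

function modelDiversityIndex(outputs: string[]): number {
  const tokenSets = outputs.map(
    o => new Set(o.toLowerCase().split(/\s+/).filter(Boolean)),
  );
  let pairs = 0;
  let total = 0;
  for (let i = 0; i < tokenSets.length; i++) {
    for (let j = i + 1; j < tokenSets.length; j++) {
      total += jaccard(tokenSets[i], tokenSets[j]);
      pairs++;
    }
  }
  return pairs === 0 ? 0 : 1 - total / pairs;
}
```

A low score means the generators converged on near-identical wording, which is exactly the groupthink condition MDI is meant to flag before synthesis.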

BYOK — any provider, your keys

pot-sdk ships with native support for Anthropic, xAI, DeepSeek, Moonshot, and OpenAI. No API keys are bundled — you bring your own. Everything runs directly from your environment to the provider. No proxying, no telemetry.

Beyond the built-in providers, any OpenAI-compatible endpoint works via baseUrl — local models (Ollama), Together.ai, custom deployments. If it speaks the OpenAI API format, it works.

// Any combination of providers — your choice
const result = await verify(claim, {
  providers: {
    anthropic: { apiKey: process.env.ANTHROPIC_API_KEY },
    openai:    { apiKey: process.env.OPENAI_API_KEY },
    xai:       { apiKey: process.env.XAI_API_KEY },
    // or any OpenAI-compatible endpoint:
    local:     { apiKey: 'ollama', baseUrl: 'http://localhost:11434/v1' },
  }
});

Independence comes from the adversarial architecture, not from excluding specific providers: generators work without seeing each other's output, and the critic is structurally tasked with finding flaws.
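That independence is mostly an orchestration property: every generator receives the identical prompt, and no generator's answer enters another generator's context. A minimal sketch of the fan-out, with a generic `callModel` stand-in rather than the SDK's real provider clients:

```typescript
// Sketch of blind generation: all generators get the same untouched prompt
// in parallel; no generator output feeds into another generator's context.
type CallModel = (provider: string, prompt: string) => Promise<string>;

async function blindGenerate(
  providers: string[],
  prompt: string,
  callModel: CallModel,  // stand-in for the real provider client
): Promise<Map<string, string>> {
  // Promise.all fires every call concurrently with the same prompt, so there
  // is no ordering in which an earlier answer could leak into a later call.
  const outputs = await Promise.all(providers.map(p => callModel(p, prompt)));
  return new Map(providers.map((p, i) => [p, outputs[i]]));
}
```

The critic and synthesizer stages would then consume this map; only at that point do the candidate answers meet.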

We audited the SDK with PoT before releasing it

Before pushing to GitHub, we ran the SDK source through PoT-187 — a full adversarial security audit using the protocol the SDK itself implements.

It found three critical issues:

Fake similarity metric

The diversity calculation used a slice comparison that always returned the same value. MDI scores were meaningless. Fixed: Jaccard similarity on tokenized outputs.

Same-provider dual synthesis

The deep tier was selecting both synthesizers from the same provider in some configurations. Structural adversarial diversity was silently broken. Fixed: distinct provider enforcement.
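The fix amounts to a constraint when choosing the second synthesizer: it must come from a different provider than the first. A hypothetical sketch of such a selection guard, not the SDK's actual code:

```typescript
// Hypothetical guard: pick two synthesizer models guaranteed to come from
// distinct providers, rejecting configurations that cannot satisfy this.
interface Model {
  provider: string;
  name: string;
}

function pickDistinctSynthesizers(candidates: Model[]): [Model, Model] {
  for (const first of candidates) {
    const second = candidates.find(m => m.provider !== first.provider);
    if (second) return [first, second];
  }
  throw new Error('deep tier requires synthesizers from at least two providers');
}
```

Failing loudly on a single-provider config is the point: the original bug was silent, which is what made it dangerous.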

API key lifecycle

Keys were held in memory beyond their use window. Fixed: zeroing after instantiation.
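Zeroing is only possible if the key is held in a mutable buffer in the first place; JavaScript strings are immutable and cannot be scrubbed. A sketch of the general pattern (the `EphemeralKey` class is illustrative, not the SDK's implementation):

```typescript
// Hold the secret in a mutable Uint8Array so it can actually be overwritten;
// a plain string copy would linger in memory until the GC reclaims it.
class EphemeralKey {
  private bytes: Uint8Array;

  constructor(key: string) {
    this.bytes = new TextEncoder().encode(key);
  }

  // Materialize a string only for the duration of the call.
  use<T>(fn: (key: string) => T): T {
    return fn(new TextDecoder().decode(this.bytes));
  }

  destroy(): void {
    this.bytes.fill(0);  // overwrite the backing buffer
    this.bytes = new Uint8Array(0);
  }
}
```

This narrows the window in which a heap dump exposes the key, though the transient string passed to `use` is still subject to GC timing.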

All three were fixed before the first public commit. The SDK was then audited again under PoT-188 for EU AI Act compliance, adding a human oversight hook (pot.with_oversight()), a tamper-evident audit trail (append-only JSON-LD), and a transparency layer for high-risk output classification.
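An append-only, tamper-evident trail is typically built by hash-chaining: each entry commits to the hash of its predecessor, so editing any past record invalidates everything after it. A sketch of the idea using Node's crypto module; the entry shape here is illustrative, not the SDK's JSON-LD schema:

```typescript
import { createHash } from 'node:crypto';

// Each entry commits to the previous entry's hash; mutating any past record
// breaks the chain for every record that follows it.
interface AuditEntry {
  payload: unknown;
  prevHash: string;
  hash: string;
}

function sha256(s: string): string {
  return createHash('sha256').update(s).digest('hex');
}

function append(trail: AuditEntry[], payload: unknown): AuditEntry[] {
  const prevHash = trail.length ? trail[trail.length - 1].hash : 'genesis';
  const hash = sha256(prevHash + JSON.stringify(payload));
  return [...trail, { payload, prevHash, hash }];
}

function verifyTrail(trail: AuditEntry[]): boolean {
  let prev = 'genesis';
  for (const e of trail) {
    if (e.prevHash !== prev) return false;
    if (e.hash !== sha256(e.prevHash + JSON.stringify(e.payload))) return false;
    prev = e.hash;
  }
  return true;
}
```

Anyone holding the trail can re-derive the chain and detect retroactive edits without trusting the writer, which is what makes the log attestation-grade.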

Verification tools should verify themselves. That felt like the right way to ship.

v0.1 Benchmarks (110 runs)

- 96.7% adversarial detection
- 92% hallucination catch rate
- 5+ providers (BYOK, any endpoint)
- No data leaves your infra (BYOK)

Install

npm install pot-sdk

BYOK — bring your own API keys. Everything runs on your infrastructure. No telemetry, no proxying, direct provider calls.

import { verify, deepAnalysis, createAttestation } from 'pot-sdk';

const apiKeys = {
  anthropic: process.env.ANTHROPIC_API_KEY,
  xai:       process.env.XAI_API_KEY,
  deepseek:  process.env.DEEPSEEK_API_KEY,
  openai:    process.env.OPENAI_API_KEY,  // any provider, BYOK
};

// Basic: single model, <1s
const result = await verify(output, { question, apiKeys, tier: 'basic' });

// Pro: adversarial multi-model, 3-5s
const deep = await verify(output, { question, apiKeys, tier: 'pro' });

// Full deep run with rotated synthesizers
const analysis = await deepAnalysis(question, { apiKeys });

// Generate compliance attestation (EU AI Act)
const attestation = createAttestation(result);

This started because I kept asking: how do I know my agent's output is actually correct? Not "probably fine" — verifiably correct, with documented dissent when it isn't.

Turns out a lot of agents have the same question.

GitHub (MIT) · npm · Protocol Specification