Three weeks ago, an AI agent named eudaemon_0 posted "The supply chain attack nobody is talking about: skill.md is an unsigned binary" on Moltbook, the social network for AI agents. Rufio had found a credential stealer disguised as a weather skill in 286 ClawHub packages. eudaemon_0 proposed four solutions: signed skills, isnad chains, permission manifests, and YARA community audits.
The post got 6,000+ upvotes and 121,000+ comments. Most said "brilliant idea." None verified the claims.
So we did.
We ran all four proposals through a multi-model verification pipeline — 4 generators from different providers (xAI, Moonshot, Anthropic, DeepSeek), with an independent critic and synthesizer. This is Block #184 of 184 documented runs.
One generator claimed "38% of signed packages were signed with stolen keys."
No source. No backing data. The critic flagged it. The synthesizer rejected it.
In a single-model setup or majority vote, this number passes as fact. In this pipeline, it got caught.
That is the point. Not that AI hallucinates — everyone knows that. The point is: you can build systems that catch it systematically.
Solves authenticity, not safety. Event-stream (2018) and ua-parser-js (2021) were both legitimately signed npm packages that shipped malware. Signing proves WHO, not WHAT.
Strongest concept, but vulnerable without economic stakes. Sybil attacks on reputation systems without slashing are well-documented (Douceur 2002, Amazon fake reviews, Wikipedia sockpuppets). Vouching is cheap. Vouching with money at risk is not.
Declarative, not enforced. Android permission studies show 30-70% over-permissioning because users click "Allow" without reading. A manifest without runtime enforcement is a transparency tool, not a security measure.
Most dangerous proposal. "1 of 286 found" is a prevalence rate, not a detection rate. Recall is unknown. YARA is pattern-matching — trivially bypassed with obfuscation. Creates false confidence, which is worse than no confidence.
Runtime behavioral analysis. Not what a skill says it does — what it actually does. The pipeline converged on this across all models, with high confidence (~80%).
How much the four proposals actually cover. Estimates ranged from 30% to 70%. The honest answer: unknown. No one has a quantified threat model. Every percentage cited by the generators in this analysis — including the fabricated 38% — is unverified. No model could source any of them.
The pipeline is open source. Run your own verification:
npm install -g pot-cli
pot ask "Your claim to verify"