Introducing DPR — measuring what the synthesizer does with objections
Everyone in multi-agent AI focuses on two things: which model generates, and which model criticizes. Nobody talks about the synthesizer — the model that reads all the outputs and produces the final answer.
That's a problem. Because in a 3-rotation deep analysis we ran this week with identical inputs, rotating only the synthesizer produced a 47-point confidence spread: from 45% to 92%, same question, same generators, same critic.
The synthesizer doesn't just combine outputs — it decides which arguments survive. A critic can raise five objections. The synthesizer can quietly discard four of them and produce a confident-sounding summary. To any downstream consumer, that looks like consensus. It isn't.
We call this false consensus. And we built two metrics to detect it.
SAS measures how evenly the synthesis covers all generator proposals. When SAS falls below 0.5 and a single generator dominates more than 60% of the coverage, a bias flag is raised. Dominance isn't the problem; undocumented dominance is.
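As a minimal sketch, the SAS bias-flag rule can be written like this in TypeScript; `SasResult`, `sas`, and `maxCoverageShare` are illustrative names for this post, not the pot-sdk API:

```typescript
// Illustrative only: field names are assumptions, not the pot-sdk types.
interface SasResult {
  sas: number;              // attribution-evenness score, 0.0 to 1.0
  maxCoverageShare: number; // largest coverage share held by any one generator
}

function sasBiasFlag(r: SasResult): boolean {
  // Flag only when coverage is uneven AND one generator dominates >60%.
  return r.sas < 0.5 && r.maxCoverageShare > 0.6;
}
```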
DPR measures what fraction of critic objections actually appear in the synthesis. A high DPR means the synthesis engaged with the objections; a low DPR means it didn't, regardless of what the confidence score says.
The formula is simple:
```
DPR = objections preserved in synthesis / total objections raised
false_consensus = DPR < 0.4 AND SAS warned AND ≥2 objections detected
```
The false_consensus flag only fires when all three conditions are true together — which prevents false positives when the critic simply agrees with the generators.
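A minimal sketch of both rules, assuming objections arrive as plain strings and using naive substring matching as a stand-in for whatever matching the real `computeDPR` performs:

```typescript
// Illustrative sketch: the substring-matching heuristic and function names
// are assumptions for this post, not the pot-sdk implementation.
function dprScore(objections: string[], synthesis: string): number {
  // No dissent to preserve means no dissent was lost.
  if (objections.length === 0) return 1.0;
  const preserved = objections.filter(o => synthesis.includes(o)).length;
  return preserved / objections.length;
}

function falseConsensus(dpr: number, sasWarned: boolean, totalObjections: number): boolean {
  // All three conditions must hold together, preventing false positives
  // when the critic simply agrees with the generators.
  return dpr < 0.4 && sasWarned && totalObjections >= 2;
}
```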
We ran DPR against 8 adversarial test cases in pot-benchmarks v2.0.0, covering scenarios from complete objection suppression to perfect preservation:
In both flagged cases, the synthesizer ignored every critic objection despite a SAS warning; DPR = 0.0 in each. The synthesis still read as confident and coherent. That was the tell.
When no objections were raised (critic affirmed the generators), DPR correctly returns 1.0 — no dissent to preserve means no dissent was lost.
DPR also handles markdown bullet-point critiques natively — critics that write `- This claim is unsupported` are treated the same as prose objections. This matters because most critic outputs in practice use list formatting.
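A sketch of that normalization step, assuming a simple line-based parser; `extractObjections` is an illustrative name, and the actual pot-sdk parsing logic is not shown here:

```typescript
// Illustrative only: strips markdown bullet markers so bulleted and prose
// critiques produce the same objection strings.
function extractObjections(critique: string): string[] {
  return critique
    .split("\n")
    .map(line => line.replace(/^\s*[-*]\s+/, "").trim()) // drop leading "- " or "* "
    .filter(line => line.length > 0);                    // ignore blank lines
}
```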
Most multi-agent pipelines assume that running multiple models improves reliability. That's true for the generation and critique layers. But if a single model synthesizes all of that, you've introduced a single point of epistemic failure at exactly the step that produces your output.
Put differently: if you're running multi-agent pipelines without auditing the synthesizer, you might be doing single-agent reasoning with extra API costs.
The 47-point spread isn't a bug. It's the synthesizer's prior, expressed as a confidence score. SAS and DPR don't eliminate that — they make it visible.
Across three deep runs, the critic rotation revealed consistent patterns:
None of these is wrong. All three are useful — but only when the synthesizer documents which perspective it weighted and why. That documentation is what DPR measures.
DPR and SAS ship in pot-sdk v0.1.4:
```
npm install pot-sdk
```

```typescript
import { computeDPR } from 'pot-sdk';

const result = computeDPR(critiqueText, synthesisText, sasWarning);
// result.score           → 0.0–1.0
// result.false_consensus → boolean
// result.total_objections
// result.preserved
```
In pot-cli, DPR runs automatically on every ask and deep command and is stored in block.metadata.dpr. The CLI displays 🟢 / 🟡 / 🔴 based on score thresholds.
```
npm install -g pot-cli
pot deep "Your question" --runs 3 --lang en
```