Ask GPT-4 a question. You get one answer. It sounds confident. It might be wrong.
Ask three models the same question. If they all agree, you can be more confident. If they disagree, you just learned something important: the answer is not obvious, and you should look closer.
This is multi-AI deliberation. And we built an open-source framework for it.
The Problem With Single-Model Answers
Every LLM has blind spots. Models are trained on different data, with different architectures and different RLHF tuning, and each ends up with systematic biases that are invisible when you only use one.
- GPT tends toward verbose, structured answers
- Claude tends toward careful, hedged responses
- Gemini tends toward concise, confident claims
- Open-source models have their own quirks
When you use one model, you get its biases baked into every answer. You cannot tell which parts are genuine reasoning and which are artifacts of training.
How Deliberation Fixes This
Quorum puts multiple models in a room and makes them debate.
Here is what happens:
1. Propose — Each model independently answers the question
2. Critique — Each model reviews the others' answers and pokes holes
3. Revise — Models update their answers based on critiques
4. Vote — Models vote on the best answer with reasoning
5. Synthesize — A final answer is produced from the deliberation
The result is not a committee average. It is an answer that survived scrutiny from multiple perspectives.
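Stripped of real model calls, the five phases can be sketched in a few dozen lines. Everything here — the `Model` shape, the deterministic mocks, the plurality-vote synthesis — is an illustrative assumption, not Quorum's actual internals:

```typescript
// Illustrative sketch only: the Model interface and plurality-vote
// synthesis are assumptions, not Quorum's real design.
type Model = {
  name: string;
  answer: (question: string) => string;                    // Propose
  critique: (answer: string) => string;                    // Critique
  revise: (answer: string, critiques: string[]) => string; // Revise
  vote: (answers: string[]) => number;                     // Vote (index of best)
};

function deliberate(models: Model[], question: string): string {
  // 1. Propose: each model answers independently.
  let answers = models.map((m) => m.answer(question));

  // 2. Critique: every model reviews every other model's answer.
  const critiques = answers.map((a, i) =>
    models.filter((_, j) => j !== i).map((m) => m.critique(a))
  );

  // 3. Revise: each model updates its answer using the critiques it received.
  answers = answers.map((a, i) => models[i].revise(a, critiques[i]));

  // 4. Vote: each model picks the answer it finds strongest.
  const tally = answers.map(() => 0);
  for (const m of models) tally[m.vote(answers)] += 1;

  // 5. Synthesize: here, simply return the plurality winner.
  return answers[tally.indexOf(Math.max(...tally))];
}

// Deterministic mock "models" so the pipeline runs without API keys.
const mock = (name: string, text: string): Model => ({
  name,
  answer: () => text,
  critique: (a) => `unsupported claim in: ${a}`,
  revise: (a, cs) => `${a} (revised after ${cs.length} critiques)`,
  // Toy heuristic: vote for the longest answer.
  vote: (as) => as.reduce((best, a, i) => (a.length > as[best].length ? i : best), 0),
});

const winner = deliberate(
  [mock("gpt", "A"), mock("claude", "BB"), mock("gemini", "C")],
  "toy question"
);
console.log(winner); // "BB (revised after 2 critiques)"
```

A real synthesis step would itself be an LLM call that merges the winning answer with the recorded dissent; plurality voting is just the simplest stand-in.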
Show Me the Code
```bash
npm install quorum-ai
```

```typescript
import { Council } from "quorum-ai";

const council = new Council({
  members: [
    { provider: "openai", model: "gpt-4o" },
    { provider: "anthropic", model: "claude-sonnet-4-5" },
    { provider: "google", model: "gemini-2.0-flash" }
  ],
  protocol: "debate"
});

const result = await council.deliberate(
  "Should we use microservices or a monolith for a 3-person startup?"
);

console.log(result.synthesis); // The deliberated answer
console.log(result.votes);     // How each model voted
console.log(result.dissent);   // Where they disagreed
```
Three models. One question. A better answer.
When Deliberation Matters Most
- Ambiguous questions — where reasonable people disagree
- High-stakes decisions — architecture, strategy, medical, legal
- Fact verification — catching hallucinations through cross-checking
- Creative work — getting diverse perspectives, not one model's style
For simple factual lookups, one model is fine. For anything where judgment matters, deliberation wins.
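That split can be made operational with a small router: cheap single call for lookups, full deliberation for judgment calls. The keyword heuristic and the `askOne`/`askCouncil` names below are deliberately naive placeholders, not part of Quorum's API:

```typescript
// Hypothetical router. The heuristic is a toy; in practice you might
// use a classifier, or escalate when a first answer looks uncertain.
type Ask = (question: string) => Promise<string>;

const JUDGMENT_HINTS = ["should", "better", "best", "trade-off", "versus", " vs "];

function needsDeliberation(question: string): boolean {
  const q = question.toLowerCase();
  return JUDGMENT_HINTS.some((hint) => q.includes(hint));
}

async function route(
  question: string,
  askOne: Ask,     // e.g. a single-model call
  askCouncil: Ask  // e.g. a council deliberation
): Promise<string> {
  return needsDeliberation(question) ? askCouncil(question) : askOne(question);
}

console.log(needsDeliberation("Should we use microservices or a monolith?")); // true
console.log(needsDeliberation("What is the capital of France?"));             // false
```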
What We Found
Across our test suite (226 tests, 70 source files):
- Deliberated answers had fewer hallucinations than any single model
- Disagreement between models was a reliable signal of question difficulty
- The critique phase caught errors that no individual model self-corrected
- Cost overhead is roughly 3x that of a single call, but the quality improvement on hard questions is worth it
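The disagreement signal is easy to approximate even outside the framework: compare the models' independent answers pairwise and treat low agreement as a difficulty flag. The token-level Jaccard overlap below is a stand-in metric, an assumption rather than what Quorum measures internally:

```typescript
// Jaccard token overlap between two answers (0 = disjoint, 1 = identical).
function similarity(a: string, b: string): number {
  const ta = new Set(a.toLowerCase().split(/\s+/));
  const tb = new Set(b.toLowerCase().split(/\s+/));
  const inter = [...ta].filter((t) => tb.has(t)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 0 : inter / union;
}

// Mean pairwise similarity across all answers; low values suggest a
// hard or ambiguous question that deserves a closer look.
function agreement(answers: string[]): number {
  let total = 0;
  let pairs = 0;
  for (let i = 0; i < answers.length; i++) {
    for (let j = i + 1; j < answers.length; j++) {
      total += similarity(answers[i], answers[j]);
      pairs += 1;
    }
  }
  return pairs === 0 ? 1 : total / pairs;
}

console.log(agreement(["use a monolith", "use a monolith", "use a monolith"])); // 1
console.log(agreement(["alpha beta", "gamma delta", "epsilon zeta"]));          // 0
```

A production version would compare embeddings or verdicts rather than raw tokens, but even this crude score separates "everyone converged" from "the models are telling you the question is hard."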
Try It
Quorum is open source (MIT). Use it in your projects, contribute, or just try askquorum.ai to see deliberation in action.
Built by Solvely. We build tools that make AI more reliable. Also see Radius for HubSpot and myagents.to: Managed AI Agents.