Ask GPT-4 a question. You get one answer. It sounds confident. It might be wrong.
Ask three models the same question. If they all agree, you can be more confident. If they disagree, you just learned something important: the answer is not obvious, and you should look closer.
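The agreement heuristic above can be sketched in a few lines. This is a toy illustration, not Quorum's implementation: the model calls are stubbed out as plain strings, and `agreementLevel` is a hypothetical helper that just counts how many normalized answers match the majority.

```typescript
// Toy agreement check across several models' answers.
// In practice each string would come from a different provider's API.
function agreementLevel(answers: string[]): { consensus: string; share: number } {
  const counts = new Map<string, number>();
  for (const a of answers) {
    const key = a.trim().toLowerCase(); // naive normalization
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  let consensus = "";
  let best = 0;
  for (const [key, n] of counts) {
    if (n > best) {
      best = n;
      consensus = key;
    }
  }
  // share = fraction of models backing the majority answer
  return { consensus, share: best / answers.length };
}

// Full agreement -> high confidence; a split -> look closer.
console.log(agreementLevel(["Paris", "paris", "Paris"])); // share = 1
console.log(agreementLevel(["Yes", "No", "Yes"]).share);  // 2/3
```

A `share` of 1 is the "they all agree" case; anything lower is the signal that the question deserves a closer look.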
This is multi-AI deliberation. And we built an open-source framework for it.
Every LLM has blind spots. Models are trained on different data, with different architectures and different RLHF tuning, and each ends up with systematic biases that are invisible when you only use one.
When you use one model, you get its biases baked into every answer. You cannot tell which parts are genuine reasoning and which are artifacts of training.
Quorum puts multiple models in a room and makes them debate.
Here is what happens: each model answers the question, sees the other models' answers, critiques them, and revises its own position. After the debate rounds, the council votes and a synthesis is produced.
The result is not a committee average. It is an answer that survived scrutiny from multiple perspectives.
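The shape of a debate round can be sketched as follows. This is an assumption-laden illustration, not Quorum's actual protocol: each "model" is stubbed as a plain function of the question plus its peers' answers from the previous round, and `debate` is a hypothetical helper.

```typescript
// One stub model = a function of the question and the peers' last answers.
type Model = (question: string, peerAnswers: string[]) => string;

// Run the given number of debate rounds; return each model's final position.
function debate(question: string, models: Model[], rounds: number): string[] {
  // Round 0: every model answers independently.
  let answers: string[] = models.map((m) => m(question, []));
  for (let r = 1; r < rounds; r++) {
    // Later rounds: every model sees the others' previous answers.
    answers = models.map((m, i) =>
      m(question, answers.filter((_, j) => j !== i))
    );
  }
  return answers; // final positions, ready for voting and synthesis
}
```

For example, a "conformist" stub that adopts a position once any peer holds it will converge toward a "stubborn" stub's answer after one extra round; the voting and synthesis steps would then run over these final positions.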
```bash
npm install quorum-ai
```
```typescript
import { Council } from "quorum-ai";

const council = new Council({
  members: [
    { provider: "openai", model: "gpt-4o" },
    { provider: "anthropic", model: "claude-sonnet-4-5" },
    { provider: "google", model: "gemini-2.0-flash" }
  ],
  protocol: "debate"
});

const result = await council.deliberate(
  "Should we use microservices or a monolith for a 3-person startup?"
);

console.log(result.synthesis); // the deliberated answer
console.log(result.votes);     // how each model voted
console.log(result.dissent);   // where they disagreed
```
Three models. One question. A better answer.
For simple factual lookups, one model is fine. For anything where judgment matters, deliberation wins.
The framework itself is covered by a test suite of 226 tests across 70 source files.
Quorum is open source (MIT). Use it in your projects, contribute, or just try askquorum.ai to see deliberation in action.
Built by Solvely. We build tools that make AI more reliable. Also see Radius for HubSpot and myagents.to: Managed AI Agents.