Council Mode: Mitigating Hallucination and Bias in LLMs via Multi-Agent Consensus

Shuai Wu, Xue Li, Yanna Feng, Yufang Li, Zhijun Wang

Abstract

Large Language Models (LLMs), particularly those employing Mixture-of-Experts (MoE) architectures, have achieved remarkable capabilities across diverse natural language processing tasks. However, these models frequently suffer from hallucinations -- generating plausible but factually incorrect content -- and exhibit systematic biases that are amplified by uneven expert activation during inference. In this paper, we propose the Council Mode, a novel multi-agent consensus framework that addresses these limitations by dispatching queries to multiple heterogeneous frontier LLMs in parallel and synthesizing their outputs through a dedicated consensus model. The Council pipeline operates in three phases: (1) an intelligent triage classifier that routes queries based on complexity, (2) parallel expert generation across architecturally diverse models, and (3) a structured consensus synthesis that explicitly identifies agreement, disagreement, and unique findings before producing the final response. We implement and evaluate this architecture within an open-source AI workspace. Our comprehensive evaluation across multiple benchmarks demonstrates that the Council Mode achieves a 35.9% relative reduction in hallucination rates on the HaluEval benchmark and a 7.8-point improvement on TruthfulQA compared to the best-performing individual model, while maintaining significantly lower bias variance across domains. We provide the mathematical formulation of the consensus mechanism, detail the system architecture, and present extensive empirical results with ablation studies.
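To make the three-phase control flow concrete, the following is a minimal TypeScript sketch of the pipeline described above. The `ModelClient` interface, the `councilPipeline` function, and the prompt strings are hypothetical stand-ins for illustration, not the paper's actual API; only the phase structure and the use of parallel dispatch are taken from the source.

```typescript
// Minimal sketch of the three-phase Council pipeline described in the abstract.
// ModelClient, the prompts, and all identifiers here are hypothetical stand-ins,
// not the paper's actual implementation.

interface ModelClient {
  id: string;
  complete(prompt: string): Promise<string>;
}

type CouncilResult =
  | { kind: "direct"; answer: string }     // Phase 1 short-circuit for trivial queries
  | { kind: "consensus"; answer: string }; // Phases 2 + 3

async function councilPipeline(
  query: string,
  triage: ModelClient,    // lightweight triage classifier
  experts: ModelClient[], // architecturally diverse frontier models
  consensus: ModelClient, // dedicated consensus synthesizer
): Promise<CouncilResult> {
  // Phase 1: triage. Trivial queries are answered directly without the Council.
  const verdict = await triage.complete(
    `Classify this query as TRIVIAL or COMPLEX:\n${query}`,
  );
  if (verdict.trim().startsWith("TRIVIAL")) {
    return { kind: "direct", answer: await triage.complete(query) };
  }

  // Phase 2: parallel expert generation across heterogeneous models.
  const drafts = await Promise.all(experts.map((m) => m.complete(query)));

  // Phase 3: structured consensus synthesis. The consensus model is prompted to
  // surface agreement, disagreement, and unique findings before answering.
  const synthesisPrompt = [
    `Query: ${query}`,
    ...drafts.map((d, i) => `Expert ${experts[i].id}:\n${d}`),
    "List points of agreement, points of disagreement, and unique findings,",
    "then produce a final response.",
  ].join("\n\n");
  return { kind: "consensus", answer: await consensus.complete(synthesisPrompt) };
}
```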

Paper Structure

This paper contains 36 sections, 10 equations, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: The Council Pipeline Architecture. Phase 1 uses a lightweight triage classifier (Seed 2.0 Pro) to determine query complexity. Trivial queries are answered directly. Non-trivial queries proceed to Phase 2, where three architecturally diverse expert models (GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro) generate independent responses in parallel via Promise.all(). Phase 3 synthesizes the expert outputs into a structured four-section response through the consensus model. (A minimal sketch of this four-section output appears after this list.)
  • Figure 2: Context window capacity (in thousands of tokens) for all models evaluated in this study. The diversity in context sizes contributes to complementary knowledge retrieval across the Council pipeline and baseline models.
  • Figure 3: Hallucination rates (%) on the HaluEval benchmark across QA, Summarization, and Dialogue tasks. Lower values indicate better performance. The Council Mode consistently achieves the lowest hallucination rates across all task categories.
  • Figure 4: TruthfulQA benchmark results showing Truthful (%) and Informative (%) scores. The Council Mode (highlighted) achieves the highest scores in both metrics.
  • Figure 5: Scatter plot of Factual Consistency Score vs. Neutrality Score across 500 test prompts. Individual models (circles, triangles, squares) show wide dispersion, while the Council Mode (diamonds) exhibits a tight cluster with high consistency and neutrality, indicating effective bias mitigation.
  • ...and 3 more figures
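The "structured four-section response" in Figure 1 suggests a concrete output schema for Phase 3. Below is a minimal TypeScript sketch of that shape; the field names are assumptions drawn from the abstract's wording (agreement, disagreement, unique findings, final response), not the paper's actual schema.

```typescript
// Hypothetical shape for Figure 1's "structured four-section response".
// Field names are assumptions based on the abstract, not the paper's schema.
interface ConsensusResponse {
  agreement: string[];      // claims every expert supports
  disagreement: string[];   // claims on which expert outputs conflict
  uniqueFindings: string[]; // claims surfaced by only one expert
  finalResponse: string;    // synthesized answer grounded in the sections above
}

// Illustrative rendering of the four sections in the order Phase 3 emits them.
function renderConsensus(r: ConsensusResponse): string {
  return [
    "## Agreement",       ...r.agreement.map((c) => `- ${c}`),
    "## Disagreement",    ...r.disagreement.map((c) => `- ${c}`),
    "## Unique Findings", ...r.uniqueFindings.map((c) => `- ${c}`),
    "## Final Response",  r.finalResponse,
  ].join("\n");
}
```

Making the agreement/disagreement decomposition an explicit, typed artifact (rather than free-form text) is what allows the consensus model's intermediate reasoning to be inspected separately from the final answer.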