The Value of Variance: Mitigating Debate Collapse in Multi-Agent Systems via Uncertainty-Driven Policy Optimization

Luoxi Tang, Yuqiao Meng, Joseph Costa, Yingxue Zhang, Muchao Ye, Zhaohan Xi

TL;DR

This work tackles debate collapse in multi-agent debate systems by introducing a three-level uncertainty framework—within-agent, between-agent, and system-wide—that correlates with incorrect reasoning and instability. Building on this diagnostic insight, it proposes Uncertainty-Driven Policy Optimization (UDPO), an asymmetric, uncertainty-informed training objective with stability, agreement, and confidence components, plus a clipped update rule and uncertainty-based replay. Empirical results across GSM8K, TruthfulQA, and CommonsenseQA show UDPO markedly improves accuracy while substantially reducing uncertainty, and maintains robustness under adversarial attacks, outperforming standard MAD, MAPPO, and RMAAC baselines. The findings establish uncertainty signals as reliable predictors of debate health and demonstrate a practical mitigation pathway to more robust and calibrated MAD, with potential implications for reliable collaborative reasoning in complex tasks.

Abstract

Multi-agent debate (MAD) systems improve LLM reasoning through iterative deliberation, but remain vulnerable to debate collapse, a failure mode in which the agents' final decisions are compromised by erroneous reasoning. Existing methods lack principled mechanisms to detect or prevent such failures. To address this gap, we first propose a hierarchical metric that quantifies behavioral uncertainty at three levels: intra-agent (individual reasoning uncertainty), inter-agent (interactive uncertainty), and system-level (output uncertainty). Empirical analysis across several benchmarks reveals that the proposed uncertainty measures reliably indicate system failures, which supports their use as diagnostic metrics. We then propose a mitigation strategy, an uncertainty-driven policy optimization that penalizes self-contradiction, peer conflict, and low-confidence outputs in a dynamic debating environment. Experiments demonstrate that this mitigation reliably calibrates the multi-agent system, consistently improving decision accuracy while reducing system disagreement.
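
To make the three levels concrete, the following is a minimal sketch under stated assumptions. The abstract does not give the paper's formulas, so every proxy here is hypothetical: intra-agent uncertainty is taken as the entropy of one agent's answers across rounds, inter-agent uncertainty as the fraction of agent pairs that disagree, and system-level uncertainty as the entropy of the final vote. The function names and toy answers are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution over labels."""
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    return sum(p * math.log2(1 / p) for p in probs)


def intra_agent_uncertainty(answers_over_rounds):
    """Hypothetical proxy: how much a single agent's answer varies across rounds."""
    return entropy(answers_over_rounds)


def inter_agent_uncertainty(final_answers):
    """Hypothetical proxy: fraction of agent pairs whose final answers differ."""
    pairs = list(combinations(final_answers, 2))
    return sum(a != b for a, b in pairs) / len(pairs)


def system_uncertainty(final_answers):
    """Hypothetical proxy: entropy of the final vote distribution."""
    return entropy(final_answers)


# Toy debate (made-up answers): three agents, three rounds, one question.
debate = {
    "agent_a": ["12", "12", "12"],
    "agent_b": ["15", "12", "12"],
    "agent_c": ["15", "15", "15"],
}
final_answers = [rounds[-1] for rounds in debate.values()]

for name, rounds in debate.items():
    print(name, "intra:", round(intra_agent_uncertainty(rounds), 3))
print("inter:", round(inter_agent_uncertainty(final_answers), 3))
print("system:", round(system_uncertainty(final_answers), 3))
```

Entropy and pairwise disagreement are chosen only because they are simple and label-free; the paper's own metrics may differ in form and in what signals they use.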

Paper Structure

This paper contains 45 sections, 19 equations, 11 figures, 7 tables, and 1 algorithm.

Figures (11)

  • Figure 1: Illustration of uncertainty quantification at three levels: intra-agent (within a single LLM agent’s reasoning), inter-agent (between a pair of agents during interaction), and system-level (the overall multi-agent system’s decision).
  • Figure 2: Uncertainty distributions on GSM8K for failed vs. successful reasoning. We conduct a two-sample $t$-test with the null hypothesis ($H_0$) that the two distributions are identical. A smaller $p$-value (e.g., $p<0.05$) indicates stronger evidence to reject $H_0$, suggesting that the two distributions are statistically different. We also compute Cohen's $d$ (Cohen, 2013) to quantify the standardized difference between the means of the two distributions ($d > 0.8$ indicates a large difference). Results for other datasets are shown in the appendix.
  • Figure 3: Correlation matrix (TruthfulQA) based on Pearson's correlation coefficient $r$. We also conduct the same two-sample $t$-test as in Figure 2. All uncertainty types correlate negatively with accuracy ($r<0, p < 0.001$). Results for other datasets are shown in the appendix.
  • Figure 4: Uncertainty-driven data selection on TruthfulQA. See the appendix for additional results.
  • Figure 5: Accuracy improvement ($\Delta$%) over Standard MAD by uncertainty level. Results averaged across datasets and models. Our method shows increasing gains with higher uncertainty, while MAPPO and RMAAC plateau or degrade. See the appendix for $U_{\text{intra}}$- and $U_{\text{inter}}$-based difficulty stratification.
  • ...and 6 more figures
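
The statistics named in the captions of Figures 2 and 3 (two-sample $t$-test, Cohen's $d$, Pearson's $r$) can be sketched as follows. This is an illustrative sketch, not the authors' analysis code: the uncertainty scores are made-up toy values, the $t$ statistic uses the equal-variance (Student) form as an assumption, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d: difference of means divided by the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def t_statistic(x, y):
    """Two-sample t statistic, equal-variance form (an assumption of this sketch)."""
    nx, ny = len(x), len(y)
    return cohens_d(x, y) / math.sqrt(1 / nx + 1 / ny)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy uncertainty scores (not from the paper) for failed and successful debates.
failed = [0.90, 0.80, 0.70, 0.85]
successful = [0.30, 0.20, 0.40, 0.25]

# Pair each score with a correctness label: 0 = failed, 1 = successful.
uncertainty = failed + successful
correct = [0] * len(failed) + [1] * len(successful)

print("Cohen's d:", round(cohens_d(failed, successful), 2))
print("t statistic:", round(t_statistic(failed, successful), 2))
print("Pearson r (uncertainty vs. accuracy):", round(pearson_r(uncertainty, correct), 2))
```

Turning the $t$ statistic into a $p$-value requires the CDF of Student's $t$ distribution, which the standard library does not provide; a routine such as `scipy.stats.ttest_ind` returns both the statistic and the $p$-value.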