Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning
Sugyeong Eo, Hyeonseok Moon, Evelyn Hayoon Zi, Chanjun Park, Heuiseok Lim
TL;DR
The paper tackles the high computational cost and error-propagation risks of multiagent LLM debate systems. It introduces DOWN, an adaptive framework that triggers debate only when the confidence of the initial response is low. Debate then proceeds through a confidence-guided, two-round refinement process, and the final answer is chosen by either voting or a judge. Across MuSR and StrategyQA, DOWN achieves up to sixfold efficiency gains while maintaining or improving accuracy. Further analysis shows that it mitigates cascading errors and generalizes to mixed-model setups and broader domains. The findings demonstrate that selective, confidence-informed debate can deliver high-performance reasoning at substantially lower resource cost, offering a scalable alternative to full-debate approaches.
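The control flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Response`, `Agent`, and `Judge` types, the `down` function, the `make_stub` demo agent, the threshold of 0.8, and the assumption that a single agent produces the initial answer are all hypothetical. Real agents would wrap LLM calls that also elicit a confidence score.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Response:
    answer: str
    confidence: float  # assumed self-reported score in [0, 1]


# Hypothetical agent interface: given the question and (optionally) the peers'
# latest responses, return an answer with a confidence score.
Agent = Callable[[str, Optional[List[Response]]], Response]
# Hypothetical judge interface: picks a final answer from the last round.
Judge = Callable[[str, List[Response]], str]


def majority_vote(responses: List[Response]) -> str:
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(r.answer for r in responses).most_common(1)[0][0]


def down(
    question: str,
    agents: List[Agent],
    threshold: float = 0.8,  # hypothetical confidence threshold
    rounds: int = 2,
    judge: Optional[Judge] = None,
) -> Tuple[str, bool]:
    """Return (final_answer, debated); debate only if initial confidence is low."""
    initial = agents[0](question, None)
    if initial.confidence >= threshold:
        return initial.answer, False  # confident: skip debate entirely

    # Low confidence: collect initial answers from the remaining agents.
    responses = [initial] + [agent(question, None) for agent in agents[1:]]
    for _ in range(rounds):
        # Each agent refines using the other agents' answers and confidences
        # from the previous round.
        responses = [
            agent(question, [r for j, r in enumerate(responses) if j != i])
            for i, agent in enumerate(agents)
        ]

    # Finalize with a judge if one is given, otherwise by majority vote.
    final = judge(question, responses) if judge is not None else majority_vote(responses)
    return final, True


# Stub agent for demonstration: answers alone when there are no peers,
# otherwise adopts the most confident peer answer if it beats its own.
def make_stub(own_answer: str, own_conf: float) -> Agent:
    def agent(question: str, peers: Optional[List[Response]]) -> Response:
        if peers:
            best = max(peers, key=lambda r: r.confidence)
            if best.confidence > own_conf:
                return Response(best.answer, best.confidence)
        return Response(own_answer, own_conf)

    return agent


print(down("q", [make_stub("C", 0.95), make_stub("A", 0.7)]))
print(down("q", [make_stub("B", 0.4), make_stub("A", 0.7), make_stub("A", 0.6)]))
```

In the first call the initial confidence (0.95) clears the threshold, so no other agent is ever queried and the result is `('C', False)`. In the second, the low-confidence initial answer "B" triggers a two-round debate in which the stubs converge on "A", giving `('A', True)`. Passing a `judge` callable swaps voting for judge-based finalization.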
Abstract
Multiagent collaboration has emerged as a promising framework for enhancing the reasoning capabilities of large language models (LLMs). Despite these reasoning gains, the approach incurs substantial computational overhead from iterative agent interactions. Furthermore, engaging in unnecessary debates increases the risk of producing erroneous responses. To address these challenges, we propose Debate Only When Necessary (DOWN), an adaptive multiagent debate framework that selectively activates debate based on the confidence score of the agent's initial response. Debate is activated only for queries that require further deliberation; in those cases, agents refine their outputs by referencing peer responses and the associated confidence scores. Evaluations on the MuSR and StrategyQA benchmarks show that DOWN improves efficiency by up to six times while matching or even exceeding the performance of existing methods. Further analysis indicates that DOWN effectively mitigates the risk of error propagation stemming from unnecessary debate. These findings demonstrate the effectiveness of our approach in delivering high-performance LLM solutions at a lower computational cost.
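To see how gating debate on confidence can yield a multiple-fold efficiency gain, the sketch below counts LLM calls per query. It is a back-of-the-envelope cost model under stated assumptions, not the paper's accounting. It assumes a full debate in which each of N agents answers once and then refines for R rounds, costing N(R+1) calls. It also assumes a gated scheme in which one agent always answers first and, on a fraction f of queries, the other N-1 agents answer and all N refine for R rounds, costing 1 + f((N-1) + NR) calls on average. The function names and the example numbers are hypothetical.

```python
def full_debate_calls(n_agents: int, rounds: int) -> int:
    """LLM calls per query: every agent answers once, then refines each round."""
    return n_agents * (rounds + 1)


def down_expected_calls(n_agents: int, rounds: int, debate_rate: float) -> float:
    """Expected LLM calls per query under the assumed gated cost model.

    One agent always answers first; with probability `debate_rate` the other
    n_agents - 1 agents also answer and all agents refine for `rounds` rounds.
    """
    if not 0.0 <= debate_rate <= 1.0:
        raise ValueError("debate_rate must lie in [0, 1]")
    debate_calls = (n_agents - 1) + n_agents * rounds
    return 1 + debate_rate * debate_calls


def call_reduction(n_agents: int, rounds: int, debate_rate: float) -> float:
    """Ratio of full-debate calls to expected gated calls."""
    return full_debate_calls(n_agents, rounds) / down_expected_calls(
        n_agents, rounds, debate_rate
    )


# Hypothetical setting: 3 agents, 2 refinement rounds, varying debate rates.
for rate in (0.0, 0.1, 1.0):
    print(rate, round(call_reduction(3, 2, rate), 2))
```

With N = 3 and R = 2, full debate costs 9 calls per query. If debate were triggered on 10% of queries, the gated scheme would average 1 + 0.1 × 8 = 1.8 calls, a fivefold reduction, and the ratio falls to 1 when every query is debated. The real gain depends on how often confidence falls below the threshold and on per-call token counts, which this simple call count ignores.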
