S$^2$-MAD: Breaking the Token Barrier to Enhance Multi-Agent Debate Efficiency
Yuting Zeng, Weizhe Huang, Lei Jiang, Tongxuan Liu, Xitai Jin, Chen Tianying Tiana, Jing Li, Xiaohua Xu
TL;DR
This work tackles the token-cost inefficiency of Multi-agent Debate (MAD) in large language models by introducing Selective Sparse MAD (S$^2$-MAD), which is built around a Decision-Making Mechanism. The mechanism combines similarity calculation, redundancy filtering, and conditional participation to prune redundant information and let agents engage selectively, which substantially reduces the number of tokens exchanged. Empirical results across five reasoning benchmarks and multiple models show token-cost reductions of up to 94.5% with less than 2% accuracy loss, confirming that large efficiency gains can be achieved without a substantial loss in performance. Theoretical analysis and ablation studies clarify how grouping, similarity thresholds, and early stopping contribute to efficiency, and suggest promising directions for robust, scalable multi-agent reasoning.
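The decision-making mechanism summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`cosine_similarity`, `filter_redundant`, `should_participate`, `sparse_debate_inputs`), the bag-of-words cosine similarity (a stand-in for whatever similarity measure the paper actually uses), and the default threshold of 0.8 are all hypothetical, and grouping and early stopping are omitted. Each agent compares its own response with its peers' responses. If every peer is similar, the agent sits the round out (conditional participation); otherwise it reads only the peer responses that survive redundancy filtering.

```python
import math
from collections import Counter
from typing import Optional


def cosine_similarity(a: str, b: str) -> float:
    # Hypothetical stand-in: bag-of-words cosine similarity between two responses.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[token] for token, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def filter_redundant(responses: list[str], threshold: float) -> list[str]:
    # Redundancy filtering: keep a response only if it is dissimilar
    # to every response already kept.
    kept: list[str] = []
    for r in responses:
        if all(cosine_similarity(r, k) < threshold for k in kept):
            kept.append(r)
    return kept


def should_participate(own: str, peers: list[str], threshold: float) -> bool:
    # Conditional participation: debate only if at least one peer
    # response differs noticeably from the agent's own.
    return any(cosine_similarity(own, p) < threshold for p in peers)


def sparse_debate_inputs(
    responses: list[str], threshold: float = 0.8
) -> dict[int, Optional[list[str]]]:
    # For each agent, return the filtered peer responses to read in the
    # next round, or None if the agent skips that round.
    plan: dict[int, Optional[list[str]]] = {}
    for i, own in enumerate(responses):
        peers = [r for j, r in enumerate(responses) if j != i]
        if should_participate(own, peers, threshold):
            plan[i] = filter_redundant(peers, threshold)
        else:
            plan[i] = None
    return plan
```

With two agents giving the same answer and one dissenter, the dissenter reads only one copy of the duplicated response, and when all agents agree nobody debates at all; under these assumptions, that is where the token savings come from.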
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities across various natural language processing (NLP) scenarios, but they still struggle with complex arithmetic and logical reasoning tasks. While Chain-of-Thought (CoT) reasoning, self-consistency (SC), and self-correction strategies attempt to guide models through sequential, multi-step reasoning, Multi-agent Debate (MAD) has emerged as a viable approach for enhancing the reasoning capabilities of LLMs. Increasing both the number of agents and the number of debate rounds significantly improves LLM performance. However, this strategy sharply increases token costs, which presents a barrier to scalability. To address this challenge, we introduce a novel sparsification strategy designed to reduce token costs within MAD. This approach minimizes ineffective exchanges of information and unproductive discussions among agents, thereby improving the overall efficiency of the debate process. We conduct comparative experiments on multiple datasets across various models, demonstrating that our approach substantially reduces token costs in MAD. Specifically, compared to MAD, our approach reduces token costs by up to 94.5% while keeping performance degradation below 2.0%.
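To make the headline figure concrete, assume the reduction is measured as the relative drop in total tokens consumed compared with standard MAD. Writing $C_{\text{MAD}}$ and $C_{\text{S}^2\text{-MAD}}$ for the two token costs (symbols introduced here for illustration only), a 94.5% reduction means S$^2$-MAD uses roughly one eighteenth of MAD's tokens:

```latex
r = 1 - \frac{C_{\text{S}^2\text{-MAD}}}{C_{\text{MAD}}}, \qquad
r = 0.945 \;\Rightarrow\; C_{\text{S}^2\text{-MAD}} = 0.055\, C_{\text{MAD}} \approx \frac{C_{\text{MAD}}}{18.2}
```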
