GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion
Tongxuan Liu, Xingyu Wang, Weizhe Huang, Wenjiang Xu, Yuting Zeng, Lei Jiang, Hailong Yang, Jing Li
TL;DR
This paper tackles the token cost of multi-agent debate (MAD) by introducing GroupDebate (GD), a group-discussion framework that partitions agents into groups, conducts intra-group debates, and exchanges summarized insights between groups across staged rounds. The authors provide formal token-cost analyses showing substantial reductions compared to traditional MAD, along with empirical results on the Arithmetic, GSM8K, MMLU, and MATH datasets that demonstrate both token savings (up to ~51.7%) and accuracy gains (up to ~25%). They also explore two design choices that influence performance and scalability, group composition and the number of intra-group rounds, and compare GD against single-agent baselines to highlight practical benefits. Although GD reduces cost and improves performance relative to MAD, its limitations include unresolved optimal settings for the number of groups and stages, and a cost that remains higher than that of some single-agent approaches. Both point to avenues for further optimization and theoretical grounding.
Abstract
In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse NLP tasks. Extensive research has explored how to enhance their logical reasoning abilities through methods such as Chain-of-Thought, Chain-of-Thought with Self-Consistency, Tree-of-Thoughts, and multi-agent debates. In multi-agent debates, significant performance improvements can be achieved by increasing the number of agents and debate rounds. However, increasing the number of agents and debate rounds can drastically raise the token cost of debates, thereby limiting the scalability of the multi-agent debate technique. To better harness the advantages of multi-agent debates in logical reasoning tasks, this paper proposes a method to significantly reduce token cost in multi-agent debates. This approach divides all agents into multiple debate groups, with agents debating within their respective groups and sharing interim debate results between groups. Comparative experiments across multiple datasets demonstrate that this method can reduce the total tokens used during debates by up to 51.7% while potentially enhancing accuracy by as much as 25%. Our method significantly enhances the performance and efficiency of interactions in multi-agent debate.
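
The grouping scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `partition`, `group_debate`, and `summarize`, the round-robin grouping rule, the agent call signature, and the final majority vote are all hypothetical choices made for this example. Agents are plain callables standing in for LLM calls.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical agent interface: (question, context messages) -> answer string.
Agent = Callable[[str, List[str]], str]
# Hypothetical summarizer: condenses one group's answers into a single message.
Summarizer = Callable[[List[str]], str]


def partition(agents: List[Agent], num_groups: int) -> List[List[Agent]]:
    """Split agents into groups round-robin (an assumed grouping rule)."""
    return [agents[i::num_groups] for i in range(num_groups)]


def group_debate(
    question: str,
    agents: List[Agent],
    num_groups: int,
    stages: int,
    rounds_per_stage: int,
    summarize: Summarizer,
) -> str:
    groups = partition(agents, num_groups)
    answers: List[List[str]] = [[] for _ in groups]  # latest answers per group
    shared: List[str] = []  # one summary per group from the previous stage

    for stage in range(stages):
        for gi, group in enumerate(groups):
            for rnd in range(rounds_per_stage):
                # Each agent sees the inter-group summaries plus the previous
                # round's answers from its own group only.
                context = shared + answers[gi]
                answers[gi] = [agent(question, context) for agent in group]
        # End of stage: every group publishes one summary (its own included).
        shared = [summarize(group_answers) for group_answers in answers]

    # Assumed aggregation: majority vote over all final answers.
    all_answers = [a for group_answers in answers for a in group_answers]
    return Counter(all_answers).most_common(1)[0][0]
```

In this sketch, the context passed to an agent holds at most one summary per group plus the answers of its own group, rather than the full answers of every agent. That is the intuition behind the token savings the abstract reports; the actual prompts, summarization step, and aggregation rule are defined in the paper itself.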
