GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion

Tongxuan Liu, Xingyu Wang, Weizhe Huang, Wenjiang Xu, Yuting Zeng, Lei Jiang, Hailong Yang, Jing Li

TL;DR

This paper tackles the token-cost burden of multi-agent debate (MAD) by introducing GroupDebate (GD), a group-discussion framework that partitions agents into groups, runs debates within each group, and exchanges summarized insights between groups across staged rounds. The authors provide formal token-cost analyses showing substantial reductions compared with traditional MAD, along with empirical results on arithmetic, GSM8K, MMLU, and MATH datasets that demonstrate both token savings (up to ~51.7%) and accuracy gains (up to ~25%). They also explore the design choices that influence performance and scalability, namely group composition and the number of intra-group rounds, and compare GD against single-agent baselines to highlight its practical benefits. Although GD reduces cost and improves performance, the optimal numbers of groups and stages remain unresolved, and GD still costs more than some single-agent approaches, pointing to avenues for further optimization and theoretical grounding.

Abstract

In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse NLP tasks. Extensive research has explored ways to enhance their logical reasoning abilities, such as Chain-of-Thought, Chain-of-Thought with Self-Consistency, Tree-of-Thoughts, and multi-agent debate. In multi-agent debate, performance can improve significantly as the number of agents and debate rounds increases. However, adding agents and rounds can drastically raise the token cost of a debate, limiting the scalability of the technique. To better harness the advantages of multi-agent debate in logical reasoning tasks, this paper proposes a method that significantly reduces its token cost. The approach divides all agents into multiple debate groups: agents debate within their respective groups and share interim debate results between groups. Comparative experiments across multiple datasets demonstrate that this method can reduce total tokens during debates by up to 51.7% while potentially improving accuracy by as much as 25%. Our method significantly enhances the performance and efficiency of interactions in multi-agent debate.
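The grouping procedure described in the abstract can be sketched as a simple control loop. This is a minimal sketch under stated assumptions, not the authors' implementation: agents and the summarizer are hypothetical plain Python callables standing in for LLM calls, groups are formed round-robin, and the final answer is taken by majority vote over the last round, which is an assumption rather than something the abstract specifies. The default shape mirrors Figure 4 (4 agents, 2 groups, 2 stages, 2 intra-group rounds per stage).

```python
from collections import Counter
from typing import Callable, List

# Hypothetical interfaces: an agent maps a prompt to an answer, and a
# summarizer condenses one group's responses into a single string.
Agent = Callable[[str], str]
Summarizer = Callable[[List[str]], str]


def majority(responses: List[str]) -> str:
    """Most common response (ties go to the one seen first)."""
    return Counter(responses).most_common(1)[0][0]


def partition(agents: List[Agent], n_groups: int) -> List[List[Agent]]:
    """Split agents round-robin into n_groups groups."""
    return [agents[g::n_groups] for g in range(n_groups)]


def group_debate(question: str, agents: List[Agent], summarize: Summarizer,
                 n_groups: int = 2, n_stages: int = 2,
                 rounds_per_stage: int = 2) -> str:
    if n_stages < 1 or rounds_per_stage < 1:
        raise ValueError("need at least one stage and one round per stage")
    groups = partition(agents, n_groups)
    shared: List[str] = []  # one summary per group from the previous stage
    for stage in range(n_stages):
        stage_outputs: List[List[str]] = []
        for group in groups:
            responses: List[str] = []  # this group's previous-round answers
            for _ in range(rounds_per_stage):
                prompt = question
                if shared:
                    prompt += "\nSummaries from all groups:\n" + "\n".join(shared)
                if responses:
                    prompt += "\nYour group's previous answers:\n" + "\n".join(responses)
                # Agents only ever see their own group's answers directly.
                responses = [agent(prompt) for agent in group]
            stage_outputs.append(responses)
        if stage < n_stages - 1:
            # Inter-group exchange: only summaries cross group boundaries.
            shared = [summarize(outputs) for outputs in stage_outputs]
    final_answers = [a for outputs in stage_outputs for a in outputs]
    return majority(final_answers)  # assumed aggregation rule


# Toy agents for illustration only (hypothetical, no LLM involved).
def stubborn(answer: str) -> Agent:
    return lambda prompt: answer


def follower(prompt: str) -> str:
    # Adopts "42" once it appears anywhere in its context.
    return "42" if "42" in prompt else "41"


agents = [stubborn("42"), follower, stubborn("42"), follower]
print(group_debate("What is 6 * 7?", agents, summarize=majority))  # 42
```

In this toy run the two follower agents land in the same group and keep answering 41 through the first stage; only when the other group's summary reaches them in the second stage do they switch, which is the kind of cross-group information flow the method relies on.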

Paper Structure

This paper contains 36 sections, 6 equations, 11 figures, 4 tables, and 3 algorithms.

Figures (11)

  • Figure 1: Comparison of Token Cost and Accuracy Under Different Combinations of Agents and Rounds. The numbers in parentheses next to each circle give the (agent count, round count) pair. The size and color of each circle indicate the number of API calls: the larger the circle, the more times the OpenAI API is called.
  • Figure 2: An Example of Multi-agent Debate Among Three Agents with Two Rounds.
  • Figure 3: Token Cost Under Different Numbers of Agents and Rounds. The first row shows how token cost varies with the number of agents at a fixed 4 rounds; the second row shows how it varies with the number of rounds at a fixed 4 agents.
  • Figure 4: An Example of GroupDebate. 4 agents are divided into 2 groups, and the GroupDebate process comprises two stages, each involving two rounds of intra-group debate.
  • Figure 5: Comparison of Token Cost and Accuracy Between GD and MAD under Different Agents and Rounds. The notation (5,4) signifies 5 agents with 4 rounds.
  • ...and 6 more figures
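Figures 3 and 5 concern how token cost grows with the number of agents and rounds. The toy accounting below illustrates why grouping can help. It is a hypothetical model, not the paper's formal token-cost analysis: it assumes every question costs q tokens, every agent response costs r tokens, and every group summary costs s tokens; agents in all-to-all MAD read all N responses from the previous round, while GD agents read only their own group's responses plus one summary per group from the previous stage.

```python
# Toy token accounting for all-to-all multi-agent debate (MAD) versus a
# grouped debate (GD). Hypothetical model, not the paper's equations.


def mad_tokens(n_agents: int, n_rounds: int, q: int, r: int) -> int:
    """Input plus output tokens for MAD.

    Round 1: each agent reads the question (q) and writes r tokens.
    Later rounds: each agent also reads all n_agents previous responses.
    """
    total = 0
    for rnd in range(n_rounds):
        context = q + (n_agents * r if rnd > 0 else 0)
        total += n_agents * (context + r)
    return total


def gd_tokens(n_agents: int, n_groups: int, n_stages: int,
              rounds_per_stage: int, q: int, r: int, s: int) -> int:
    """Input plus output tokens for GD under the same toy assumptions.

    Within a stage, agents read only their own group's previous responses.
    From stage 2 on, every agent also reads one s-token summary per group.
    Each group writes a summary after every stage except the last.
    """
    if n_agents % n_groups:
        raise ValueError("this sketch assumes equal-sized groups")
    group_size = n_agents // n_groups
    total = 0
    for stage in range(n_stages):
        summaries = n_groups * s if stage > 0 else 0
        for rnd in range(rounds_per_stage):
            peers = group_size * r if rnd > 0 else 0
            total += n_agents * (q + summaries + peers + r)
        if stage < n_stages - 1:
            # One summarization call per group: read its responses, write s tokens.
            total += n_groups * (group_size * r + s)
    return total


print(mad_tokens(8, 4, q=100, r=200))              # 48000
print(gd_tokens(8, 2, 2, 2, q=100, r=200, s=100))  # 27400
```

Under these assumptions, an 8-agent, 4-round MAD debate costs 48,000 tokens, while a 2-group, 2-stage GD debate with the same total of 4 rounds costs 27,400, because each agent's per-round context grows with its group size rather than with all N agents. With one group and one stage the two counts coincide, a quick check that the accounting is consistent; the specific numbers say nothing about the paper's measured savings.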