S$^2$-MAD: Breaking the Token Barrier to Enhance Multi-Agent Debate Efficiency

Yuting Zeng, Weizhe Huang, Lei Jiang, Tongxuan Liu, Xitai Jin, Chen Tianying Tiana, Jing Li, Xiaohua Xu

TL;DR

This work tackles the token-cost inefficiency of Multi-agent Debate (MAD) in large language models by introducing Selective Sparse MAD (S$^2$-MAD) with a Decision-Making Mechanism. The mechanism combines similarity calculation, redundancy filtering, and conditional participation to prune redundant information and let agents engage selectively, substantially reducing token exchanges. Empirical results across five reasoning benchmarks and multiple models show token-cost reductions of up to 94.5% with less than 2% accuracy loss, indicating that large efficiency gains come at only a small cost in performance. Theoretical analysis and ablation studies show how grouping, similarity thresholds, and early stopping contribute to efficiency, suggesting promising directions for robust, scalable multi-agent reasoning.

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities across various natural language processing (NLP) scenarios, but they still face challenges when handling complex arithmetic and logical reasoning tasks. While Chain-of-Thought (CoT) reasoning, self-consistency (SC), and self-correction strategies have attempted to guide models in sequential, multi-step reasoning, Multi-agent Debate (MAD) has emerged as a viable approach for enhancing the reasoning capabilities of LLMs. Increasing both the number of agents and the frequency of debates improves the performance of LLMs significantly. However, this strategy also sharply increases token costs, presenting a barrier to scalability. To address this challenge, we introduce a novel sparsification strategy designed to reduce token costs within MAD. This approach minimizes ineffective exchanges of information and unproductive discussions among agents, thereby enhancing the overall efficiency of the debate process. We conduct comparative experiments on multiple datasets across various models, demonstrating that our approach significantly reduces the token costs of MAD. Specifically, compared to MAD, our approach achieves a reduction of up to 94.5% in token costs while keeping performance degradation below 2.0%.

Paper Structure

This paper contains 35 sections, 6 equations, 6 figures, 7 tables, 1 algorithm.

Figures (6)

  • Figure 1: Redundant Viewpoints Exchange between Agents. The perspectives of Agent 1 and Agent 3 are notably similar. Throughout the debate, both viewpoints are sent to Agent 2, who therefore receives similar, repetitive viewpoints.
  • Figure 2: Process of ${\text{S}^2\text{-MAD}}$. ${\text{S}^2\text{-MAD}}$ includes three stages: all agents generate initial responses independently in the first round and then participate in group discussions to reach consensus under a Decision-Making Mechanism, which comprises: (1) a similarity calculation module, which assesses the similarity of responses either between or within groups; (2) a redundancy filter module, which filters redundant information, retaining only unique information that differs from the agent's own perspective; (3) a conditional participation module, which decides whether or not the agent participates in the debate.
  • Figure 3: The relationship between the threshold $\tau$, ACC, and Token Cost on the GSM8K and MATH datasets.
  • Figure 4: Scaling study of Agents and Rounds.
  • Figure 5: Scaling Study of Token Cost.
  • ...and 1 more figures
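
The three modules of the Decision-Making Mechanism named in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the token-set Jaccard measure, the function names (`jaccard`, `filter_redundant`, `should_participate`), and the way the threshold `tau` is applied are all assumptions made for illustration, since this page does not specify how similarity is computed.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (an assumed stand-in for the paper's measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def filter_redundant(own: str, others: List[str], tau: float) -> List[str]:
    """Keep responses that differ from the agent's own view and from each other."""
    kept: List[str] = []
    for resp in others:
        if jaccard(own, resp) >= tau:
            continue  # same viewpoint as the agent's own response
        if any(jaccard(resp, k) >= tau for k in kept):
            continue  # duplicate of a viewpoint that was already kept
        kept.append(resp)
    return kept


def should_participate(own: str, others: List[str], tau: float) -> bool:
    """Hypothetical rule: debate only if a new viewpoint survives filtering."""
    return len(filter_redundant(own, others, tau)) > 0


own = "the answer is 42"
others = ["the answer is 42", "the answer is 17", "The answer is 17"]
unique_views = filter_redundant(own, others, tau=0.9)
```

In this toy case the agent would receive a single differing viewpoint instead of three messages, and it would skip the round entirely if every other response matched its own. A higher `tau` keeps more messages, which is the trade-off between accuracy and token cost that the threshold study in Figure 3 examines.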