SID: Multi-LLM Debate Driven by Self Signals

Xuhang Chen, Zhifan Song, Deyi Ji, Shuo Gao, Lanyun Zhu

TL;DR

SID introduces a self-signal-driven approach to multi-LLM debate that exploits internal generation signals rather than external judgments. It uses model-level confidence as an early-exit gate and token-level semantic focus to guide attention-based compression, improving accuracy while reducing token consumption by up to 40% across diverse benchmarks. Empirical results on LLMs and MLLMs (e.g., MMLUpro, Math, ScienceQA, MMStar) show that SID is both more accurate and more efficient than existing MAD methods, and ablations confirm that both mechanisms contribute. The work shows that internal belief signals can orchestrate collaborative reasoning among AI agents, with practical benefits for deployment and reproducibility.
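
The TL;DR describes a model-level confidence signal that gates early exit. This page does not say how that confidence is computed, so the sketch below assumes it is the geometric-mean probability of an agent's answer tokens, exp(mean log-probability), compared against a fixed threshold. The function names, the 0.9 threshold, and the log-probabilities are hypothetical illustrations, not SID's published implementation.

```python
import math
from typing import Dict, List, Sequence


def answer_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean probability of the answer tokens: exp(mean log-prob).

    ASSUMPTION: this stands in for SID's model-level confidence; the paper's
    exact definition may differ.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def should_exit_early(token_logprobs: Sequence[float], threshold: float = 0.9) -> bool:
    """Hypothetical gate: an agent stops debating once confidence >= threshold."""
    return answer_confidence(token_logprobs) >= threshold


def agents_still_debating(round_logprobs: Dict[str, Sequence[float]],
                          threshold: float = 0.9) -> List[str]:
    """Return the agents that did NOT clear the confidence gate this round."""
    return [name for name, lps in round_logprobs.items()
            if not should_exit_early(lps, threshold)]


# Illustrative log-probabilities for three agents after one round (made up).
round_1 = {
    "agent_a": [-0.01, -0.02, -0.05],  # confident answer
    "agent_b": [-0.90, -1.20, -0.40],  # uncertain answer
    "agent_c": [-0.03, -0.08, -0.01],  # confident answer
}
print(agents_still_debating(round_1))  # ['agent_b']
```

Here agent_a and agent_c (confidence of roughly 0.97 and 0.96) leave the debate, so only agent_b spends tokens on the next round. Raising the threshold toward 1 keeps more agents debating, trading token savings for more discussion.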

Abstract

Large Language Models (LLMs) have exhibited impressive capabilities across diverse application domains. Recent work has explored Multi-LLM Agent Debate (MAD) as a way to enhance performance by enabling multiple LLMs to discuss and refine responses iteratively. Nevertheless, existing MAD methods predominantly rely on external structures, such as debate graphs and LLM-as-a-Judge, while neglecting self signals, such as token logits and attention, that arise during generation. This omission leads to redundant computation and potential performance degradation. In this paper, we shift the focus to the self signals of multi-LLM debate and introduce Self-Signals Driven Multi-LLM Debate (SID), which leverages two types of self signals, model-level confidence and token-level semantic focus, to adaptively guide the debate process. Our approach lets high-confidence agents exit early at the model level and compresses redundant debate content based on the attention mechanism. We evaluate our method on various LLMs and Multimodal LLMs (MLLMs) across multiple challenging benchmarks. Experimental results demonstrate that our method not only outperforms existing MAD techniques in accuracy but also reduces token consumption, highlighting the effectiveness of self signals in improving both the performance and the efficiency of multi-agent debate systems. Our code will be available at https://github.com/xuhang2019/SID.
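
The abstract says redundant debate content is compressed using attention. A minimal sketch, assuming each token of a previous agent's response already has one aggregated attention score and that tokens are retained nucleus-style, highest score first, until their cumulative normalized mass reaches `top_p`. Figure 2(e) ablates a top-p parameter, but how SID aggregates attention and applies it is not stated on this page; `compress_by_attention` and the scores are hypothetical.

```python
from typing import List, Sequence


def compress_by_attention(tokens: Sequence[str],
                          scores: Sequence[float],
                          top_p: float = 0.8) -> List[str]:
    """Keep the smallest set of highest-attention tokens whose normalized
    mass reaches top_p, returned in their original order.

    ASSUMPTION: `scores` holds one non-negative, already aggregated attention
    weight per token (e.g. averaged over heads and layers).
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    total = sum(scores)
    if total <= 0:
        return list(tokens)  # no attention signal: keep everything

    # Rank token positions by attention, highest first (stable for ties).
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept, mass = set(), 0.0
    for i in ranked:
        kept.add(i)
        mass += scores[i] / total
        if mass >= top_p:
            break
    # Preserve original token order so the compressed text stays readable.
    return [tok for i, tok in enumerate(tokens) if i in kept]


tokens = ["The", "answer", "is", "42", "because", "6", "*", "7", "=", "42"]
scores = [1, 10, 2, 30, 3, 12, 5, 12, 5, 20]  # illustrative, not real attention
compressed = compress_by_attention(tokens, scores, top_p=0.8)
print(compressed)                     # ['answer', '42', '6', '7', '42']
print(len(compressed) / len(tokens))  # 0.5
```

The last line prints the fraction of tokens retained, the kind of quantity Figure 2(a) reports as a token ratio; these toy values are not results from the paper. Returning tokens in their original order, rather than ranked order, keeps the compressed message readable when it is passed to the next debate round.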

Paper Structure

This paper contains 34 sections, 8 equations, 22 figures, 5 tables, 2 algorithms.

Figures (22)

  • Figure 1: Self-Signal Driven Debate (SID)
  • Figure 2: (a) Accuracy and token ratio comparison across strategies in MAD vs. SID. (b) Performance with more debate rounds for LLMs and MLLMs. (c) Significance tests on model-level confidence signals; C denotes the correct group and W the wrong group. Statistical significance is indicated as follows: $p<0.05$ (*), $p<0.01$ (**), and $p<0.001$ (***). (d) Answer correction flow in the MAD vs. SID setting. (e) Ablation of $\text{top-}p$ and (f) $\alpha$ on accuracy and token ratio.
  • Figure 3: Case study of SID's debate process. (Left) On MMLUpro, SID exits early for a simple arithmetic question with high confidence but fails on a complex physics question with low confidence. (Right) Three agents initially err but converge to the correct answer through debate guided by token-level semantic focus from adaptively compressed content.
  • Figure 4: Details of reasoning augmentation prompt.
  • Figure 5: Running time comparison on math datasets with LLaMA3.1-8B on a single A100 80GB GPU.
  • ...and 17 more figures
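
Figure 2(c) reports significance tests comparing model-level confidence between the correct (C) and wrong (W) groups, but the caption does not name the test used. As a hedged illustration only, the sketch below swaps in a two-sided exact permutation test on group means and maps the resulting p-value to the caption's star notation; the confidence scores are invented, and the function names and the "n.s." label are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def exact_permutation_pvalue(group_a: Sequence[float],
                             group_b: Sequence[float]) -> float:
    """Two-sided exact permutation test on the difference of group means.

    Enumerates every relabeling of the pooled values, so it is only practical
    for small groups (the number of splits grows combinatorially).
    """
    pooled = list(group_a) + list(group_b)
    n, n_a = len(pooled), len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    hits = splits = 0
    for idx in combinations(range(n), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in range(n) if i in chosen]
        b = [pooled[i] for i in range(n) if i not in chosen]
        # Small tolerance guards against floating-point ties.
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed - 1e-12:
            hits += 1
        splits += 1
    return hits / splits


def significance_stars(p: float) -> str:
    """Star convention used in the Figure 2 caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."


# Made-up confidence scores for correct (C) and wrong (W) answers.
correct = [0.95, 0.91, 0.97, 0.89, 0.93, 0.96, 0.92, 0.94]
wrong = [0.55, 0.62, 0.48, 0.70, 0.58, 0.66, 0.51, 0.60]
p = exact_permutation_pvalue(correct, wrong)
print(round(p, 6), significance_stars(p))  # 0.000155 ***
```

Because the two toy groups do not overlap, only the observed split and its mirror image reach the observed difference, giving p = 2 / 12870. With real data and larger groups, a sampled (Monte Carlo) permutation test or a rank-based test would be the practical choice.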