Key Decision-Makers in Multi-Agent Debates: Who Holds the Power?

Qian Zhang, Yan Zheng, Jinyi Liu, Hebin Liang, Lanjun Wang

TL;DR

The paper investigates how role allocation within Multi-Agent Debate (MAD) shapes reasoning performance and identifies role position as a key scaling dimension. It introduces Truth Last, an effective allocation pattern, and the Multi-Agent Debate Consistency (MADC) strategy, which uses path-consistency signals to approximate the truth without modifying prompts. Across nine LLMs and three reasoning tasks (BBH Logical Deduction, BBH Geometric Shapes, MATH500), Truth Last yields substantial accuracy gains (up to roughly 22–24% in idealized settings where the truth is known). MADC provides robust improvements across models and scaling regimes, with gains that grow as the number of rounds and agents increases. The work shows that late-position viewpoints often dominate outcomes, explains how to plug MADC into existing MAD frameworks with minimal disruption, and discusses societal implications and safeguards against power concentration in automated debates.

Abstract

Recent studies on LLM agent scaling have highlighted the potential of Multi-Agent Debate (MAD) to enhance reasoning abilities. However, the critical aspect of role allocation strategies remains underexplored. In this study, we demonstrate that allocating roles with differing viewpoints to specific positions significantly impacts MAD's performance in reasoning tasks. Specifically, we find a novel role allocation strategy, "Truth Last", which can improve MAD performance by up to 22% in reasoning tasks. To address the issue of unknown truth in practical applications, we propose the Multi-Agent Debate Consistency (MADC) strategy, which systematically simulates and optimizes its core mechanisms. MADC incorporates path consistency to assess agreement among independent roles, simulating the role with the highest consistency score as the truth. We validated MADC across a range of LLMs (9 models), including the DeepSeek-R1 Distilled Models, on challenging reasoning tasks. MADC consistently demonstrated advanced performance, effectively overcoming MAD's performance bottlenecks and providing a crucial pathway for further improvements in LLM agent scaling.
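
The allocation idea in the abstract can be illustrated with a small Python sketch. The paper's formal definitions of debating path and path consistency are not reproduced on this page, so the sketch substitutes a simple stand-in: a role's score is the number of other roles whose initial answer matches its own. The function names, the role labels, and the sample answers are all hypothetical, and ordering every role by ascending score is an assumption made only so that the highest-scoring role ends up speaking last.

```python
from collections import Counter
from typing import Dict, List


def consistency_scores(answers: Dict[str, str]) -> Dict[str, int]:
    """Score each role by how many OTHER roles gave the same answer.

    Assumption: plain answer agreement stands in for the paper's
    path-consistency measure, which is not defined on this page.
    """
    counts = Counter(answers.values())
    return {role: counts[answer] - 1 for role, answer in answers.items()}


def truth_last_order(answers: Dict[str, str]) -> List[str]:
    """Return a speaking order with the most-agreed-with role in the final slot."""
    scores = consistency_scores(answers)
    # sorted() is stable, so roles with equal scores keep their original order.
    return sorted(answers, key=lambda role: scores[role])


# Hypothetical initial-round answers from five roles (A)-(E).
initial_answers = {
    "A": "square",
    "B": "circle",
    "C": "square",
    "D": "hexagon",
    "E": "square",
}
order = truth_last_order(initial_answers)
print(order)
```

In this toy input, the roles answering "square" agree with two others each, so one of them is placed in the last speaking position and is treated as the simulated truth. A real integration would replace the agreement count with the consistency measure defined in the paper.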

Paper Structure

This paper contains 31 sections, 5 equations, 11 figures, 11 tables, 1 algorithm.

Figures (11)

  • Figure 1: Comparative performance of single-agent and different MAD allocation strategies using the Qwen2.5-7B-Instruct model on BBH's Logical Deduction and Geometric Shapes tasks.
  • Figure 2: In the initial round of the MAD framework, each role independently uses CoT. During the debate round, roles exchange viewpoints in a fully connected manner to update their viewpoints. In the default Fixed strategy, roles speak in a consistent order each round and receive others' viewpoints in a fixed sequence (A)-(E). The Random strategy disrupts both the speaking order and the relative positions during the debate. Green represents correct viewpoints, while orange indicates incorrect viewpoints.
  • Figure 3: Experimental results showing the accuracy metrics across different allocation strategies, with experiments repeated 20 times.
  • Figure 4: Experimental results showing the log-likelihood metrics across different allocation strategies, with experiments repeated 20 times.
  • Figure 5: Experimental results showing the entropy metrics across different allocation strategies, with experiments repeated 20 times.
  • ...and 6 more figures

Theorems & Definitions (2)

  • Definition 1: Debating Path
  • Definition 2: Path Consistency