Key Decision-Makers in Multi-Agent Debates: Who Holds the Power?

Qian Zhang; Yan Zheng; Jinyi Liu; Hebin Liang; Lanjun Wang

TL;DR

The paper investigates how role allocation within Multi-Agent Debate (MAD) shapes reasoning performance, revealing role-position effects as a key scaling dimension. It introduces Truth Last as an effective allocation pattern and the Multi-Agent Debate Consistency (MADC) strategy, which uses path-consistency signals to approximate truth without modifying prompts. Across nine LLM families and three reasoning tasks (BBH Logical Deduction, Geometric Shapes, MATH500), Truth Last yields substantial accuracy gains (up to ~22–24% in idealized settings) and MADC provides robust improvements across models and scaling regimes, with gains that grow as rounds and agent counts increase. The work demonstrates that late-position viewpoints often dominate outcomes, shows how to plug MADC into existing MAD frameworks with minimal disruption, and discusses societal implications and safeguards against power concentration in automated debates.

Abstract

Recent studies on LLM agent scaling have highlighted the potential of Multi-Agent Debate (MAD) to enhance reasoning abilities. However, role allocation strategies, a critical aspect of MAD, remain underexplored. In this study, we demonstrate that assigning roles with differing viewpoints to specific positions in the debate significantly affects MAD's performance on reasoning tasks. Specifically, we identify a novel role allocation strategy, "Truth Last", which can improve MAD performance by up to 22% on reasoning tasks. Because the truth is unknown in practical applications, we propose the Multi-Agent Debate Consistency (MADC) strategy, which systematically simulates and optimizes the core mechanisms of Truth Last. MADC uses path consistency to assess agreement among independent roles and treats the role with the highest consistency score as a stand-in for the truth. We validated MADC across a range of LLMs (9 models), including the DeepSeek-R1 Distilled Models, on challenging reasoning tasks. MADC consistently demonstrated advanced performance, effectively overcoming MAD's performance bottlenecks and providing a crucial pathway for further improvements in LLM agent scaling.
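The idea of using agreement among roles as a proxy for the truth, then placing that role last, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not define the path-consistency score, so the sketch assumes it is the fraction of other roles that reached the same answer. The function names `consistency_scores` and `truth_last_order` and the agent labels are hypothetical.

```python
from collections import Counter


def consistency_scores(answers):
    """Score each role by the fraction of *other* roles with the same answer.

    Assumption: this agreement ratio stands in for the paper's
    path-consistency score, which the abstract does not define.
    """
    counts = Counter(answers.values())
    n_others = len(answers) - 1
    if n_others <= 0:
        return {role: 0.0 for role in answers}
    return {role: (counts[ans] - 1) / n_others for role, ans in answers.items()}


def truth_last_order(answers):
    """Return a speaking order in which the most consistent role goes last."""
    scores = consistency_scores(answers)
    # sorted() is stable, so tied roles keep their original relative order.
    return sorted(answers, key=lambda role: scores[role])


# Hypothetical first-round answers from four independent roles.
answers = {"agent_a": "B", "agent_b": "C", "agent_c": "B", "agent_d": "A"}
order = truth_last_order(answers)
```

In this toy example the roles that answered "B" agree with each other, so one of them is scheduled to speak last. A real MAD framework would feed `order` into its debate loop without changing the prompts.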
