Voting or Consensus? Decision-Making in Multi-Agent Debate

Lars Benedikt Kaesberg, Jonas Becker, Jan Philip Wahle, Terry Ruas, Bela Gipp

TL;DR

This work provides a systematic, controlled comparison of seven decision protocols (three consensus and four voting) in multi-agent debates, using fixed discussion parameters to isolate protocol effects. It finds clear task-dependent advantages: consensus excels on knowledge-based tasks, while voting benefits reasoning tasks, with significant improvements over a single-model baseline. To boost answer diversity and performance, it introduces All-Agents Drafting (AAD) and Collective Improvement (CI), achieving up to 7.4% gains, and shows a strong link between diversity and accuracy. The study underscores the importance of decision-making strategies for robust multi-agent reasoning, discusses computational trade-offs, and offers practical guidance for applying these protocols in real-world, high-stakes domains.

Abstract

Much of the success of multi-agent debates depends on carefully choosing the right parameters. The decision-making protocol stands out because the way decisions are reached can strongly affect the final model answers. Systematic comparison of decision protocols is difficult because many studies alter multiple discussion parameters beyond the protocol. So far, it has been largely unknown how decision-making influences different tasks. This work systematically evaluates the impact of seven decision protocols (e.g., majority voting, unanimity consensus). We change only one variable at a time, the decision protocol, to analyze how different methods affect the collaboration between agents and to measure differences in knowledge and reasoning tasks. Our results show that voting protocols improve performance by 13.2% in reasoning tasks and consensus protocols by 2.8% in knowledge tasks compared to other decision protocols. Increasing the number of agents improves performance, while more discussion rounds before voting reduce it. To improve decision-making by increasing answer diversity, we propose two new methods, All-Agents Drafting (AAD) and Collective Improvement (CI). Our methods improve task performance by up to 3.3% with AAD and up to 7.4% with CI. This work demonstrates the importance of decision-making in multi-agent debates beyond scaling.
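To make the two protocol families named in the abstract concrete, the following is a minimal Python sketch of one voting-style rule and one consensus-style rule. It is an illustration under stated assumptions, not the authors' implementation: the function names `majority_vote` and `unanimity_consensus` are hypothetical, answers are assumed to be comparable as exact strings, and the strict "more than half" threshold is an assumption, since this section does not define the protocols precisely.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[str]) -> Optional[str]:
    """Return the answer given by more than half of the agents, else None.

    Assumption: a strict majority (> 50%) is required; the threshold is
    illustrative and not taken from the paper.
    """
    if not answers:
        return None
    # most_common(1) yields the single most frequent answer and its count.
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count > len(answers) / 2 else None


def unanimity_consensus(answers: Sequence[str]) -> Optional[str]:
    """Return the shared answer only if every agent gave the same one."""
    if answers and len(set(answers)) == 1:
        return answers[0]
    return None


if __name__ == "__main__":
    # Hypothetical final answers from three agents after one debate round.
    round_answers = ["Paris", "Paris", "Lyon"]
    print(majority_vote(round_answers))
    print(unanimity_consensus(round_answers))
```

In this sketch a return value of `None` means that no decision was reached. A debate loop would presumably continue with another discussion round in that case, although that control flow is also an assumption here.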

Paper Structure

This paper contains 37 sections, 2 equations, 21 figures, 4 tables.

Figures (21)

  • Figure 1: Illustration of voting and consensus-based decision protocols used in this study.
  • Figure 2: Task performance $\pm$ std for seven decision protocols (voting and consensus-based) on six tasks (knowledge and reasoning) based on agents with Llama 8B. Bold indicates the highest results per dataset. Standard deviation over three runs.
  • Figure 3: F1 $\pm$ std of different decision protocols on SQuAD 2.0 divided into three ablation groups: (middle) samples with an answer in the context (Sample has Answer), (right) samples with no answer in the context (Sample has no Answer), and (left) the combination of both (All Samples). Standard deviation over three runs.
  • Figure 4: Accuracy $\pm$ std on StrategyQA when the agents have to talk for a given number of rounds before they are allowed to vote using the simple voting decision protocol. Standard deviation over three runs.
  • Figure 5: Accuracy $\pm$ std on StrategyQA with a different number of agents participating in the discussion. The final answer is created using the simple voting decision protocol. Standard deviation over three runs.
  • ...and 16 more figures