Voting or Consensus? Decision-Making in Multi-Agent Debate
Lars Benedikt Kaesberg, Jonas Becker, Jan Philip Wahle, Terry Ruas, Bela Gipp
TL;DR
This work provides a systematic, controlled comparison of seven decision protocols (three consensus and four voting) in multi-agent debates, using fixed discussion parameters to isolate protocol effects. It finds clear task-dependent advantages: consensus excels on knowledge-based tasks, while voting benefits reasoning tasks, with significant improvements over a single-model baseline. To boost answer diversity and performance, it introduces All-Agents Drafting (AAD) and Collective Improvement (CI), which improve performance by up to 3.3% and 7.4%, respectively, and shows a strong link between diversity and accuracy. The study underscores the importance of decision-making strategies for robust multi-agent reasoning, discusses computational trade-offs, and offers practical guidance for applying these protocols in real-world, high-stakes domains.
Abstract
Much of the success of multi-agent debates depends on carefully choosing the right parameters. The decision-making protocol stands out because the way decisions are reached can strongly influence the final model answers. Systematic comparison of decision protocols is difficult because many studies alter multiple discussion parameters beyond the protocol. So far, it has been largely unknown how decision-making influences different tasks. This work systematically evaluates the impact of seven decision protocols (e.g., majority voting, unanimity consensus). We change only one variable at a time, the decision protocol, to analyze how different methods affect the collaboration between agents and measure differences in knowledge and reasoning tasks. Our results show that voting protocols improve performance by 13.2% in reasoning tasks and consensus protocols by 2.8% in knowledge tasks compared to other decision protocols. Increasing the number of agents improves performance, while more discussion rounds before voting reduce it. To improve decision-making by increasing answer diversity, we propose two new methods, All-Agents Drafting (AAD) and Collective Improvement (CI). Our methods improve task performance by up to 3.3% with AAD and up to 7.4% with CI. This work demonstrates the importance of decision-making in multi-agent debates beyond scaling.
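To make the difference between the two protocol families concrete, the following is a minimal sketch of the two examples named above, majority voting and unanimity consensus, applied to answers that agents have already produced. It is illustrative only and not the authors' implementation. All function names and the data format (each round is a list of answer strings, one per agent) are hypothetical. The sketch also assumes that consensus is checked after every discussion round, while voting happens once after the last round. It treats "majority" as strictly more than half of the agents, so a tie or a plurality without a majority yields no decision.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Hypothetical format: one round = one final answer string per agent.
Round = Sequence[str]


def majority_vote(answers: Round) -> Optional[str]:
    """Voting: return the answer backed by more than half of the agents, else None."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count > len(answers) / 2 else None


def unanimity_consensus(answers: Round) -> Optional[str]:
    """Consensus: return the shared answer only if every agent gives it, else None."""
    if answers and all(a == answers[0] for a in answers):
        return answers[0]
    return None


def decide_by_consensus(rounds: List[Round]) -> Optional[str]:
    """Check for consensus after every round and stop at the first agreement."""
    for answers in rounds:
        decision = unanimity_consensus(answers)
        if decision is not None:
            return decision
    return None  # no consensus within the round limit


def decide_by_vote(rounds: List[Round]) -> Optional[str]:
    """Let the discussion run for all rounds, then vote once on the last answers."""
    return majority_vote(rounds[-1]) if rounds else None


if __name__ == "__main__":
    debate = [
        ["B", "A", "B"],  # round 1: agents disagree
        ["B", "B", "B"],  # round 2: all agents agree
    ]
    print(decide_by_consensus(debate))        # B (unanimity reached in round 2)
    print(decide_by_vote([["A", "B", "A"]]))  # A (2 of 3 votes)
    print(decide_by_vote([["A", "B", "C"]]))  # None (no majority)
```

Under these assumptions, consensus can end a debate early, whereas voting always waits for the final round. Swapping in a different protocol only means replacing the decision function while the rounds stay fixed, which mirrors the paper's idea of varying a single parameter at a time.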
