Roundtable Policy: Confidence-Weighted-Consensus Aggregation Improves Multi-Agent-System Reasoning

Yu Yao, Jiayi Dong, Yang Yang, Ju Li, Yilun Du

TL;DR

Roundtable Policy addresses the challenge of aggregating heterogeneous reasoning paths in multi-agent systems for scientific tasks. It introduces a confidence-weighted memory to weigh agent contributions at inference time, enabling auditable consensus without retraining. Empirical results on ScienceEval and ScienceNarrative show notable gains across cross-domain and long-context tasks, with analyses of grader bias and inter-grader agreement. The work argues for reliability modeling and consensus formation as a core paradigm for future multi-agent collaboration.

Abstract

Multi-agent systems have demonstrated exceptional performance on downstream tasks, surpassing diverse single-agent baselines. A growing body of work has explored ways to improve their reasoning and collaboration, ranging from voting and debate to complex interaction protocols. However, it remains unclear why a specific choice should be preferred in a given multi-agent system. Inspired by the decision-making mechanisms of democratic committees and by The Society of Mind, we introduce Roundtable Policy, an inference-time reasoning framework for multi-agent systems that performs inference through the weighted consensus of multiple LLMs. Through extensive experiments, we demonstrate that this approach significantly enhances reasoning on complex, heterogeneous scientific tasks. Roundtable Policy emphasizes structured and interpretable inference rather than opaque convergence, while requiring only black-box access and uniform procedures, making it broadly applicable to diverse multi-agent systems.
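To make the contrast between plain voting and confidence-weighted consensus concrete, here is a minimal sketch. It assumes each agent returns a single discrete answer and that the reliability memory is a per-agent, per-domain running mean of past grader scores in [0, 1], with a neutral prior for agents that have no history. The names `ReliabilityMemory`, `majority_vote`, and `weighted_consensus`, the prior of 0.5, and the update rule are all hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter, defaultdict


class ReliabilityMemory:
    """Hypothetical long-term memory of each agent's reliability per domain.

    Reliability is the running mean of past grader scores in [0, 1];
    an agent with no history in a domain gets a neutral prior weight.
    """

    def __init__(self, prior: float = 0.5) -> None:
        self.prior = prior
        self._sum = defaultdict(float)  # (agent, domain) -> summed scores
        self._n = defaultdict(int)      # (agent, domain) -> number of scores

    def update(self, agent: str, domain: str, score: float) -> None:
        """Record one grader score for `agent` on a task from `domain`."""
        self._sum[(agent, domain)] += score
        self._n[(agent, domain)] += 1

    def weight(self, agent: str, domain: str) -> float:
        """Return the agent's mean past score, or the prior if unseen."""
        n = self._n.get((agent, domain), 0)
        return self._sum[(agent, domain)] / n if n else self.prior


def majority_vote(answers: dict[str, str]) -> str:
    """Unweighted baseline: every agent counts equally, with no memory."""
    return Counter(answers.values()).most_common(1)[0][0]


def weighted_consensus(
    answers: dict[str, str], memory: ReliabilityMemory, domain: str
) -> tuple[str, dict[str, float]]:
    """Sum each candidate's support using per-agent reliability weights."""
    support: dict[str, float] = defaultdict(float)
    for agent, answer in answers.items():
        support[answer] += memory.weight(agent, domain)
    best = max(support, key=support.get)
    return best, dict(support)


# Two agents with a weak chemistry record agree; one strong agent disagrees.
memory = ReliabilityMemory()
memory.update("agent_a", "chemistry", 0.2)
memory.update("agent_b", "chemistry", 0.2)
memory.update("agent_c", "chemistry", 1.0)
memory.update("agent_c", "chemistry", 0.8)

answers = {"agent_a": "X", "agent_b": "X", "agent_c": "Y"}
print(majority_vote(answers))                                # X
print(weighted_consensus(answers, memory, "chemistry")[0])   # Y
print(weighted_consensus(answers, memory, "physics")[0])     # X
```

In the chemistry case the weighted support is 0.4 for "X" and 0.9 for "Y", so the reliable minority overturns the majority, which mirrors the majority-bias failure that Figure 1 attributes to memoryless voting. With no recorded history (the physics case), every weight equals the prior and the sketch falls back to majority voting. Summing weights rather than averaging them is one design choice among several; the paper's actual weighting and grading scheme may differ.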

Paper Structure

This paper contains 55 sections, 12 equations, 40 figures, 12 tables, and 1 algorithm.

Figures (40)

  • Figure 1: Motivation for an inference-time reasoning framework with structured aggregation. Qualitative illustration of the limitations of existing multi-agent systems and motivation for Roundtable Policy. Left: Voting-based aggregation lacks memory and treats all agents equally, leading to majority bias when partial but confident opinions dominate. Middle: Debate-based interaction relies on transient conversational dynamics and often converges to rhetorically balanced statements. Right: Roundtable Policy introduces a structured, long-term memory of agents' reliability and multi-agent consensus, producing coherent reasoning.
  • Figure 2: Roundtable Policy is an inference-time reasoning framework without retraining or finetuning base models.
  • Figure 3: Example of a simplified ScienceNarrative task. A detailed case is shown in a separate figure (fig:singletask_demo).
  • Figure 4: Example of a subtask of ScienceEval. A detailed case is shown in a separate figure (fig:multitask_demo).
  • Figure 5: Roundtable Policy with ablated components.
  • ...and 35 more figures