Belief-Calibrated Multi-Agent Consensus Seeking for Complex NLP Tasks

Wentao Deng, Jiahuan Pei, Zhiwei Xu, Zhaochun Ren, Zhumin Chen, Pengjie Ren

TL;DR

This work introduces Belief-Calibrated Consensus Seeking (BCCS) to stabilize multi-agent NLP consensus by integrating system-internal beliefs into consensus judgments and by selectively connecting agents through Collaborator Assignment (CA) and Leader Selection (LS). Theoretical results establish conditions for stable consensus, notably that collaboration with both supportive and conflicting peers and leadership from high-belief agents promote convergence. Empirically, BCCS yields consistent accuracy gains on MATH ($+2.23\%$) and MMLU ($+3.95\%$) over strong baselines, with ablations confirming the contribution of each module. The approach is reported to be robust and to scale across model sizes and tasks. The code is open source, and the paper discusses broader societal considerations and limitations.

Abstract

A multi-agent system (MAS) enhances its capacity to solve complex natural language processing (NLP) tasks through collaboration among multiple agents, where consensus-seeking serves as a fundamental mechanism. However, existing consensus-seeking approaches typically rely on voting mechanisms to judge consensus, overlooking contradictions in system-internal beliefs that destabilize the consensus. Moreover, these methods often have agents update their results through indiscriminate collaboration with every other agent. Such uniform interaction fails to identify the optimal collaborators for each agent, hindering the emergence of a stable consensus. To address these challenges, we provide a theoretical framework for selecting optimal collaborators that maximize consensus stability. Based on the theorems, we propose the Belief-Calibrated Consensus Seeking (BCCS) framework to facilitate stable consensus by selecting optimal collaborators and calibrating the consensus judgment with system-internal beliefs. Experimental results on the MATH and MMLU benchmark datasets demonstrate that the proposed BCCS framework outperforms the best existing results by 2.23% and 3.95% in accuracy on challenging tasks, respectively. Our code and data are available at https://github.com/dengwentao99/BCCS.

Paper Structure

This paper contains 56 sections, 4 theorems, 19 equations, 11 figures, 15 tables, 1 algorithm.

Key Result

Theorem 3.2

Let $\{x_i^k\}_{i=1}^n$ denote the opinions and $\{b_i^k\}_{i=1}^n$ denote the beliefs of a MAS with $n$ agents at the $k$-th step of collaboration. The collaboration between agents satisfies the following properties:

Figures (11)

  • Figure 1: Comparison between previous consensus seeking methods and our proposed framework. (a) Existing consensus seeking methods. (b) Belief-Calibrated Consensus Seeking (BCCS).
  • Figure 2: An illustration of the MAS in NLP tasks, where each agent generates an answer $x_i^k$ along with its reasoning process $e_i^k$; the belief $b_i^k$ of agent $a_i$ is the generation probability.
  • Figure 3: An illustration of the Belief-Calibrated Consensus Seeking (BCCS) framework. The arrows represent the workflow. After the opinion groups are obtained, the BCCJ module judges the consensus state of the MAS. If the MAS reaches partial consensus, the CA module estimates the conflict level between each pair of opinion groups through conflict scores, then assigns collaborators to the agents in each opinion group. If the MAS reaches no consensus, the LS module selects leaders for each opinion group. These steps iterate until the MAS reaches full consensus or the maximum number of iterations.
  • Figure 4: (a) and (b): the results of the $CL$, $SCL$ and accuracy for each supportive-to-conflicting collaboration ratio (including "0:7", "6:1" and "7:0"). (c) and (d): the results of the $SCR$ and accuracy for the lowest, random and highest leaders' beliefs (denoted as "L", "R", "H"). (e): analysis of the parameter size; the x-axis denotes the LLM being compared.
  • Figure 5: The performance of BCCS with different agent numbers $n$.
  • ...and 6 more figures
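
The workflow described in the captions of Figures 2 and 3 can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Agent`, `belief`, `judge_consensus`, `next_action` and `seek_consensus` are hypothetical, the belief is assumed to be the exponential of the summed token log-probabilities, and the consensus judgment is replaced by a toy rule (the belief-weighted share of the leading opinion group, with made-up thresholds). Only the dispatch itself (stop on full consensus, CA on partial consensus, LS on no consensus, iterate up to a maximum number of rounds) is taken from the Figure 3 caption.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    answer: str            # opinion x_i^k
    token_logprobs: list   # log-probabilities of the generated tokens


def belief(agent):
    """Belief b_i^k, assumed here to be exp(sum of token log-probs)."""
    return math.exp(sum(agent.token_logprobs))


def opinion_groups(agents):
    """Group agents that currently hold the same answer."""
    groups = defaultdict(list)
    for agent in agents:
        groups[agent.answer].append(agent)
    return dict(groups)


def judge_consensus(groups, full=0.9, partial=0.5):
    """Toy stand-in for the consensus judgment (thresholds are hypothetical).

    Uses the belief-weighted share of the leading opinion group
    instead of a plain head count.
    """
    if not groups:
        raise ValueError("no agents to judge")
    mass = {ans: sum(belief(a) for a in members)
            for ans, members in groups.items()}
    top_share = max(mass.values()) / sum(mass.values())
    if top_share >= full:
        return "full"
    if top_share >= partial:
        return "partial"
    return "none"


def next_action(state):
    """Dispatch from the Figure 3 caption: partial -> CA, none -> LS."""
    return {
        "full": "stop",
        "partial": "collaborator_assignment",
        "none": "leader_selection",
    }[state]


def seek_consensus(agents, update, max_iters=5):
    """Iterate until full consensus or the iteration budget is spent.

    `update(agents, action)` is a caller-supplied, hypothetical callable
    that returns the agents' revised answers for the next round.
    """
    for k in range(max_iters):
        state = judge_consensus(opinion_groups(agents))
        if state == "full":
            return agents, state, k
        agents = update(agents, next_action(state))
    return agents, judge_consensus(opinion_groups(agents)), max_iters
```

In a real system, `update` would prompt each agent again with the answers of its assigned collaborators or leaders. Passing it in as a callable keeps the sketch independent of any particular LLM API.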

Theorems & Definitions (7)

  • Definition 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Theorem B.1
  • proof
  • Theorem B.1
  • proof