Belief-Calibrated Multi-Agent Consensus Seeking for Complex NLP Tasks
Wentao Deng, Jiahuan Pei, Zhiwei Xu, Zhaochun Ren, Zhumin Chen, Pengjie Ren
TL;DR
This work introduces Belief-Calibrated Consensus Seeking (BCCS), which stabilizes multi-agent NLP consensus by integrating system-internal beliefs into consensus judgments and by selectively connecting agents through Collaborator Assignment (CA) and Leader Selection (LS). Theoretical results establish conditions for stable consensus, notably that collaboration with both supportive and conflicting peers, together with leadership from high-belief agents, promotes convergence. Empirically, BCCS yields consistent accuracy gains on MATH (+2.23%) and MMLU (+3.95%) over strong baselines, with ablations confirming the contribution of each module. The approach shows improved robustness and scalability across model sizes and tasks. The code is open source, and the paper discusses limitations and broader societal considerations.
Abstract
A multi-agent system (MAS) enhances its capacity to solve complex natural language processing (NLP) tasks through collaboration among multiple agents, where consensus-seeking serves as a fundamental mechanism. However, existing consensus-seeking approaches typically rely on voting mechanisms to judge consensus, overlooking contradictions in system-internal beliefs that destabilize the consensus. Moreover, in these methods agents often update their results through indiscriminate collaboration with every other agent. Such uniform interaction fails to identify the optimal collaborators for each agent, hindering the emergence of a stable consensus. To address these challenges, we provide a theoretical framework for selecting optimal collaborators that maximize consensus stability. Building on these theorems, we propose the Belief-Calibrated Consensus Seeking (BCCS) framework, which facilitates stable consensus by selecting optimal collaborators and by calibrating the consensus judgment with system-internal beliefs. Experimental results on the MATH and MMLU benchmark datasets demonstrate that BCCS outperforms the best existing results by 2.23% and 3.95% in accuracy on challenging tasks, respectively. Our code and data are available at https://github.com/dengwentao99/BCCS.
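
To make the contrast between vote-based and belief-aware consensus judgments concrete, the toy sketch below compares a plain majority vote with a belief-weighted tally. It is only an illustration under stated assumptions: the function names, the per-agent belief scores, and the weighting rule are hypothetical and are not taken from the BCCS method or its released code.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """Return the most common answer and its share of the votes."""
    counts = Counter(answers)
    answer, n = counts.most_common(1)[0]
    return answer, n / len(answers)


def belief_weighted(answers, beliefs):
    """Return the answer with the largest summed belief and its share of total belief.

    Hypothetical rule for illustration only; not the paper's calibration formula.
    """
    mass = defaultdict(float)
    for answer, belief in zip(answers, beliefs):
        mass[answer] += belief
    best = max(mass, key=mass.get)
    return best, mass[best] / sum(mass.values())


# Five hypothetical agents: three weakly confident in "A", two strongly confident in "B".
answers = ["A", "A", "A", "B", "B"]
beliefs = [0.3, 0.3, 0.3, 0.9, 0.9]  # assumed per-agent confidence scores

vote_answer, vote_share = majority_vote(answers)
belief_answer, belief_share = belief_weighted(answers, beliefs)
print(vote_answer, round(vote_share, 3))
print(belief_answer, round(belief_share, 3))
```

In this toy case the vote settles on "A" while the summed beliefs favor "B", which is the kind of contradiction between votes and internal beliefs that a vote-only consensus judgment cannot see.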
