On the Uncertainty of Large Language Model-Based Multi-Agent Systems
Yuxuan Zhao, Sijia Chen, Ningxin Su
TL;DR
This paper investigates why multi-agent systems (MAS) built on open-source LLMs succeed or fail by systematically analyzing uncertainty dynamics at the token, trajectory, and round levels across six benchmarks and four coordination topologies. The authors find that a single-agent system (SAS) outperforms MAS in approximately 43.3% of cases, and that the uncertainty dynamics governing MAS performance are largely determined in the first interaction round, with higher peak entropy generally detrimental. They distill three principles (Certainty Preference, Base Uncertainty, and Task Awareness) and introduce the Entropy Judger, a lightweight classifier that predicts per-sample correctness from entropy traces and selects among pass@k candidates without ground-truth labels. The work argues that uncertainty is a principled lens for diagnosing MAS failures and guiding architectural choices, offering both theoretical insights and practical tooling for robust multi-agent reasoning with LLMs. It further reports that RL-tuned base models can invert the typical entropy effects, enabling MAS to outperform SAS on certain tasks.
Abstract
Multi-agent systems (MAS) have emerged as a prominent paradigm for leveraging large language models (LLMs) to tackle complex tasks. However, the mechanisms governing the effectiveness of MAS built upon publicly available LLMs, specifically the underlying rationales for their success or failure, remain largely unexplored. In this paper, we revisit MAS through the perspective of uncertainty, considering both intra- and inter-agent dynamics by investigating entropy transitions during problem-solving across various topologies and six benchmark tasks. By analyzing 245 features spanning token-, trajectory-, and round-level entropy, we counterintuitively find that a single agent outperforms MAS in approximately 43.3% of cases, and that uncertainty dynamics are largely determined during the first round of interaction. Furthermore, we provide three key observations: 1) Certainty Preference: reducing uncertainty at any stage for any agent is critical for guaranteeing correct solutions; 2) Base Uncertainty: base models with lower entropy during problem-solving directly benefit MAS performance; and 3) Task Awareness: entropy dynamics of MAS play varying roles across different tasks. Building on these insights, we introduce a simple yet effective algorithm, the Entropy Judger, to select solutions from MAS's pass@k results, leading to consistent accuracy improvements across all MAS configurations and tasks. Our source code is available at https://github.com/AgenticFinLab/multiagent-entropy.
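To make the entropy levels mentioned above concrete, the following is a minimal sketch of how token-level entropy could be computed and averaged into a trajectory-level score, and how such a score could rank pass@k candidates. It is not the authors' Entropy Judger, which the abstract describes only as an algorithm built on their entropy analysis; the lowest-mean-entropy rule here is a simplified stand-in motivated by the Certainty Preference observation. The function names (`token_entropy`, `trajectory_entropy`, `select_lowest_entropy`) and the input format (one probability distribution per generated token) are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def trajectory_entropy(token_dists: Sequence[Sequence[float]]) -> float:
    """Mean token entropy over one generated response (hypothetical aggregate)."""
    if not token_dists:
        raise ValueError("trajectory must contain at least one token distribution")
    return sum(token_entropy(d) for d in token_dists) / len(token_dists)


def select_lowest_entropy(candidates: List[List[List[float]]]) -> int:
    """Return the index of the candidate with the lowest mean token entropy.

    Simplified stand-in for a learned judge: it prefers the most certain
    candidate and needs no ground-truth labels.
    """
    scores = [trajectory_entropy(c) for c in candidates]
    return min(range(len(scores)), key=scores.__getitem__)


# Two toy candidates over a two-token vocabulary.
uncertain = [[0.5, 0.5], [0.6, 0.4]]
confident = [[0.9, 0.1], [0.99, 0.01]]
best = select_lowest_entropy([uncertain, confident])
```

In this toy case `best` points at the `confident` candidate. A learned selector would replace the single mean-entropy score with a richer feature vector, but the selection interface would stay the same.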
