Can AI Agents Agree?

Frédéric Berdoz; Leonardo Rugli; Roger Wattenhofer

TL;DR

Valid agreement among LLM agents is not reliable even in benign settings and degrades as group size grows, raising caution for deployments that rely on robust coordination.

Abstract

Large language models are increasingly deployed as cooperating agents, yet their behavior in adversarial consensus settings has not been systematically studied. We evaluate LLM-based agents on a Byzantine consensus game over scalar values using a synchronous all-to-all simulation. We test consensus in a no-stake setting where agents have no preferences over the final value, so evaluation focuses on agreement rather than value optimality. Across hundreds of simulations spanning model sizes, group sizes, and Byzantine fractions, we find that valid agreement is not reliable even in benign settings and degrades as group size grows. Introducing a small number of Byzantine agents further reduces success. Failures are dominated by loss of liveness, such as timeouts and stalled convergence, rather than subtle value corruption. Overall, the results suggest that reliable agreement is not yet a dependable emergent capability of current LLM-agent groups even in no-stake settings, raising caution for deployments that rely on robust coordination.
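The setting described in the abstract can be illustrated with a minimal sketch of a synchronous all-to-all game over scalar proposals. This is not the authors' implementation: the names `Message`, `Decision`, `median_policy`, `stubborn_policy`, and `run_game` are hypothetical, the policies are deterministic stand-ins for LLM calls, and the termination rule (stop once every agent votes and all proposals coincide, otherwise time out after `max_rounds`) is an assumption made for illustration.

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable, Dict, List


@dataclass
class Message:
    sender: int
    proposal: float
    justification: str


@dataclass
class Decision:
    proposal: float
    vote: bool  # True = "vote", False = "continue"


# Hypothetical stand-in for an LLM call: (agent id, own proposal, inbox) -> Decision.
Policy = Callable[[int, float, List[Message]], Decision]


def median_policy(agent_id: int, current: float, inbox: List[Message]) -> Decision:
    """Move to the median of all visible proposals; vote once they all match."""
    values = [current] + [m.proposal for m in inbox]
    target = median(values)
    return Decision(proposal=target, vote=all(v == target for v in values))


def stubborn_policy(agent_id: int, current: float, inbox: List[Message]) -> Decision:
    """Never change the proposal and never vote (models a stalled agent)."""
    return Decision(proposal=current, vote=False)


def run_game(initial: Dict[int, float], policies: Dict[int, Policy],
             max_rounds: int = 10) -> dict:
    proposals = dict(initial)
    for round_no in range(1, max_rounds + 1):
        # Synchronous all-to-all: every agent broadcasts before anyone updates.
        broadcast = [Message(i, v, "stub justification") for i, v in proposals.items()]
        decisions = {
            i: policies[i](i, proposals[i], [m for m in broadcast if m.sender != i])
            for i in proposals
        }
        proposals = {i: d.proposal for i, d in decisions.items()}
        # Assumed termination rule: unanimous vote on a single shared value.
        if all(d.vote for d in decisions.values()) and len(set(proposals.values())) == 1:
            return {"outcome": "agreement", "rounds": round_no,
                    "value": next(iter(proposals.values()))}
    # Loss of liveness: the round budget ran out without termination.
    return {"outcome": "timeout", "rounds": max_rounds, "value": None}


start = {0: 1.0, 1: 2.0, 2: 3.0, 3: 10.0}
agreed = run_game(start, {i: median_policy for i in range(4)})
stalled = run_game(start, {0: median_policy, 1: median_policy,
                           2: median_policy, 3: stubborn_policy})
```

In this toy run, four median-following agents agree in the second round, while a single agent that never votes is enough to push the run into a timeout, the kind of liveness failure the abstract reports as dominant.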

Paper Structure (17 sections, 14 figures, 1 algorithm)

Introduction
Related Work
Method
Setting.
Protocol.
Termination and outcomes.
Threat model.
LLM agent implementation.
Evaluation.
Experiments and Results
Consensus without Byzantine agents
Consensus with Byzantine agents
Discussion and Conclusion
Appendix
Reproducibility
...and 2 more sections

Figures (14)

Figure 1: Byzantine consensus game with honest and Byzantine LLM agents on a synchronous all-to-all network in one round of interaction. The highlighted agent $i$ broadcasts a scalar proposal and justification, receives messages from peers, and emits a termination decision in $\{\texttt{vote}, \texttt{continue}\}$. For clarity, only a subset of message arrows is shown.
Figure 2: Consensus performance without Byzantine agents ($B=0$) for Qwen3-8B and Qwen3-14B across $N \in \{4,8,16\}$ and two prompt variants (with vs. without mentioning possible Byzantine agents). Error bars show 95% Wilson confidence intervals. Qwen3-14B reaches valid consensus more often, adversary-free prompts improve liveness, and larger groups slow and weaken consensus.
Figure 3: Effect of Byzantine agents on Qwen3-14B consensus with eight honest agents. Bars show the distribution of valid vs. invalid consensus outcomes over 25 runs per configuration; missing bars correspond to 0% consensus (all timeouts).
Figure 4: Representative proposal trajectories for Qwen3-14B with eight honest agents. Top row: prompts explicitly state that no Byzantine agents exist. Bottom row: runs include Byzantine agents, and prompts warn that Byzantine peers may exist. Each panel shows honest agents’ scalar proposals over rounds, with horizontal lines marking the initial honest range and a vertical line marking termination.
Figure 5: Byzantine system prompt defining the global adversarial behavior and strategy for Byzantine agents. Variable fields include: agent ID, group size, and allowed adversarial actions.
...and 9 more figures
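
The 95% Wilson confidence intervals mentioned in the Figure 2 caption can be computed from a success count and a run count using the standard Wilson score formula. The sketch below is illustrative only: the function name `wilson_interval` is hypothetical, the run count of 25 is borrowed from the Figure 3 caption, and the success count of 0 is a made-up example rather than a reported result.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.959963984540054):
    """Wilson score interval for a binomial proportion (default z gives 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical example: 0 valid-consensus outcomes in 25 runs.
low, high = wilson_interval(0, 25)
```

Unlike the naive normal approximation, the Wilson interval stays informative at 0% or 100% observed success: with 0 of 25 runs succeeding, the upper bound is still about 0.13 rather than collapsing to zero, which matters when many configurations time out in every run.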
