Silo-Bench: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems

Yuzhe Zhang; Feiran Liu; Yi Shan; Xinyi Huang; Xin Yang; Yueqi Zhu; Xuxin Cheng; Cao Liu; Ke Zeng; Terry Jingchen Zhang; Wenyuan Jiang

TL;DR

Silo-Bench is a role-agnostic benchmark of 30 algorithmic tasks spanning three communication complexity levels. It shows that naively scaling agent count cannot circumvent context limitations and provides a foundation for tracking progress toward genuinely collaborative multi-agent systems.

Abstract

Large language models are increasingly deployed in multi-agent systems to overcome context limitations by distributing information across agents. Yet whether agents can reliably compute with distributed information, rather than merely exchange it, remains an open question. We introduce Silo-Bench, a role-agnostic benchmark of 30 algorithmic tasks across three communication complexity levels, and evaluate 54 configurations over 1,620 experiments. Our experiments expose a fundamental Communication-Reasoning Gap: agents spontaneously form task-appropriate coordination topologies and exchange information actively, yet systematically fail to synthesize distributed state into correct answers. The failure is localized to the reasoning-integration stage: agents often acquire sufficient information but cannot integrate it. This coordination overhead compounds with scale, eventually eliminating parallelization gains entirely. These findings demonstrate that naively scaling agent count cannot circumvent context limitations, and Silo-Bench provides a foundation for tracking progress toward genuinely collaborative multi-agent systems.
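To make the setting concrete, here is a minimal sketch of the simplest case the abstract describes, assuming a Level I-style aggregation task (a global maximum): the global input is split into N disjoint silos, and a designated coordinator reconstructs the answer over a star topology. The function names `partition` and `star_aggregate`, the round-robin split, and the choice of "global maximum" as the task are illustrative assumptions, not Silo-Bench's actual task code.

```python
from typing import List, Tuple


def partition(data: List[int], n_agents: int) -> List[List[int]]:
    """Split the global input into n_agents disjoint, non-empty shards (round-robin)."""
    if not 1 <= n_agents <= len(data):
        raise ValueError("need 1 <= n_agents <= len(data) so every shard is non-empty")
    return [data[i::n_agents] for i in range(n_agents)]


def star_aggregate(shards: List[List[int]], coordinator: int = 0) -> Tuple[int, int]:
    """Hypothetical Level I-style aggregation of a global maximum over a star topology.

    Every non-coordinator agent sends one summary value (its local max) to the
    coordinator, so the exchange costs N - 1 messages. Returns (answer, messages).
    """
    received = [max(shards[coordinator])]  # the coordinator's own local result
    messages = 0
    for agent, shard in enumerate(shards):
        if agent == coordinator:
            continue
        received.append(max(shard))  # one message: local max -> coordinator
        messages += 1
    # Integration step: having the data is not enough; it must still be combined.
    return max(received), messages


if __name__ == "__main__":
    shards = partition([7, 3, 42, 15, 8, 23, 4, 16], n_agents=4)
    print(shards)                  # [[7, 8], [3, 23], [42, 4], [15, 16]]
    print(star_aggregate(shards))  # (42, 3)
```

In this toy, the loop that collects local maxima loosely corresponds to the communication stage and the final `max(received)` to the reasoning-integration stage; the abstract's finding is that LLM agents tend to manage the former far better than the latter.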

Paper Structure (62 sections, 5 equations, 7 figures, 18 tables)

Introduction
Related Work
Context Limitations and Distributed Reasoning.
Multi-Agent Architectures and Role-Agnosticism.
Silo-Bench
Task Space
Level I: Aggregation ($\mathcal{O}(N)$ communication).
Level II: Mesh Network ($\mathcal{O}(N)$ communication).
Level III: Global Shuffle ($\mathcal{O}(N \log N)$ to $\mathcal{O}(N^2)$ communication).
Task Construction Pipeline.
Evaluation Metrics
Success Rate ($\mathcal{S}$).
Partial Correctness Score ($\mathcal{P}$).
Token Consumption ($\mathcal{C}$).
Communication Density ($\mathcal{D}$).
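The complexity labels on the three levels can be made concrete by counting messages for one complete exchange under idealized versions of the topologies in Figure 2. This is a back-of-the-envelope sketch, not Silo-Bench code: the `ring` pattern (one possible neighbour structure for Level II) and the `hypercube` pattern (one way to land at the $\mathcal{O}(N \log N)$ end of Level III) are assumptions chosen for illustration, and the function name `messages_per_exchange` is hypothetical. Agents in the benchmark choose their own communication patterns.

```python
def messages_per_exchange(pattern: str, n: int) -> int:
    """Messages for one complete exchange among n agents under an idealized pattern."""
    if n < 2:
        return 0  # a single agent has nobody to talk to
    if pattern == "star":
        # every peer sends once to a central agent (Level I)
        return n - 1
    if pattern == "ring":
        # every agent sends to its left and right neighbour (assumed Level II shape);
        # with only two agents both "neighbours" are the same agent
        return 2 * n if n >= 3 else 2
    if pattern == "hypercube":
        # log2(n) rounds of pairwise swaps, n messages per round
        if n & (n - 1):
            raise ValueError("hypercube exchange assumes n is a power of two")
        return n * (n.bit_length() - 1)
    if pattern == "full_mesh":
        # every agent sends to every other agent (upper end of Level III)
        return n * (n - 1)
    raise ValueError(f"unknown pattern: {pattern!r}")


if __name__ == "__main__":
    patterns = ["star", "ring", "hypercube", "full_mesh"]
    print("N".rjust(4), *(p.rjust(10) for p in patterns))
    for n in (4, 8, 16, 32, 64):
        print(str(n).rjust(4), *(str(messages_per_exchange(p, n)).rjust(10) for p in patterns))
```

At N = 64 the star needs 63 messages while the full mesh needs 4,032, which is one way to see why Level III tasks place a much heavier coordination burden on agents than Level I tasks do.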

Figures (7)

Figure 1: Pipeline of Silo-Bench. Global information is partitioned across $N$ agents, each holding only local data. Agents must communicate through the provided protocol to reconstruct the global truth, and success requires effective collaboration strategies. The example shown is task III-21, Distributed Sort (see the task appendix).
Figure 2: Three complexity levels in Silo-Bench characterized by their communication patterns. Level I (Aggregation): A central agent collects data from all peers via a star topology. Level II (Mesh Network): Agents exchange information with immediate neighbors through pairwise communication. Level III (Global Shuffle): All agents must communicate with every other agent, requiring full mesh connectivity.
Figure 3: The three communication protocols employed in Silo-Bench.
Figure 4: Scaling behavior across agent counts. (a) Success rates decline for all models as team size increases, with sharp drops beyond $N=20$. (b) Token consumption scales roughly linearly with agent count. (c) Communication density decreases at scale, suggesting coordination sparsification.
Figure 5: Success rate by difficulty level.
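Figure 1's example, task III-21 Distributed Sort, shows why Level III needs a global shuffle: no agent can place its values correctly without learning something about every other agent's data. One classic strategy is range partitioning (the idea behind sample sort), sketched below under simplifying assumptions: a perfectly reliable channel, integer values, and hypothetical names (`distributed_sort`, `samples_per_agent`). It is not the protocol Silo-Bench provides or the strategy its agents actually adopt.

```python
import bisect
from typing import List, Set, Tuple


def distributed_sort(
    shards: List[List[int]], samples_per_agent: int = 3
) -> Tuple[List[List[int]], int]:
    """Range-partitioning sort across len(shards) agents.

    1. Each agent shares a few evenly spaced samples of its sorted local data.
    2. Splitters picked from the pooled samples give each agent a value range.
    3. Every agent sends each value to the agent owning its range (all-to-all).
    4. Each agent sorts what it received; concatenating the outputs in agent
       order yields the globally sorted sequence.
    Returns (per-agent outputs, number of distinct src->dst links used in step 3).
    """
    n = len(shards)
    # Steps 1-2: pool local samples, then choose n - 1 non-decreasing splitters.
    pool: List[int] = []
    for shard in shards:
        local = sorted(shard)
        if local:
            k = min(samples_per_agent, len(local))
            pool.extend(local[(j * len(local)) // k] for j in range(k))
    pool.sort()
    splitters = [pool[(i * len(pool)) // n] for i in range(1, n)] if pool else []

    # Step 3: all-to-all shuffle; outboxes[dst] collects the values dst owns.
    outboxes: List[List[int]] = [[] for _ in range(n)]
    links: Set[Tuple[int, int]] = set()
    for src, shard in enumerate(shards):
        for v in shard:
            dst = bisect.bisect_right(splitters, v)  # first range whose splitter exceeds v
            outboxes[dst].append(v)
            if dst != src:
                links.add((src, dst))

    # Step 4: local sort on each receiving agent.
    return [sorted(box) for box in outboxes], len(links)


if __name__ == "__main__":
    outputs, links = distributed_sort([[9, 1, 5], [4, 8, 2], [7, 3, 6]])
    print(outputs)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(links)    # 6
```

The shuffle step can use up to N(N - 1) directed links, the upper end of the Level III range; in the three-agent example all six are used. The sort is correct for any choice of splitters, but good sampling keeps the per-agent loads balanced.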
