Language Model Teams as Distributed Systems

Elizabeth Mieczkowski; Katherine M. Collins; Ilia Sucholutsky; Natalia Vélez; Thomas L. Griffiths

Abstract

Large language models (LLMs) are growing increasingly capable, prompting recent interest in LLM teams. Yet despite the increasing deployment of LLM teams at scale, we lack a principled framework for addressing key questions such as when a team is helpful, how many agents to use, how structure impacts performance, and whether a team is better than a single agent. Rather than designing and testing these possibilities through trial and error, we propose using distributed systems as a principled foundation for creating and evaluating LLM teams. We find that many of the fundamental advantages and challenges studied in distributed computing also arise in LLM teams, highlighting the rich practical insights that can come from the cross-talk of these two fields of study.

Paper Structure (27 sections, 1 equation, 6 figures, 1 table)

Introduction
Background
The rise of LLM teams
Benefits, limitations, and risks of LLM teams
Existing design approaches
The need for a formal framework
A framework for evaluating LLM teams as distributed systems
Shared properties
Predicting LLM team performance using this framework
Results
Amdahl's Law predicts scalability in LLM teams
Tradeoffs arise with architectural choices
Coordination leads to consistency conflicts
Larger and decentralized teams incur compounding overhead
Decentralized teams mitigate straggler delays
...and 12 more sections

Figures (6)

Figure 1: LLM Teams as Distributed Systems. Distributed computing provides a principled framework for analyzing and designing LLM teams. A. Both LLM team and distributed systems research pursue similar goals: leveraging scalability to improve performance and achieving fault tolerance through mechanisms such as redundancy, replication, and consensus. B. At the same time, LLM teams inherit fundamental complexities familiar from distributed systems but absent in single-agent settings, including consistency conflicts, architectural trade-offs, communication overhead, stragglers, task scheduling, and increased compute, energy, and monetary costs. C. LLM teams share four core properties with distributed systems: independence (each agent or node operates on local context without automatic access to global state); concurrency (multiple agents or nodes execute tasks simultaneously); communication (information is exchanged through message passing); and fallibility (agents or nodes may produce errors or undergo faults).
Figure 2: Scalability. A comparison of LLM team scalability to Amdahl's Law, which predicts theoretical speedup based on the proportion of serial dependencies in a task. Teams of agents were given preassigned tasks of three types (coding a math utilities library, creating a data analysis pipeline, and SVG rendering) and three dependency structures (parallel, mixed, or serial). Each trial type was repeated five times to account for variance in API latency, and efficiency was measured using wall-clock time in seconds. Speedup represents how much faster a team completed its task compared to the one-agent baseline. Highly parallel tasks generally benefited more from scaling team size than mixed or serial tasks, as predicted by Amdahl's Law, although results depended on model type.
Figure 3: Self-coordinating (decentralized) LLM teams. In Experiment 2, agents needed to not only complete tasks but also decide on assignments themselves. A. Scalability: Speedup is substantially lower for self-coordinating than preassigned teams due to consistency conflicts and communication overhead. This difference is especially stark for highly parallel tasks. B. Consistency conflicts: In self-coordinating teams, agents exhibit conflicts like writing to the same file simultaneously (pink), rewriting a file that another agent previously wrote (yellow), and attempting to complete a function before its dependencies have been finished (brown). These problems do not arise when tasks are preassigned by a central coordinator. C. Test failures: Failed test cases per round reveal that decentralized teams exhibit higher rates of intermediate failure due to these conflicts.
Figure 4: Coordination overhead. Decentralized teams introduce greater coordination overhead, which worsens with more collaborators. A. Communication costs: Each line represents the difference in the number of messages sent when tasks are preassigned versus decentralized. B. Idle costs: Each line represents the difference in the number of agents remaining idle when tasks are preassigned versus decentralized. Importantly, these agents were still using tokens and sending messages; they just did not complete a task within an idle round.
Figure 5: Straggler analysis. When task assignments are fixed (preassign), performance is more susceptible to agent variability in the form of stragglers: agents that take substantially longer to complete their assigned tasks. This gap arises more often with models that exhibit greater variance in API latency, such as Claude Sonnet 4.6 and GPT-4.1 (see vertical axes), and worsens on mixed or serial tasks where workloads are naturally uneven. When task assignments are decentralized, work can be dynamically reallocated when one agent stalls. The straggler gap is quantified as the difference between the maximum and mean latency within each round, or how many extra seconds the average agent waited for the slowest teammate. Error bars represent standard deviation.
...and 1 more figure
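
The captions for Figures 2 and 5 define three quantities in words: the theoretical speedup predicted by Amdahl's Law, the measured speedup relative to the one-agent baseline, and the straggler gap (maximum minus mean latency within a round). The sketch below shows one way these could be computed. It uses the standard form of Amdahl's Law, S(N) = 1 / (s + (1 - s) / N), where s is the serial fraction of the task and N is the number of agents. The function names, parameter names, and numbers are illustrative assumptions, not the authors' code or data.

```python
from statistics import mean


def amdahl_speedup(serial_fraction: float, n_agents: int) -> float:
    """Theoretical speedup under Amdahl's Law: 1 / (s + (1 - s) / N)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_agents)


def measured_speedup(baseline_seconds: float, team_seconds: float) -> float:
    """Observed speedup: one-agent wall-clock time divided by team wall-clock time."""
    if team_seconds <= 0:
        raise ValueError("team_seconds must be positive")
    return baseline_seconds / team_seconds


def straggler_gap(round_latencies: list[float]) -> float:
    """Maximum minus mean latency (seconds) across agents in a single round."""
    if not round_latencies:
        raise ValueError("round_latencies must not be empty")
    return max(round_latencies) - mean(round_latencies)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the formulas.
    for s in (0.0, 0.5, 1.0):
        print(f"serial fraction {s}: predicted speedup with 4 agents = "
              f"{amdahl_speedup(s, 4):.2f}")
    print("measured speedup:", measured_speedup(120.0, 40.0))
    print("straggler gap:", straggler_gap([2.0, 3.0, 7.0]))
```

Under this formula a fully parallel task (s = 0) scales linearly with team size, while a fully serial task (s = 1) gains nothing from additional agents, which matches the qualitative pattern the Figure 2 caption describes.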
