Drop the Hierarchy and Roles: How Self-Organizing LLM Agents Outperform Designed Structures

Victoria Dochkina

Abstract

How much autonomy can multi-agent LLM systems sustain -- and what enables it? We present a 25,000-task computational experiment spanning 8 models, 4--256 agents, and 8 coordination protocols ranging from externally imposed hierarchy to emergent self-organization. We observe that autonomous behavior already emerges in current LLM agents: given minimal structural scaffolding (fixed ordering), agents spontaneously invent specialized roles, voluntarily abstain from tasks outside their competence, and form shallow hierarchies -- without any pre-assigned roles or external design. A hybrid protocol (Sequential) that enables this autonomy outperforms centralized coordination by 14% (p<0.001), with a 44% quality spread between protocols (Cohen's d=1.86, p<0.0001). The degree of emergent autonomy scales with model capability: strong models self-organize effectively, while models below a capability threshold still benefit from rigid structure -- suggesting that as foundation models improve, the scope for autonomous coordination will expand. The system scales sub-linearly to 256 agents without quality degradation (p=0.61), producing 5,006 unique roles from just 8 agents. Results replicate across closed- and open-source models, with open-source achieving 95% of closed-source quality at 24x lower cost. The practical implication: give agents a mission, a protocol, and a capable model -- not a pre-assigned role.
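The abstract describes the Sequential protocol only at a high level. Fixed ordering is the sole scaffolding, and each agent either invents a role for the task or abstains. Below is a minimal Python sketch of that control flow. It assumes that each agent sees all earlier contributions and that the LLM call is replaced by a plain callable. The names `Contribution`, `run_sequential`, `decide`, and `toy_decide` are hypothetical illustrations, not the paper's code.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Contribution:
    agent_id: int
    role: Optional[str]  # self-chosen role, or None when the agent abstains
    output: str = ""

# Hypothetical agent policy: (agent_id, task, history) -> (role, output), or None to abstain.
# In a real system this would wrap an LLM call; here it is an ordinary function.
Decide = Callable[[int, str, list[Contribution]], Optional[tuple[str, str]]]

def run_sequential(task: str, n_agents: int, decide: Decide) -> list[Contribution]:
    """Visit agents in a fixed order; each one sees every earlier contribution (assumption)."""
    history: list[Contribution] = []
    for agent_id in range(n_agents):  # fixed ordering is the only structure imposed
        choice = decide(agent_id, task, history)
        if choice is None:  # voluntary abstention, not directed by a coordinator
            history.append(Contribution(agent_id, None))
        else:
            role, output = choice  # the role is invented per task, never pre-assigned
            history.append(Contribution(agent_id, role, output))
    return history

def toy_decide(agent_id: int, task: str, history: list[Contribution]) -> Optional[tuple[str, str]]:
    """Toy stand-in for an LLM: take the first role nobody has claimed yet, else abstain."""
    taken = {c.role for c in history if c.role}
    for role in ("planner", "analyst", "writer"):
        if role not in taken:
            return role, f"{role} notes on {task!r}"
    return None  # nothing useful left to add

result = run_sequential("draft a market report", 5, toy_decide)
```

With five agents and only three useful roles in the toy policy, the last two agents abstain on their own. This mirrors, in miniature, the endogenous abstention the abstract reports, without any coordinator deciding who participates.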

Paper Structure

This paper contains 33 sections, 4 equations, 6 figures, and 9 tables.

Figures (6)

  • Figure 1: Quality comparison across coordination protocols. Solid bars: pilot ($N=8$, GPT-4.1-mini, L3+L4 average). Hatched bars: final ($N=16$, Claude Sonnet 4.6, L3). The hybrid Sequential protocol achieves the highest quality in both settings.
  • Figure 2: Scaling behavior in Series 2 (fixed roles, GPT-4.1-mini, $N=8 \to 64$). Quality remains stable ($Q \in [0.949, 0.955]$) while cost grows only $11.8\%$ (coefficient of variation CV $= 4.4\%$) despite an $8\times$ increase in agents.
  • Figure 3: Quality degradation across task complexity levels L1--L4. Hierarchy depth increases with task complexity, indicating emergent structural adaptation.
  • Figure 4: Role assignment heatmap (Sequential protocol, $N=16$, Claude Sonnet 4.6, 10 L3 tasks). Each cell color represents a unique role chosen by the agent for that task; $\times$ marks voluntary abstention. The mosaic pattern (115 unique roles in 10 tasks) demonstrates RSI $\to 0$: agents reinvent their specialization for each task rather than settling into fixed positions.
  • Figure 5: Self-abstention as an emergent property. Left: agent participation rate vs. quality across tasks (Sequential = circles, Coordinator = squares). Right: mechanism of non-participation, 38% voluntary (Sequential, endogenous) vs. 100% coordinator-directed (Coordinator, exogenous). Voluntary abstention correlates with higher quality ($Q=0.875$ vs. $0.767$).
  • ...and 1 more figure
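The abstract and captions report several summary statistics: unique role counts (115 roles in 10 tasks), participation rate, a coefficient of variation, and Cohen's d. A minimal sketch of these statistics follows, using textbook definitions. The paper's exact formulas are not given here, so the following are all assumptions: sample vs. population standard deviation, pooled-variance Cohen's d, and counting abstentions as non-participating slots. The inputs in the usage lines are toy values, not data from the study.

```python
import statistics
from itertools import chain
from typing import Optional

Assignments = list[list[Optional[str]]]  # one list per task; None marks an abstaining agent

def unique_roles(assignments: Assignments) -> int:
    """Number of distinct self-chosen role names across all tasks."""
    return len({r for r in chain.from_iterable(assignments) if r is not None})

def participation_rate(assignments: Assignments) -> float:
    """Fraction of (task, agent) slots in which the agent contributed rather than abstained."""
    slots = list(chain.from_iterable(assignments))
    return sum(r is not None for r in slots) / len(slots)

def coefficient_of_variation(xs: list[float]) -> float:
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(xs) / statistics.mean(xs)

def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardized mean difference using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

# Toy usage: two tasks, three agents each (hypothetical role names).
toy = [["critic", "planner", None], ["editor", None, "planner"]]
n_roles = unique_roles(toy)
rate = participation_rate(toy)
cv = coefficient_of_variation([0.9, 1.0, 1.1])
d = cohens_d([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
```

Because roles are counted as distinct names, a "mosaic" like the one in Figure 4 yields a high unique-role count relative to the number of tasks. Agents that keep the same role across tasks would keep this count low.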