Stochastic Self-Organization in Multi-Agent Systems

Nurbek Tastan, Samuel Horvath, Karthik Nandakumar

TL;DR

SelfOrg tackles the orchestration challenge in LLM-based multi-agent systems by forming a per-instance directed acyclic graph (DAG) from agent responses and using a Shapley-inspired, embedding-based contribution estimate to route information from high- to low-contributing agents. It avoids pretrained topology generators and external judges, enabling a lightweight, self-organizing collaboration that adapts to stochastic agent outputs. The paper provides theoretical bounds on the contribution approximation and demonstrates that multiple agents amplify correct signals, especially in weak-backend regimes, with strong empirical gains across a range of benchmarks and backbones. Overall, SelfOrg offers a practical, scalable approach to MAS coordination that improves robustness and performance without additional supervision or training.

Abstract

Multi-agent systems (MAS) based on Large Language Models (LLMs) have the potential to solve tasks that are beyond the reach of any single LLM. However, this potential can only be realized when the collaboration mechanism between agents is optimized. Specifically, optimizing the communication structure between agents is critical for fruitful collaboration. Most existing approaches rely on fixed topologies, pretrained graph generators, optimization over edges, or employ external LLM judges, thereby adding to the complexity. In this work, we introduce a response-conditioned framework that adapts communication on-the-fly. Agents independently generate responses to the user query and assess peer contributions using an approximation of the Shapley value. A directed acyclic graph (DAG) is then constructed to regulate the propagation of the responses among agents, which ensures stable and efficient message transmission from high-contributing agents to others. This graph is dynamically updated based on the agent responses from the previous collaboration round. Since the proposed framework enables the self-organization of agents without additional supervision or training, we refer to it as SelfOrg. The SelfOrg framework goes beyond task- and query-level optimization and takes into account the stochastic nature of agent responses. Experiments with both strong and weak LLM backends demonstrate robust performance, with significant gains in the weak regime where prior methods collapse. We also theoretically show that multiple agents increase the chance of correctness and that the correct responses naturally dominate the information flow.
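The pipeline described in the abstract (embed responses, estimate contributions, route messages along a DAG from high- to low-contributing agents) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each response is already embedded as a vector, that contribution is approximated by cosine similarity to the mean embedding, and that an edge is added from a higher-ranked to a lower-ranked agent when their pairwise similarity reaches a threshold. The names `cosine`, `contributions`, `build_dag` and the parameter `tau` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def contributions(embeddings):
    """Assumed contribution estimate: similarity of each response to the mean."""
    n = len(embeddings)
    dim = len(embeddings[0])
    avg = [sum(e[k] for e in embeddings) / n for k in range(dim)]
    return [cosine(e, avg) for e in embeddings]


def build_dag(embeddings, tau=0.5):
    """Rank agents by contribution and add edges only from higher to lower rank.

    Because every edge points from an earlier to a later position in one
    fixed ranking, the resulting graph cannot contain a cycle.
    `tau` is a hypothetical similarity threshold for keeping an edge.
    """
    scores = contributions(embeddings)
    # Stable sort: ties are broken by agent index.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    edges = []
    for pos, src in enumerate(order):
        for dst in order[pos + 1:]:
            if cosine(embeddings[src], embeddings[dst]) >= tau:
                edges.append((src, dst))
    return scores, order, edges


if __name__ == "__main__":
    # Three toy response embeddings: agents 0 and 1 agree, agent 2 is an outlier.
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    scores, order, edges = build_dag(toy)
    print("contribution scores:", [round(s, 3) for s in scores])
    print("ranking (best first):", order)
    print("edges (src -> dst):", edges)
```

In this toy input the two agreeing agents rank above the outlier, and the only edge connects them, so the outlier neither sends nor receives a message in that round. In an iterated setting the graph would be rebuilt from the new responses after each round.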

Paper Structure

This paper contains 38 sections, 5 theorems, 23 equations, 13 figures, 5 tables, 2 algorithms.

Key Result

Theorem 1

Suppose $\|\mathbf{r}_n\| = \Gamma$ for all $n \in [N]$ and $|\langle \mathbf{r}_n, \mathbf{r}_{\textrm{avg}} \rangle| \geq 1/I$ for some $I>0$. Then the approximation bound holds, where $L_n$ is a multiplicative factor that can be normalized away [xu2021gradient].

Figures (13)

  • Figure 1: Overview of SelfOrg. A query $\mathcal{Q}$ is distributed to $N$ agents, each producing a response $\mathcal{R}_n$. Responses are embedded, contributions estimated via Shapley-based valuation, and a directed acyclic communication graph is formed where edges reflect contributions and high-contribution agents lead. The figure depicts a single round; the process is iterated for $T$ rounds.
  • Figure 2: Analysis of Qwen-1.5B agent over 100 runs on the same math problem (GSM-Hard).
  • Figure 3: Scaling laws of Qwen-2.5-X-Instruct models across two reasoning benchmarks (AQUA-RAT and MMLU-Pro). The table shows exact accuracy values for different model sizes under the Single and SelfOrg settings, while the plot visualizes performance trends.
  • Figure 4: Heterogeneous Agents. Left: accuracies on AQUA-RAT and MMLU-Pro for each backbone and for the mixed-pool baseline (Single) vs. SelfOrg. Right: percentage of times each agent attains contribution rank $r$ (rank $1$ = highest).
  • Figure 5: Heatmaps of ranking outcomes with a weak agent in the pool. Each heatmap depicts the percentage $(\%)$ of times agents were assigned to contribution ranks (rank $1$ = highest contribution, rank $4$ = weakest). The y-axis denotes the model type (Qwen-2.5-{7,1.5}B-Instruct) assigned to each agent.
  • ...and 8 more figures

Theorems & Definitions (9)

  • Theorem 1: Approximation Bound [xu2021gradient]
  • Corollary 1: Ranking Stability
  • Lemma 1: Agreement Concentration
  • Lemma 2: Contribution Dominance
  • Corollary 2: Correctness Amplification
  • proof
  • proof
  • proof
  • proof
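
The abstract's claim that multiple agents increase the chance of correctness can be illustrated with a simple calculation. The sketch below is not the paper's Corollary 2. It assumes, purely for illustration, that each of $N$ agents is independently correct with the same probability $p$, and computes the probability that at least one agent is correct. The function name `prob_at_least_one_correct` is hypothetical.

```python
def prob_at_least_one_correct(p, n):
    """Probability that at least one of n agents is correct.

    Assumption (not from the paper): agents are independent and each is
    correct with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # A weak agent (p = 0.3) queried with increasingly large pools.
    for n in (1, 2, 4, 8):
        print(n, round(prob_at_least_one_correct(0.3, n), 4))
```

Under this independence assumption the probability grows monotonically with the number of agents, which is why pooling helps most when individual agents are weak. Whether a correct response, once present, also dominates the information flow is a separate question that this calculation does not address.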