Emergent Coordination in Multi-Agent Language Models

Christoph Riedl

TL;DR

This work asks when a population of LLM agents forms a genuine higher-order collective rather than a loose collection of individuals. It introduces a data-driven, information-theoretic framework based on time-delayed mutual information and partial information decomposition to detect emergent synergy, localize its origins, and assess its functional relevance. In a group guessing task with three prompting interventions (Plain, Persona, and Theory of Mind, or ToM), the study finds evidence of emergent dynamics, identity-linked differentiation, and goal-directed complementarity, with ToM prompts yielding the strongest integration and performance benefits. The findings offer principled design guidance for steering multi-agent collectives and show that coordinated, higher-order structure, not merely aggregate performance, drives improvements in multi-agent LLM systems.
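To make "partial information decomposition of time-delayed mutual information" concrete, here is a minimal Python sketch of a simplified two-source PID: two agents' states at time $t$ are the sources and a target series at $t+\tau$ is the target. It assumes discrete (for example, binary) states, a plug-in entropy estimator, and the minimum-mutual-information (MMI) redundancy function. The function names, the lag `tau`, and the XOR toy data are hypothetical; the paper's own estimators, bias corrections, and its triplet synergy measure $G_3$ are not reproduced here.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in (maximum-likelihood) estimate of I(X;Y) in bits for paired discrete samples."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def delayed_pid(src1, src2, target, tau=1):
    """Two-source PID of I(src1_t, src2_t ; target_{t+tau}) with MMI redundancy (bits).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if tau < 1:
        raise ValueError("tau must be a positive lag")
    # Align sources at time t with the target at time t + tau.
    a, b, y = src1[:-tau], src2[:-tau], target[tau:]
    i_a = mutual_information(a, y)
    i_b = mutual_information(b, y)
    i_ab = mutual_information(list(zip(a, b)), y)  # joint source as (a_t, b_t) tuples
    red = min(i_a, i_b)  # MMI: redundancy is the smaller single-source information
    return {
        "redundancy": red,
        "unique_1": i_a - red,
        "unique_2": i_b - red,
        "synergy": i_ab - i_a - i_b + red,
    }


if __name__ == "__main__":
    # Toy binary "moves" of two hypothetical agents (not data from the paper).
    a = [0, 0, 1, 1] * 25 + [0]
    b = [0, 1, 0, 1] * 25 + [0]
    # The next-step outcome is the XOR of the two agents' current moves.
    outcome = [0] + [x ^ y for x, y in zip(a, b)][:-1]
    print(delayed_pid(a, b, outcome))
```

In the toy series neither agent's move alone carries any information about the next outcome, yet together they determine it exactly, so the full 1 bit of time-delayed information is classified as synergy.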

Abstract

When are multi-agent LLM systems merely a collection of individual agents, and when are they an integrated collective with higher-order structure? We introduce an information-theoretic framework to test, in a purely data-driven way, whether multi-agent systems show signs of higher-order structure. The framework's information decomposition lets us measure whether dynamical emergence is present in multi-agent LLM systems, localize it, and distinguish spurious temporal coupling from performance-relevant cross-agent synergy. We implement both a practical criterion and an emergence capacity criterion operationalized as partial information decomposition of time-delayed mutual information (TDMI). We apply our framework to experiments with a simple guessing game in which agents do not communicate directly and receive only minimal group-level feedback, under three randomized interventions. Groups in the control condition exhibit strong temporal synergy but little coordinated alignment across agents. Assigning a persona to each agent introduces stable identity-linked differentiation. Combining personas with an instruction to "think about what other agents might do" produces both identity-linked differentiation and goal-directed complementarity across agents. Taken together, our framework establishes that prompt design can steer multi-agent LLM systems from mere aggregates to higher-order collectives. Our results are robust across emergence measures and entropy estimators, and are not explained by coordination-free baselines or temporal dynamics alone. Without attributing human-like cognition to the agents, we note that the patterns of interaction we observe mirror well-established principles of collective intelligence in human groups: effective performance requires both alignment on shared objectives and complementary contributions across members.
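One widely used form of a practical emergence criterion is $\Psi = I(V_t; V_{t+\tau}) - \sum_j I(X^j_t; V_{t+\tau})$, where the $X^j$ are micro-level (agent) series and $V$ is a macro-level signal computed from them; $\Psi > 0$ is a sufficient, though not necessary, sign of emergence. The Python sketch below assumes this is the practical criterion meant here and assumes discrete series; all names are hypothetical. Its `alpha=0.5` option adds a Jeffreys-style pseudo-count to each cell of the observed joint table as one simple alternative entropy estimator; the paper's exact bias correction may differ.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys, alpha=0.0):
    """I(X;Y) in bits for paired discrete samples.

    alpha=0.0 is the plug-in estimate; alpha=0.5 adds a Jeffreys-style pseudo-count
    to every cell of the observed X-by-Y contingency table (an assumed estimator).
    """
    joint = Counter(zip(xs, ys))
    xvals, yvals = set(xs), set(ys)
    total = len(xs) + alpha * len(xvals) * len(yvals)
    pxy = {(x, y): (joint[(x, y)] + alpha) / total for x in xvals for y in yvals}
    # Marginals come from the (possibly smoothed) joint so the estimate stays consistent.
    px = {x: sum(pxy[(x, y)] for y in yvals) for x in xvals}
    py = {y: sum(pxy[(x, y)] for x in xvals) for y in yvals}
    return sum(p * log2(p / (px[x] * py[y])) for (x, y), p in pxy.items() if p > 0)


def tdmi(source, target, tau=1, alpha=0.0):
    """Time-delayed MI I(source_t ; target_{t+tau})."""
    if tau < 1:
        raise ValueError("tau must be a positive lag")
    return mutual_information(source[:-tau], target[tau:], alpha)


def practical_emergence(micro, macro, tau=1, alpha=0.0):
    """Psi = I(V_t ; V_{t+tau}) - sum_j I(X^j_t ; V_{t+tau}); Psi > 0 suggests emergence."""
    psi = tdmi(macro, macro, tau, alpha)
    for series in micro:
        psi -= tdmi(series, macro, tau, alpha)
    return psi


if __name__ == "__main__":
    T = 101
    v = [t % 2 for t in range(T)]          # hypothetical macro signal: alternates every step
    x1 = [(t // 2) % 2 for t in range(T)]  # agent 1: uninformative about the next v
    x2 = [p ^ q for p, q in zip(x1, v)]    # agent 2 chosen so that x1 XOR x2 == v
    print(practical_emergence([x1, x2], v))             # plug-in estimate
    print(practical_emergence([x1, x2], v, alpha=0.5))  # with pseudo-count
```

Here the macro signal is the parity of the two agents' states. It alternates deterministically, so its own past predicts it perfectly, while each agent's state on its own says nothing about the next parity value; the plug-in estimate is therefore $\Psi = 1$ bit, and the pseudo-count pulls it somewhat below 1.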

Paper Structure

This paper contains 35 sections, 4 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: a) Information decomposition provides a framework to explain tension in multi-agent systems: agents can be undifferentiated or differentiated and can provide independent or complementary information that is either well aligned or incidental (luppi2024information). b) Experimental setup of the group binary search task. c) Preliminary experiments testing different group sizes and temperature settings. Surface values were smoothed with a local $3 \times 3$ weighted averaging filter that gives higher weight to each cell's original value, reducing noise while preserving local structure.
  • Figure 2: a) Group success across the three interventions. b) Practical emergence criterion (bias-corrected). c) Emergence capacity: dynamical synergy (bias-corrected). Data are Winsorized at the 1st and 99th percentiles for visual clarity. Stars indicate the significance level of a Wilcoxon test. Notes: *** $p < 0.001$; ** $p < 0.01$; * $p < 0.05$.
  • Figure 3: a) Agent differentiation from hierarchical mixed-model comparison, counting groups in which at least one test (different intercepts or different slopes) yields $p < 0.05$. b) Total time-delayed mutual information of triplets ($I_3$). c) Dynamic emergent synergy ($G_3$). d) The ToM-prompt condition has substantially more groups with significant $I_3$ content (above 0). In panels a) and d), error bars show Wilson confidence intervals for binary data, and stars indicate the significance level of a test for equal proportions. Panels b) and c) show raw data with Jeffreys' correction. Data are Winsorized at the 1st and 99th percentiles for visual clarity. Stars indicate the significance level of a Wilcoxon test. Notes: *** $p < 0.001$; ** $p < 0.01$; * $p < 0.05$.
  • Figure A1: Conditional on success, we plot the round in which it was achieved.
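Figure 1's caption describes a $3 \times 3$ weighted averaging filter but does not give its weights. A minimal pure-Python sketch, assuming a hypothetical `center_weight` (here 4) for the cell itself, weight 1 for each of its up to eight neighbors, and renormalization of the weights at the grid border:

```python
def smooth_3x3(grid, center_weight=4.0):
    """Center-weighted 3x3 moving average over a 2-D list of numbers.

    The weights (center_weight for the cell, 1.0 per neighbor) are assumptions;
    at the border only in-grid cells contribute and weights are renormalized.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = weight_sum = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    r, c = i + di, j + dj
                    if 0 <= r < rows and 0 <= c < cols:
                        w = center_weight if (di == 0 and dj == 0) else 1.0
                        total += w * grid[r][c]
                        weight_sum += w
            out[i][j] = total / weight_sum
    return out


if __name__ == "__main__":
    # Toy surface with a single spike (not data from the paper).
    spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
    for row in smooth_3x3(spike):
        print([round(v, 3) for v in row])
```

Renormalizing at the border keeps edge cells on the same scale as interior cells instead of shrinking them toward zero, and a larger `center_weight` preserves more of each cell's original value.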
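Figures 2 and 3 Winsorize values at the 1st and 99th percentiles for plotting. A minimal sketch of that step, assuming linear interpolation between order statistics when computing percentiles (NumPy's default method); the function names are hypothetical:

```python
def percentile(values, q):
    """q-th percentile (0 <= q <= 100) by linear interpolation between order statistics."""
    if not values:
        raise ValueError("values must be non-empty")
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)                  # floor, since pos >= 0
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def winsorize(values, lower=1.0, upper=99.0):
    """Clip values to the [lower, upper] percentile range (used here for plotting only)."""
    lo_val = percentile(values, lower)
    hi_val = percentile(values, upper)
    return [min(max(v, lo_val), hi_val) for v in values]


if __name__ == "__main__":
    data = list(range(100)) + [10_000]  # toy values with one extreme outlier
    print(winsorize(data)[-3:])
```

Values beyond the percentile bounds are clipped to the bounds rather than dropped, so the number of plotted observations is unchanged.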
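Figure 3 reports Wilson confidence intervals and a test for equal proportions for binary, per-group outcomes (whether a group shows differentiation or significant $I_3$). A minimal sketch, assuming 95% intervals ($z = 1.96$) and a pooled two-sided z-test without continuity correction; the caption does not state which variant is used, and R's `prop.test`, for example, applies a continuity correction by default. The function names are hypothetical, and the counts in the example are toy values, not results from the paper:

```python
from math import erfc, sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp only to absorb floating-point rounding; the exact interval lies in [0, 1].
    return max(0.0, center - half), min(1.0, center + half)


def two_proportion_z_test(k1, n1, k2, n2):
    """Pooled two-sided z-test of H0: p1 == p2 (no continuity correction)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # all successes or all failures: no evidence of a difference
    z = (p1 - p2) / se
    return z, erfc(abs(z) / sqrt(2))  # two-sided p-value under the normal approximation


if __name__ == "__main__":
    # Toy counts for two hypothetical conditions (not data from the paper).
    print(wilson_interval(18, 20), wilson_interval(5, 20))
    print(two_proportion_z_test(18, 20, 5, 20))
```

Unlike the normal-approximation (Wald) interval, the Wilson interval stays inside [0, 1] and does not collapse to zero width when a condition has no successes or only successes.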