Emergent Coordination and Phase Structure in Independent Multi-Agent Reinforcement Learning

Azusa Yamaguchi

TL;DR

This work investigates emergent coordination in fully decentralized multi-agent reinforcement learning by revisiting Independent Q-Learning (IQL) in grid-based environments. It introduces a phase diagram based on Cooperative Success Rate and a TD-error–variance stability index to identify three regimes—coordinated/stable, fragile, and jammed/disordered—separated by a double Instability Ridge driven by kernel drift from other agents' updates. A key finding is that small symmetry-breaking differences, captured by agent identifiers, are necessary to sustain phase structure; removing IDs collapses drift and coordination dynamics, highlighting a distributional interaction mechanism. The results suggest coordination in decentralized MARL behaves as a phase-transition–like phenomenon controlled by scale, density, and drift, with implications for stability analyses and cross-disciplinary applications in economics and complex systems.

Abstract

A clearer understanding of when coordination emerges, fluctuates, or collapses in decentralized multi-agent reinforcement learning (MARL) is increasingly sought in order to characterize the dynamics of multi-agent learning systems. We revisit fully independent Q-learning (IQL) as a minimal decentralized testbed and run large-scale experiments across environment size $L$ and agent density $\rho$. We construct a phase map using two axes, the cooperative success rate (CSR) and a stability index derived from TD-error variance, revealing three distinct regimes: a coordinated and stable phase, a fragile transition region, and a jammed or disordered phase. A sharp double Instability Ridge separates these regimes and corresponds to persistent kernel drift, the time-varying shift of each agent's effective transition kernel induced by others' policy updates. Synchronization analysis further shows that temporal alignment is required for sustained cooperation, and that competition between drift and synchronization generates the fragile regime. Removing agent identifiers eliminates drift entirely and collapses the three-phase structure, demonstrating that small inter-agent asymmetries are a necessary driver of drift. Overall, the results show that decentralized MARL exhibits a coherent phase structure governed by the interaction between scale, density, and kernel drift, suggesting that emergent coordination behaves as a distribution-interaction-driven phase phenomenon.
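
As a rough illustration of the IQL setting described in the abstract, the sketch below shows one tabular Q-learner that treats all other agents as part of its environment and returns the TD error of each update. It is a minimal sketch, not the paper's implementation: the class name `IndependentQLearner`, the hyperparameter defaults, and the epsilon-greedy exploration rule are assumptions, and the observation encoding (including whether it carries an agent identifier) is left to the caller.

```python
import random
from collections import defaultdict


class IndependentQLearner:
    """One tabular Q-learner (hypothetical name, illustrative defaults).

    Each agent owns its own Q-table and never reads another agent's
    table or actions; other agents only affect it through the
    observations and rewards it receives.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha        # learning rate (assumed value)
        self.gamma = gamma        # discount factor (assumed value)
        self.epsilon = epsilon    # exploration rate (assumed value)
        self.rng = random.Random(seed)
        # Unseen observations start with all-zero action values.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def act(self, obs):
        """Epsilon-greedy action selection over this agent's own Q-values."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[obs]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, obs, action, reward, next_obs, done):
        """Apply one Q-learning update and return the TD error."""
        if done:
            target = reward
        else:
            target = reward + self.gamma * max(self.q[next_obs])
        td_error = target - self.q[obs][action]
        self.q[obs][action] += self.alpha * td_error
        return td_error
```

In a decentralized run, one such learner would be created per agent and updated only from that agent's own transitions. Collecting the returned TD errors per episode gives the raw signal from which a variance-based stability measure could be computed.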

Paper Structure

This paper contains 22 sections, 12 equations, 5 figures.

Figures (5)

  • Figure 1: Cooperative success rate (CSR, left) and stability index $S$ (right) for all $(L,\rho)$ conditions, computed from the last 25% of episodes. Coordination and stability occur only at small scales and low densities, while both metrics collapse sharply as scale or density increases. The low-$S$ region marks strong non-stationarity and forms an Instability Ridge.
  • Figure 2: Phase geometry based on the normalized distance $d_{\mathrm{phase}}$. Two contour lines near $d_{\mathrm{phase}}\!\approx\!0.4$ form a double Instability Ridge. The low-$L$, low-$\rho$ region corresponds to coordinated and stable behavior, the region between the ridges to a fragile transitional regime, and larger scales to jammed/disordered outcomes.
  • Figure 3: Temporal profiles of arrival-time spread (synchronization) and co-reach rate for two representative conditions near the Instability Ridge. Pre-ridge ($L{=}16,\rho{=}0.0625$) shows rapid convergence toward coordination, whereas on-ridge ($L{=}24,\rho{=}0.03125$) exhibits alternating collapse--recovery cycles.
  • Figure 4: TD-error variance (top) and gradient-norm variance (bottom) for $L=24$ across densities. The dominant source of instability differs by density: persistent variance growth near the Ridge, oscillatory behavior on the Ridge, early saturation outside the Ridge, and late-stage divergence at high densities. These four densities were selected because they represent pre-ridge, on-ridge, post-ridge, and high-density regimes, respectively.
  • Figure 5: (Left) Relationship between arrival-time spread and co-reach across all conditions. High co-reach occurs only when spread is small; large fluctuations appear near the Instability Ridge. (Right) Effective coordinated throughput $\rho_{\mathrm{eff}}=\rho_{\mathrm{agents}}\cdot\mathrm{CSR}$, showing scale-dependent critical densities where throughput peaks and then declines.
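
The quantities named in the captions can be sketched in a few lines. The example below computes a success rate and a TD-error variance over the last 25% of episodes (the window stated for Figure 1) and the effective throughput $\rho_{\mathrm{eff}}$ from Figure 5. It is a sketch under stated assumptions: the function names are hypothetical, CSR is assumed to be the fraction of episodes flagged as cooperative successes, and the mapping from TD-error variance to the stability index $S$ is not given in this summary, so the code stops at the variance.

```python
from statistics import pvariance


def tail(values, fraction=0.25):
    """Return the final `fraction` of a sequence (at least one element)."""
    if not values:
        raise ValueError("need at least one episode")
    n = max(1, int(len(values) * fraction))
    return values[-n:]


def cooperative_success_rate(successes, fraction=0.25):
    """Fraction of successful episodes in the tail window.

    Assumes `successes` holds one boolean per episode.
    """
    window = tail(successes, fraction)
    return sum(1 for s in window if s) / len(window)


def td_error_variance(td_errors, fraction=0.25):
    """Population variance of TD errors in the tail window.

    How this variance is turned into the stability index S is not
    specified here, so no S is computed.
    """
    return pvariance(tail(td_errors, fraction))


def effective_throughput(rho_agents, csr):
    """rho_eff = rho_agents * CSR, as in the Figure 5 caption."""
    return rho_agents * csr
```

For example, with eight episodes whose last two outcomes are one success and one failure, `cooperative_success_rate` returns 0.5, and at density 0.0625 the effective throughput is 0.03125.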