Emergent Coordination and Phase Structure in Independent Multi-Agent Reinforcement Learning

Azusa Yamaguchi

TL;DR

This work investigates emergent coordination in fully decentralized multi-agent reinforcement learning by revisiting Independent Q-Learning (IQL) in grid-based environments. It introduces a phase diagram built from the Cooperative Success Rate (CSR) and a stability index based on TD-error variance, which identifies three regimes (coordinated/stable, fragile, and jammed/disordered) separated by a double Instability Ridge driven by kernel drift from other agents' updates. A key finding is that small symmetry-breaking differences, captured by agent identifiers, are necessary to sustain the phase structure: removing IDs collapses drift and coordination dynamics, pointing to a distributional interaction mechanism. The results suggest that coordination in decentralized MARL behaves as a phase-transition-like phenomenon controlled by scale, density, and drift, with implications for stability analyses and cross-disciplinary applications in economics and complex systems.

Abstract

A clearer understanding of when coordination emerges, fluctuates, or collapses in decentralized multi-agent reinforcement learning (MARL) is increasingly sought to characterize the dynamics of multi-agent learning systems. We revisit fully independent Q-learning (IQL) as a minimal decentralized testbed and run large-scale experiments across environment size L and agent density rho. We construct a phase map using two axes, the cooperative success rate (CSR) and a stability index derived from TD-error variance, revealing three distinct regimes: a coordinated and stable phase, a fragile transition region, and a jammed or disordered phase. A sharp double Instability Ridge separates these regimes and corresponds to persistent kernel drift, the time-varying shift of each agent's effective transition kernel induced by others' policy updates. Synchronization analysis further shows that temporal alignment is required for sustained cooperation, and that competition between drift and synchronization generates the fragile regime. Removing agent identifiers eliminates drift entirely and collapses the three-phase structure, demonstrating that small inter-agent asymmetries are a necessary driver of drift. Overall, the results show that decentralized MARL exhibits a coherent phase structure governed by the interaction between scale, density, and kernel drift, suggesting that emergent coordination behaves as a distribution-interaction-driven phase phenomenon.
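As a rough illustration of the quantities named in the abstract, the sketch below shows a standard tabular Q-learning step that an independent learner could apply to its own table, together with two summary statistics. The update rule is the textbook one. The function names, the hyperparameters `alpha` and `gamma`, and the exact form of both statistics are assumptions made for illustration only: the abstract does not define CSR or the stability index precisely, so CSR is taken here to be the fraction of successful episodes and the raw TD-error variance stands in for whatever index the paper derives from it.

```python
import numpy as np


def iql_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step to a single agent's table.

    In independent Q-learning each agent calls this on its own table and
    treats the other agents as part of the environment.
    Returns the TD error so the caller can log it.
    """
    td_error = reward + gamma * np.max(q[next_state]) - q[state, action]
    q[state, action] += alpha * td_error
    return td_error


def td_error_variance(td_errors):
    """Population variance of logged TD errors.

    Assumption: used as a stand-in for the paper's stability index,
    whose exact definition is not given in the abstract.
    """
    return float(np.var(td_errors))


def cooperative_success_rate(successes):
    """Fraction of episodes flagged as cooperative successes.

    Assumption: one boolean per episode; the paper's success
    criterion is not specified in the abstract.
    """
    return float(np.mean(successes))
```

In a full experiment one would keep a separate table per agent, log the TD errors returned by `iql_update`, and evaluate both statistics for each pair of environment size L and density rho to place that setting on the phase map.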
