Deep Reinforcement Learning for Multi-Agent Coordination
Kehinde O. Aina, Sehoon Ha
TL;DR
The paper tackles the problem of coordinating multiple robots in narrow, congested settings where explicit inter-agent communication is impractical. It introduces Stigmergic Multi-Agent Deep Reinforcement Learning (S-MADRL), which combines a digital pheromone (stigmergy) layer with curriculum learning to achieve decentralized coordination that scales to eight homogeneous agents. In OpenAI Gym simulations of a collective pellet retrieval task, S-MADRL agents develop emergent strategies such as asymmetric workload distribution and selective idleness that mitigate congestion and non-stationarity, and the framework outperforms state-of-the-art MADRL baselines. The framework offers a biologically inspired, communication-free approach with potential real-world applicability in mining, search-and-rescue, and other crowded environments; its acknowledged limitations concern sensing noise and agent homogeneity. Future directions include extending to heterogeneous roles and improving transfer to physical robots.
Abstract
We address the challenge of coordinating multiple robots in narrow and confined environments, where congestion and interference often hinder collective task performance. Drawing inspiration from insect colonies, which achieve robust coordination through stigmergy (modifying and interpreting environmental traces), we propose a Stigmergic Multi-Agent Deep Reinforcement Learning (S-MADRL) framework that leverages virtual pheromones to model local and social interactions, enabling decentralized emergent coordination without explicit communication. To overcome the convergence and scalability limitations of existing algorithms such as MADQN, MADDPG, and MAPPO, we use curriculum learning, which decomposes complex tasks into progressively harder sub-problems. Simulation results show that our framework achieves the most effective coordination of up to eight agents, where robots self-organize into asymmetric workload distributions that reduce congestion and modulate group performance. This emergent behavior, analogous to strategies observed in nature, demonstrates a scalable solution for decentralized multi-agent coordination in crowded environments with communication constraints.
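To make the idea of virtual pheromones concrete, the following is a minimal sketch of a shared trace field that agents could write to and read from instead of exchanging messages. It is an illustration only, not the authors' implementation: the class name `PheromoneField`, the grid representation, the multiplicative evaporation rule, and all parameter values are assumptions introduced here.

```python
class PheromoneField:
    """Hypothetical grid of virtual pheromone (illustrative, not the paper's code)."""

    def __init__(self, width, height, evaporation=0.1):
        self.width = width
        self.height = height
        # Assumed decay rule: each step a cell loses this fraction of its value.
        self.evaporation = evaporation
        self.grid = [[0.0] * width for _ in range(height)]

    def deposit(self, x, y, amount=1.0):
        """An agent leaves a trace on the cell it occupies."""
        self.grid[y][x] += amount

    def step(self):
        """Evaporate all cells so that stale traces fade over time."""
        keep = 1.0 - self.evaporation
        for row in self.grid:
            for x in range(self.width):
                row[x] *= keep

    def sense(self, x, y, radius=1):
        """Return the local patch around (x, y), zero-padded outside the grid."""
        patch = []
        for dy in range(-radius, radius + 1):
            row = []
            for dx in range(-radius, radius + 1):
                cx, cy = x + dx, y + dy
                inside = 0 <= cx < self.width and 0 <= cy < self.height
                row.append(self.grid[cy][cx] if inside else 0.0)
            patch.append(row)
        return patch
```

In such a setup, each agent would call `deposit` at its own position, the environment would call `step` once per time step, and the patch returned by `sense` would be appended to the agent's local observation, so that coordination signals travel only through the environment.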
