
Deep Reinforcement Learning for Multi-Agent Coordination

Kehinde O. Aina, Sehoon Ha

TL;DR

The paper tackles the problem of coordinating multiple robots in narrow, congested settings where explicit inter-agent communication is impractical. It introduces Stigmergic Multi-Agent Deep Reinforcement Learning (S-MADRL), which combines a digital pheromone (stigmergy) layer with curriculum learning to achieve decentralized coordination that scales to eight homogeneous agents. In OpenAI Gym simulations of a collective pellet retrieval task, S-MADRL exhibits emergent strategies such as asymmetric workload distribution and selective idleness that mitigate congestion and non-stationarity, and it outperforms state-of-the-art MADRL baselines. The framework offers a biologically inspired, communication-free approach with potential applications in mining, search-and-rescue, and other crowded environments, while acknowledging limitations related to sensing noise and agent homogeneity. Future directions include extending the framework to heterogeneous roles and improving transfer to physical robots.

Abstract

We address the challenge of coordinating multiple robots in narrow, confined environments, where congestion and interference often hinder collective task performance. Drawing inspiration from insect colonies, which achieve robust coordination through stigmergy (modifying and interpreting environmental traces), we propose a Stigmergic Multi-Agent Deep Reinforcement Learning (S-MADRL) framework that uses virtual pheromones to model local and social interactions, enabling decentralized, emergent coordination without explicit communication. To overcome the convergence and scalability limitations of existing algorithms such as MADQN, MADDPG, and MAPPO, we apply curriculum learning, which decomposes complex tasks into progressively harder sub-problems. Simulation results show that our framework coordinates teams of up to eight agents more effectively than the compared methods, with robots self-organizing into asymmetric workload distributions that reduce congestion and modulate group performance. This emergent behavior, analogous to strategies observed in nature, demonstrates a scalable solution for decentralized multi-agent coordination in crowded environments with communication constraints.
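The curriculum learning step mentioned in the abstract (decomposing a hard task into progressively harder sub-problems) can be illustrated with a minimal sketch. It assumes each stage is identified by a team size and that training is promoted to the next stage once the mean return over a short window of recent episodes reaches a threshold. The class name `Curriculum`, the stage list, the window, and the threshold are all hypothetical; the paper's actual sub-problems and promotion criteria may differ.

```python
from collections import deque


class Curriculum:
    """Illustrative stage-based curriculum (assumed design, not the paper's).

    Stages are ordered from easiest to hardest; here each stage is a team size,
    which is an assumption. Training moves to the next stage once the mean
    return over the last `window` episodes reaches `threshold`.
    """

    def __init__(self, stages, threshold, window=5):
        self.stages = list(stages)
        self.threshold = threshold
        self.index = 0
        self.recent = deque(maxlen=window)  # rolling buffer of episode returns

    @property
    def current(self):
        """The sub-problem currently being trained on."""
        return self.stages[self.index]

    @property
    def finished(self):
        """True once the hardest stage has been reached."""
        return self.index == len(self.stages) - 1

    def record(self, episode_return):
        """Log one episode's return; return True if this promoted the stage."""
        self.recent.append(episode_return)
        if self.finished or len(self.recent) < self.recent.maxlen:
            return False
        if sum(self.recent) / len(self.recent) >= self.threshold:
            self.index += 1
            self.recent.clear()  # easy-stage returns must not count toward the harder stage
            return True
        return False
```

For example, `Curriculum(stages=[1, 2, 3], threshold=10.0, window=3)` stays on the one-agent stage until the last three recorded episodes average at least 10, then moves to two agents. Clearing the buffer on promotion means the harder stage must earn its own full window of returns before the next promotion.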

Paper Structure

This paper contains 13 sections, 5 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Experimental setup of the multi-robot excavation task. (A) Real-world top view of the excavation arena, consisting of a pellet source, a narrow tunnel, excavating robots, and a home area. (B) Corresponding abstracted simulation model used for training and evaluation. The simplified representation preserves the essential components of the real-world setup and enables scalable and varied experimental scenarios.
  • Figure 2: Proposed scalable decentralized MADRL framework with stigmergic communication. Each agent $i$ receives a local observation $O_i$ and selects an action $a_i$ based on its policy $\pi_i$. The environment provides the resulting state $S_i$ and reward $r_i$, while agents leave and sense pheromone traces $\rho_i$ that diffuse and decay over time. This indirect communication channel encodes recent occupancy and agent activity, enabling decentralized coordination without explicit message passing. Learning and execution are fully independent for each agent, ensuring scalability to large team sizes.
  • Figure 3: Schematic of the digital pheromone map and agents’ restricted field of view. Agents deposit virtual pheromones while moving, generating spatial gradients (shown in different shades of gray) that diffuse and decay over time. Each agent perceives only a limited local region (red arrows), mimicking partial observability. The blue agent is laden with a pellet and returning to the home area, while the green agents are searching for pellets. Stigmergic communication provides an environmental memory that supports implicit coordination among agents.
  • Figure 4: Cumulative excavation results comparing four multi-agent deep reinforcement learning (MADRL) methods (IQL baseline, IQL+G, IQL+GS, and IQL+GSC) for teams of up to five agents. While the baseline (IQL) performs adequately for one and two agents, performance declines as team size increases. Incorporating stigmergy (IQL+GS) improves coordination for three and four agents, but struggles at five. Combining stigmergy with curriculum learning (IQL+GSC) achieves the highest excavation performance across all team sizes, demonstrating superior scalability and robustness in moderately congested environments.
  • Figure 5: Learning curve comparison of the four MADRL techniques (IQL, IQL+G, IQL+GS, and IQL+GSC) for one to five agents. Baseline methods (IQL, IQL+G) fail to converge beyond three agents. Stigmergic communication (IQL+GS) enhances learning stability, while curriculum learning (IQL+GSC) further accelerates convergence and achieves the highest rewards in four- and five-agent scenarios.
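The pheromone mechanism described in the Figure 2 and Figure 3 captions (agents deposit traces that diffuse and decay over time, and each agent senses only a local region) can be sketched as a simple grid update. This is a minimal illustration under stated assumptions: the class `PheromoneMap`, the even 4-neighbour diffusion rule, the evaporation and diffusion rates, and the square observation window are hypothetical choices, not the paper's equations.

```python
class PheromoneMap:
    """Illustrative digital pheromone grid (assumed design, not the paper's).

    Each step, a fixed fraction of every cell's pheromone spreads evenly to its
    in-bounds 4-neighbours (diffusion, which conserves the total), then every
    cell loses a fixed fraction (evaporation). Rates are placeholder assumptions.
    """

    def __init__(self, width, height, evaporation=0.1, diffusion=0.2):
        self.w, self.h = width, height
        self.evaporation = evaporation  # fraction removed from every cell per step
        self.diffusion = diffusion      # fraction of a cell shared with neighbours per step
        self.grid = [[0.0] * width for _ in range(height)]  # indexed grid[y][x]

    def deposit(self, x, y, amount=1.0):
        """An agent at (x, y) leaves a pheromone trace."""
        self.grid[y][x] += amount

    def _neighbours(self, x, y):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < self.w and 0 <= ny < self.h:
                yield nx, ny

    def step(self):
        """Diffuse, then evaporate, once per environment time step."""
        new = [[0.0] * self.w for _ in range(self.h)]
        for y in range(self.h):
            for x in range(self.w):
                v = self.grid[y][x]
                nbrs = list(self._neighbours(x, y))
                if not nbrs:  # 1x1 grid: nothing to diffuse into
                    new[y][x] += v
                    continue
                spread = v * self.diffusion
                new[y][x] += v - spread
                for nx, ny in nbrs:
                    new[ny][nx] += spread / len(nbrs)
        keep = 1.0 - self.evaporation
        self.grid = [[v * keep for v in row] for row in new]

    def observe(self, x, y, radius=1):
        """Local (2r+1) x (2r+1) patch around (x, y); cells outside the map read 0.

        This models the restricted field of view: an agent never sees the full map.
        """
        patch = []
        for yy in range(y - radius, y + radius + 1):
            row = []
            for xx in range(x - radius, x + radius + 1):
                inside = 0 <= xx < self.w and 0 <= yy < self.h
                row.append(self.grid[yy][xx] if inside else 0.0)
            patch.append(row)
        return patch
```

In a training loop under these assumptions, each agent would call `deposit` at its current cell, the environment would call `step` once per time step, and `observe` would supply the pheromone part of the local observation $O_i$. Because the map is shared but each agent reads only its own window, no messages pass between agents, matching the communication-free setting the captions describe.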