Asynchronous Cooperative Multi-Agent Reinforcement Learning with Limited Communication

Sydney Dolan, Siddharth Nayak, Jasmine Jerry Aloor, Hamsa Balakrishnan

TL;DR

AsynCoMARL tackles coordination in partially observable multi-agent settings with limited and asynchronous communications by learning communication protocols through a graph transformer operating on a dynamic agent-entity graph. Each active agent maintains a local graph embedding via a two-layer UniMP graph transformer, while a centralized critic receives global graph representations to guide policy updates under a MAPPO/PPO framework; edges form only between agents that are in proximity and act at the same time, and inactive agents are masked. The approach achieves competitive success and collision rates while reducing inter-agent messages by up to 26% in Cooperative Navigation and performing comparably to baselines in Rover-Tower, demonstrating robust performance under asynchronous communication constraints. This work advances practical MARL for space missions and planetary rovers by balancing communication efficiency with coordination quality, and highlights how dynamic graph attention weighs proximity against communication frequency in evolving teams.

Abstract

We consider the problem setting in which multiple autonomous agents must cooperatively navigate and perform tasks in an unknown, communication-constrained environment. Traditional multi-agent reinforcement learning (MARL) approaches assume synchronous communications and perform poorly in such environments. We propose AsynCoMARL, an asynchronous MARL approach that uses graph transformers to learn communication protocols from dynamic graphs. AsynCoMARL can accommodate infrequent and asynchronous communications between agents, with edges of the graph forming only when agents communicate with each other. We show that AsynCoMARL achieves success and collision rates similar to those of leading baselines, despite 26% fewer messages being passed between agents.

Paper Structure

This paper contains 35 sections, 7 equations, 2 figures, 9 tables, 1 algorithm.

Figures (2)

  • Figure 1: Overview of AsynCoMARL. (a) Environment. Agents within our environment take actions and observations asynchronously. To encourage collaboration, when agents take actions at the same time $t$, they receive a shared reward. The sequence of actions and observations for agent $i$ is referred to by timescale $\tau^{(i)}$. The arrows indicate data transmissions, which represent the most recent graph observation $x_{\tau_{agg}}^{(i)}$. (b) Asynchronous Temporal Graph Representation. Each active agent within our environment becomes a node on the graph and can communicate with other agents located within distance $\phi$. Our graph representation is dynamic, meaning that graph edges connect and disconnect depending on agent proximity. (c) Actor and critic. Agent $i$’s observation is combined with its node observations from the graph transformer, $x_{\tau,agg}^{(i)}$, and fed into the actor network. The critic takes the full graph representation $X_{agg}$ and evaluates agent $i$’s action.
  • Figure 2: Attention weights for agent 0 in the $n=5$ agent Cooperative Navigation task. We compare the graph transformer's attention at three points during the episode: the beginning, the middle, and the end.
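
The dynamic graph described in Figure 1(b) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_adjacency`, the 2-D position layout, the inclusive `<=` distance test, and the exclusion of self-loops are all assumptions made for illustration. It shows only the two conditions stated in the text, namely that edges link agents within distance $\phi$ and that inactive agents are masked out.

```python
import numpy as np

def build_adjacency(positions, active, phi):
    """Hypothetical helper: boolean adjacency matrix of the dynamic agent graph.

    positions: (n, 2) array-like of agent coordinates (assumed 2-D layout).
    active:    (n,) boolean array-like, True if the agent acts at this step.
    phi:       communication radius (the distance threshold from Figure 1b).
    """
    positions = np.asarray(positions, dtype=float)
    active = np.asarray(active, dtype=bool)
    n = len(positions)

    # Pairwise Euclidean distances between all agents.
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Candidate edges: agents within range phi, excluding self-loops
    # (inclusive comparison and no self-loops are assumptions).
    adj = (dist <= phi) & ~np.eye(n, dtype=bool)

    # Mask inactive agents: an edge needs both endpoints to be active.
    adj &= active[:, None] & active[None, :]
    return adj


# Three agents on a line; only the first two are within phi = 1.0.
positions = [[0.0, 0.0], [0.5, 0.0], [3.0, 0.0]]
all_active = build_adjacency(positions, [True, True, True], phi=1.0)
one_inactive = build_adjacency(positions, [True, False, True], phi=1.0)
```

Recomputing the matrix at every step is what makes the graph dynamic: edges appear and disappear as agents move in and out of range or become inactive. In the example, deactivating the middle agent removes the only edge, since the remaining pair is farther apart than `phi`.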