Exponential Topology-enabled Scalable Communication in Multi-agent Reinforcement Learning

Xinran Li, Xiaolu Wang, Chenjia Bai, Jun Zhang

TL;DR

This work tackles scalable communication in cooperative multi-agent reinforcement learning under partial observability by designing ExpoComm, a protocol that uses exponential graph topologies to achieve fast information diffusion with near-linear communication costs. It integrates memory-based message processors and auxiliary grounding tasks so that messages reflect global information and aid decision-making, addressing the inefficiency of pairwise link selection in large-scale systems. Across twelve large-scale scenarios in MAgent and Infrastructure Management Planning, ExpoComm demonstrates superior performance and robust zero-shot transfer to larger agent counts, with the one-peer variant often delivering the best trade-off between performance and communication budget. The approach is practical for real-world many-agent systems because it enables scalable, globally informed coordination without prohibitive communication overhead, and its open-source code facilitates adoption and further research. The key ideas are formalized around a communication topology $\mathcal{G}^t$ with diameter $\lceil \log_2(N-1)\rceil$ and an edge count that scales near-linearly with the number of agents $N$, alongside grounding losses that align local messages with global information via the auxiliary loss $\mathcal{L}^{\text{Aux}}_{\text{pred}}$ or InfoNCE, depending on state availability.

Abstract

In cooperative multi-agent reinforcement learning (MARL), well-designed communication protocols can effectively facilitate consensus among agents, thereby enhancing task performance. Moreover, in large-scale multi-agent systems commonly found in real-world applications, effective communication plays an even more critical role due to the escalated challenge of partial observability compared to smaller-scale setups. In this work, we endeavor to develop a scalable communication protocol for MARL. Unlike previous methods that focus on selecting optimal pairwise communication links, a task that becomes increasingly complex as the number of agents grows, we adopt a global perspective on communication topology design. Specifically, we propose utilizing the exponential topology to enable rapid information dissemination among agents by leveraging its small-diameter and small-size properties. This approach leads to a scalable communication protocol, named ExpoComm. To fully unlock the potential of exponential graphs as communication topologies, we employ memory-based message processors and auxiliary tasks to ground messages, ensuring that they reflect global information and benefit decision-making. Extensive experiments on large-scale cooperative benchmarks, including MAgent and Infrastructure Management Planning, demonstrate the superior performance and robust zero-shot transferability of ExpoComm compared to existing communication strategies. The code is publicly available at https://github.com/LXXXXR/ExpoComm.
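
The small-diameter and small-size properties mentioned above can be checked numerically. The following is a minimal Python sketch, not the authors' implementation. It assumes the static exponential graph as commonly defined in the decentralized optimization literature, where agent $i$ links to agents $(i + 2^k) \bmod N$ for $k = 0, \dots, \lfloor \log_2(N-1) \rfloor$, with self-loops left out of the edge count. The function names `static_exponential_neighbors` and `diameter` are hypothetical.

```python
from collections import deque


def static_exponential_neighbors(n: int) -> list[list[int]]:
    """Out-neighbors per agent: i -> (i + 2^k) mod n, k = 0..floor(log2(n-1)).

    Assumed definition (standard static exponential graph); the paper's own
    adjacency may differ in details such as self-loops.
    """
    if n < 2:
        raise ValueError("need at least two agents")
    num_hops = (n - 1).bit_length()  # equals floor(log2(n-1)) + 1
    return [sorted({(i + (1 << k)) % n for k in range(num_hops)})
            for i in range(n)]


def diameter(neighbors: list[list[int]]) -> int:
    """Longest shortest-path length, found by a BFS from every agent."""
    n = len(neighbors)
    worst = 0
    for src in range(n):
        dist = [-1] * n
        dist[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if min(dist) < 0:
            raise ValueError("graph is not strongly connected")
        worst = max(worst, max(dist))
    return worst


if __name__ == "__main__":
    for n in (8, 64, 256):
        nbrs = static_exponential_neighbors(n)
        edges = sum(len(row) for row in nbrs)
        print(f"N={n}: directed edges={edges}, diameter={diameter(nbrs)}")
```

Under this assumed definition each agent has $\lfloor \log_2(N-1) \rfloor + 1$ outgoing links, so the total edge count grows as $N \log N$, whereas a fully connected topology needs $N(N-1)$ directed links.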

Paper Structure

This paper contains 43 sections, 1 theorem, 9 equations, 11 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

Suppose that $E^{t}_{ij}$ is defined by the one-peer exponential adjacency (eq: one_peer_exp_adj). Let $\tau = \lceil \log_2{(N-1)} \rceil$. Then, the following holds: where $\times_b$ denotes logical (Boolean) matrix multiplication.
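
To make the Boolean matrix multiplication $\times_b$ concrete, here is a small Python sketch. It is an illustration under assumptions, not the paper's definition: it assumes that at step $t$ agent $i$ is linked to itself and to agent $(i + 2^{t \bmod \tau}) \bmod N$, and that entry $(i, j)$ of a Boolean product of consecutive adjacency matrices marks whether a message can travel between $i$ and $j$ over those steps. The helper names `one_peer_adjacency`, `bool_matmul`, and `reach_after` are hypothetical.

```python
def one_peer_adjacency(n: int, t: int) -> list[list[bool]]:
    """Assumed one-peer adjacency at step t: a self-loop plus one peer at
    hop distance 2^(t mod tau), with tau = ceil(log2(n - 1))."""
    if n < 3:
        raise ValueError("this sketch needs n >= 3 so that tau >= 1")
    tau = (n - 2).bit_length()  # equals ceil(log2(n - 1)) for n >= 3
    hop = 1 << (t % tau)
    return [[j == i or j == (i + hop) % n for j in range(n)]
            for i in range(n)]


def bool_matmul(a: list[list[bool]], b: list[list[bool]]) -> list[list[bool]]:
    """Logical matrix product: OR over k of (a[i][k] AND b[k][j])."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def reach_after(n: int, start: int, steps: int) -> list[list[bool]]:
    """Boolean product of the adjacency matrices for steps start..start+steps-1."""
    product = [[i == j for j in range(n)] for i in range(n)]  # Boolean identity
    for t in range(start, start + steps):
        product = bool_matmul(product, one_peer_adjacency(n, t))
    return product


if __name__ == "__main__":
    n = 8  # tau = 3 under the assumed definition
    for steps in (1, 2, 3):
        full = all(all(row) for row in reach_after(n, 0, steps))
        print(f"N={n}, steps={steps}: every pair connected = {full}")
```

For $N = 8$ under this assumed definition, the product of any three consecutive matrices has every entry true, while two steps are not enough. Other values of $N$ depend on the exact adjacency given in the paper.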

Figures (11)

  • Figure 1: Illustration of exponential graphs with $N=8$.
  • Figure 2: A toy example illustrating message dissemination under different graph topologies. The messages, represented by red dots, travel from a random agent to other agents over time, following different graph structures. In distance-based graphs [DGN], agents are connected to their top-$K$ nearest neighbors. In Erdős–Rényi random graphs [ER_graph], the adjacency matrices are sampled uniformly from all graphs satisfying the diameter and size conditions. In exponential graphs, the adjacency matrices follow the static and one-peer definitions (eq: static_exp_adj and eq: one_peer_exp_adj).
  • Figure 3: Neural network architecture for ExpoComm. For the static exponential topologies, attention blocks are used for message aggregation. For the one-peer exponential topologies, RNN blocks are used for message aggregation.
  • Figure 4: Performance comparison with baselines on MAgent tasks. Solid lines represent communication budgets of $K = 1$, while dashed lines represent budgets of $K = \lceil\log_2N\rceil$. Runs requiring more than 40 GB of GPU memory are excluded due to extreme training costs compared to other methods.
  • Figure 5: Zero-shot transfer results on the Battle scenario. The subtitle "X to Y" indicates that methods are trained with X agents and tested with Y agents. Filled bars represent communication budgets of $K = \lceil\log_2N\rceil$, while hatched bars represent budgets of $K = 1$. The CommFormer baseline is excluded from this experiment because it learns a fixed peer-to-peer communication topology among agents in a specific scenario, and transferring such a topology to scenarios with different numbers of agents is non-trivial.
  • ...and 6 more figures

Theorems & Definitions (4)

  • Theorem 1
  • Remark 1
  • Remark 2
  • proof