Context-aware Communication for Multi-agent Reinforcement Learning

Xinran Li, Jun Zhang

TL;DR

This paper addresses cooperative MARL under bandwidth constraints by introducing CACOM, a two-stage, context-aware communication protocol: agents first broadcast coarse context, then send attention-generated messages personalized to each receiver. Messages are discretized with Learned Step Size Quantization (LSQ), which keeps training differentiable while reducing overhead, and a gating mechanism prunes unnecessary links to save further bandwidth. Implemented on top of QMIX and MADDPG, CACOM shows consistent performance gains over baselines on MPE and SMAC under tight communication budgets. The results suggest that designing communication around each receiver's information needs, through attention-based personalization and link pruning, can support coordination in bandwidth-limited systems.
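
To make the quantization step concrete, the following is a minimal sketch of the LSQ forward pass and its step-size gradient for signed values, following the general LSQ formulation rather than the CACOM code. The function names, the bit width, and the example numbers are illustrative assumptions; the summary does not state how many bits CACOM uses or where exactly the quantizer sits in the network.

```python
def lsq_quantize(values, step, bits=4):
    """LSQ forward pass (signed): returns (integer codes, dequantized values).

    `step` is the learnable step size; `bits` is an assumed bit width.
    """
    q_neg = 2 ** (bits - 1)        # magnitude of the most negative level
    q_pos = 2 ** (bits - 1) - 1    # most positive level
    codes = []
    for v in values:
        scaled = max(-q_neg, min(q_pos, v / step))  # clip to representable range
        codes.append(round(scaled))                 # only the integer code is sent
    dequantized = [c * step for c in codes]         # receiver rescales by the step
    return codes, dequantized


def lsq_step_gradient(v, step, bits=4):
    """Gradient of the dequantized value w.r.t. the step size.

    Rounding is treated with a straight-through estimator, which is what
    keeps the quantizer trainable end to end.
    """
    q_neg = 2 ** (bits - 1)
    q_pos = 2 ** (bits - 1) - 1
    scaled = v / step
    if scaled <= -q_neg:
        return float(-q_neg)
    if scaled >= q_pos:
        return float(q_pos)
    return round(scaled) - scaled


# Hypothetical message entries quantized to 3 bits with step size 0.25.
codes, approx = lsq_quantize([0.26, -0.9, 5.0], step=0.25, bits=3)
```

Only the integer codes need to cross the channel, so a message of `n` entries costs roughly `n * bits` bits instead of `n` full-precision floats. Note that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding used in a given deep learning framework.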

Abstract

Effective communication protocols in multi-agent reinforcement learning (MARL) are critical to fostering cooperation and enhancing team performance. To leverage communication, many previous works have proposed to compress local information into a single message and broadcast it to all reachable agents. This simplistic messaging mechanism, however, may fail to provide adequate, critical, and relevant information to individual agents, especially in severely bandwidth-limited scenarios. This motivates us to develop context-aware communication schemes for MARL, aiming to deliver personalized messages to different agents. Our communication protocol, named CACOM, consists of two stages. In the first stage, agents exchange coarse representations in a broadcast fashion, providing context for the second stage. Following this, agents utilize attention mechanisms in the second stage to selectively generate messages personalized for the receivers. Furthermore, we employ the learned step size quantization (LSQ) technique for message quantization to reduce the communication overhead. To evaluate the effectiveness of CACOM, we integrate it with both actor-critic and value-based MARL algorithms. Empirical results on cooperative benchmark tasks demonstrate that CACOM provides evident performance gains over baselines under communication-constrained scenarios. The code is publicly available at https://github.com/LXXXXR/CACOM.
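
The two stages described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' architecture: the helper's local features are assumed to be available as per-entity key/value vectors, the receiver's broadcast context is used directly as the attention query, and all names and numbers (`personalized_message`, `keys_b`, `values_b`, the agent labels) are hypothetical. In the actual method these quantities would come from learned encoders.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def personalized_message(context, keys, values):
    """Attend over the helper's local features, using the receiver's context as query."""
    d = len(context)
    # Scaled dot-product score between the receiver's context and each key.
    scores = [sum(c * k for c, k in zip(context, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The message is the attention-weighted sum of the value vectors.
    message = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, message


# Stage 1: each helpee broadcasts a short context vector (hypothetical values).
contexts = {"A": [4.0, 0.0], "C": [0.0, 4.0]}

# Helper B's local features, one key/value pair per observed entity (hypothetical).
keys_b = [[1.0, 0.0], [0.0, 1.0]]
values_b = [[10.0, 0.0], [0.0, 10.0]]

# Stage 2: B generates a different message for each receiver.
messages = {
    name: personalized_message(ctx, keys_b, values_b)[1]
    for name, ctx in contexts.items()
}
```

Because the query differs per receiver, the same helper features yield different messages: here the message for A is dominated by the first entity and the message for C by the second, whereas a broadcast scheme would send both receivers the same vector.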

Paper Structure (15 sections, 12 equations, 9 figures, 7 tables)

Figures (9)

  • Figure 1: An illustrative example in a busy traffic junction. Under the broadcasting communication scheme, agent B does not know the intentions of agents A and C, so it must broadcast all of its observations, which results in heavy communication overhead. In contrast, under context-aware communication, agents A and C first convey their local information in short context messages. Agent B then generates personalized messages for A and C based on those context messages. In this way, more context-relevant messages are provided for decision making with much lower communication overhead.
  • Figure 2: Illustration of the CACOM protocol from the perspective of the helpee agent $i$. The blue arrows denote local processing and the red arrows denote communication. At each timestep, agent $i$ first broadcasts a context message $c_i$ to its peers, agents $k$ and $j$. Then, after local processing, agents $k$ and $j$ each decide whether to reply and what to send.
  • Figure 3: Network architecture for CACOM. (a) Helpee's feature encoder and policy network. (b) Overall architecture for a helpee agent $i$. (c) Overall architecture for a helper agent $j$. (d) Helper's message generator.
  • Figure 4: Multi-agent environments.
  • Figure 5: Performance comparison with baselines on MPE benchmarks.
  • ...and 4 more figures
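
The exchange in Figure 2, where each helper decides whether to reply after receiving the helpee's context, can be sketched as a simple gate. This is only an illustration of link pruning under assumptions: the relevance score (a sigmoid of a dot product), the fixed threshold, and the names `should_reply`, `run_timestep`, and `helpers` are hypothetical, and the gate in the paper is learned rather than hand-specified.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def should_reply(context, helper_summary, threshold=0.5):
    """Hypothetical gate: reply only if the relevance score clears the threshold."""
    score = sigmoid(sum(c * h for c, h in zip(context, helper_summary)))
    return score > threshold


def run_timestep(context_i, helpers):
    """One exchange from helpee i's perspective; returns the helpers that reply."""
    # Stage 1: helpee i broadcasts context_i to every peer.
    # Stage 2: each helper processes it locally and decides whether to reply.
    return [
        name
        for name, summary in helpers.items()
        if should_reply(context_i, summary)
    ]


# Hypothetical local summaries held by helpers j and k.
helpers = {"j": [1.0, 0.5], "k": [-1.0, -0.5]}
repliers = run_timestep([2.0, 1.0], helpers)
```

In this toy setting only helper `j` clears the gate, so the link from `k` to `i` carries no second-stage message for this timestep.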