Learning Multi-Agent Communication with Contrastive Learning

Yat Long Lo, Biswa Sengupta, Jakob Foerster, Michael Noukhovitch

TL;DR

This work addresses learning effective communication in decentralized multi-agent reinforcement learning by reframing messages as incomplete views of the environment state. It introduces Communication Alignment Contrastive Learning (CACL), a SupCon-based objective that aligns sent and received messages across agents within trajectory windows, encouraging symmetric, mutually intelligible protocols and encoding of global state information. Empirical results across three communication-critical tasks show that CACL improves performance and learning speed over strong baselines, with ablations validating the importance of the temporal window and contrastive formulation. The study also demonstrates that CACL yields highly symmetric protocols and semantically meaningful representations, signaling the potential of contrastive self-supervision for emergent communication and zero-shot coordination in MARL.
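The alignment objective described above can be illustrated with a small sketch. This is not the authors' implementation: it is a generic SupCon-style loss in pure Python, written under the assumption that two messages count as positives when they come from the same trajectory and their timesteps differ by at most a window size. The function name `supcon_message_loss` and the parameters `window` and `temperature` are hypothetical names chosen for illustration.

```python
import math


def supcon_message_loss(messages, timesteps, window=1, temperature=0.1):
    """SupCon-style loss over message vectors from one trajectory (illustrative).

    messages:  list of equal-length float lists, one per (agent, timestep)
    timesteps: list of ints giving the timestep of each message
    Assumption: messages whose timesteps differ by at most `window`
    are treated as positives; all other messages act as negatives.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    z = [normalize(m) for m in messages]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [
            j for j in range(n)
            if j != i and abs(timesteps[i] - timesteps[j]) <= window
        ]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other message.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n) if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits.values())
        )
        # Average negative log-probability of the positives for this anchor.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Two agents, two distant timesteps (0 and 5), 2-dimensional messages.
steps = [0, 0, 5, 5]
aligned = supcon_message_loss([[1, 0], [1, 0], [0, 1], [0, 1]], steps)
misaligned = supcon_message_loss([[1, 0], [0, 1], [0, 1], [1, 0]], steps)
```

In this toy example the loss is lower when both agents emit similar messages at the same timestep (`aligned`) than when their messages agree only across distant timesteps (`misaligned`), which is the behavior an alignment objective of this kind is meant to reward.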

Abstract

Communication is a powerful tool for coordination in multi-agent RL. But inducing an effective, common language is a difficult challenge, particularly in the decentralized setting. In this work, we introduce an alternative perspective where communicative messages sent between agents are considered as different incomplete views of the environment state. By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory. In communication-essential environments, our method outperforms previous work in both performance and learning speed. Using qualitative metrics and representation probing, we show that our method induces more symmetric communication and captures global state information from the environment. Overall, we show the power of contrastive learning and the importance of leveraging messages as encodings for effective communication.

Paper Structure (27 sections, 6 equations, 10 figures, 7 tables)

Figures (10)

  • Figure 1: In multi-view learning, augmentations of the original image or "views" are used as positive samples for contrastive learning. In our proposed method, CACL, different agents' views of the same environment state are considered positive samples and messages are contrastively learned as encodings of that state.
  • Figure 2: CACL (red) outperforms all other methods on Predator-Prey (left), Traffic-Junction (middle) and Find-Goal (right). Predator-Prey shows evaluation reward, higher is better. Traffic-Junction plots the percent of successful episodes, higher is better. Find-Goal plots the episode length until the goal is reached, lower is better. The performance curves are smoothed by a factor of 0.5 with standard errors plotted as shaded areas.
  • Figure 3: Success rate in Predator-Prey: the percentage of final evaluation runs that captured no prey, one prey, or both prey. Averaged over 6 random seeds, each with 10 evaluation episodes. See Appendix \ref{app::tab::predator_prey_success} for the same results with standard deviation.
  • Figure 4: Predator-Prey ablation experiment on $L_{CACL}$ varying the sliding window size and $\kappa$.
  • Figure 5: Comparing CACL and AEComm with their respective variants when combined with DIAL. Variants with DIAL have generally worse performance.
  • ...and 5 more figures