Learning Multi-Agent Communication with Contrastive Learning
Yat Long Lo, Biswa Sengupta, Jakob Foerster, Michael Noukhovitch
TL;DR
This work addresses learning effective communication in decentralized multi-agent reinforcement learning (MARL) by reframing messages as incomplete views of the environment state. It introduces Communication Alignment Contrastive Learning (CACL), a SupCon-based objective that aligns sent and received messages across agents within trajectory windows, encouraging symmetric, mutually intelligible protocols and encoding of global state information. Empirical results across three communication-critical tasks show that CACL improves performance and learning speed over strong baselines, with ablations validating the importance of the temporal window and contrastive formulation. The study also demonstrates that CACL yields highly symmetric protocols and semantically meaningful representations, signaling the potential of contrastive self-supervision for emergent communication and zero-shot coordination in MARL.
Abstract
Communication is a powerful tool for coordination in multi-agent reinforcement learning (RL), but inducing an effective, common language is difficult, particularly in the decentralized setting. In this work, we introduce an alternative perspective in which the messages sent between agents are treated as different incomplete views of the environment state. By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory. In communication-essential environments, our method outperforms previous work in both performance and learning speed. Using qualitative metrics and representation probing, we show that our method induces more symmetric communication and captures global state information from the environment. Overall, we show the power of contrastive learning and the importance of leveraging messages as encodings for effective communication.
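To make the contrastive idea concrete, here is a minimal pure-Python sketch of a SupCon-style loss over messages. It assumes that messages from the same episode within a small time window count as positives and that every other message in the batch counts as a negative. The function name `cacl_style_loss`, the parameters `window` and `temperature`, and the flat batch layout are hypothetical choices made for illustration; this is not the authors' implementation.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cacl_style_loss(messages, episode_ids, timesteps, window=2, temperature=0.1):
    """SupCon-style loss over a flat batch of messages (illustrative sketch).

    messages:    list of message vectors, one per (agent, timestep) pair
    episode_ids: episode (trajectory) index of each message
    timesteps:   timestep of each message within its episode
    Assumption: positives share an episode and lie within `window` steps
    of the anchor; all other messages in the batch act as negatives.
    """
    z = [_normalize(m) for m in messages]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [
            j for j in range(n)
            if j != i
            and episode_ids[j] == episode_ids[i]
            and abs(timesteps[j] - timesteps[i]) <= window
        ]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other message.
        logits = {j: _dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Log-sum-exp with max subtraction for numerical stability.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        # Average negative log-probability of the positives for this anchor.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Toy batch: two episodes, two agents each, all at timestep 0.
toy_messages = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = cacl_style_loss(toy_messages, [0, 0, 1, 1], [0, 0, 0, 0])
misaligned = cacl_style_loss(toy_messages, [0, 1, 0, 1], [0, 0, 0, 0])
print(aligned < misaligned)
```

In the toy batch, the loss is lower when similar messages are labeled as coming from the same episode than when the episode labels are mismatched, which is the alignment pressure the abstract describes. A practical version would operate on batched tensors in an autodiff framework so the loss can be backpropagated through each agent's message encoder.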
