TACTIC: Task-Agnostic Contrastive pre-Training for Inter-Agent Communication
Peihong Yu, Manav Mishra, Syed Zaidi, Pratap Tokekar
TL;DR
TACTIC addresses the challenge of generalizing inter-agent coordination across varying sight ranges in cooperative MARL by offline contrastive pretraining of a communication mechanism that aligns local observations and messages with an egocentric global state. It introduces Global Information Alignment and Feature Integration Alignment losses, plus reconstruction and dynamics objectives, and then integrates the pretrained modules into a QMIX-based online policy with frozen communication components. Through extensive SMACv2 experiments, TACTIC demonstrates superior generalization to unseen sight ranges and faster online training compared to state-of-the-art baselines, even when offline data are collected via random exploration. The approach offers a practical, task-agnostic route to robust multi-agent coordination under diverse observability conditions, with potential impact on real-world deployments where visibility fluctuates.
Abstract
The "sight range dilemma" in cooperative Multi-Agent Reinforcement Learning (MARL) presents a significant challenge: limited observability hinders team coordination, while extensive sight ranges lead to distracted attention and reduced performance. Although communication can potentially address this issue, existing methods often struggle to generalize across different sight ranges, which limits their effectiveness. We propose TACTIC, a Task-Agnostic Contrastive pre-Training strategy for Inter-Agent Communication. TACTIC is an adaptive communication mechanism that enhances agent coordination even when the sight range during execution is vastly different from that during training. The communication mechanism encodes messages and integrates them with local observations, using contrastive learning to generate representations grounded in the global state. By learning to generate and interpret messages that capture important information about the whole environment, TACTIC enables agents to effectively "see" more through communication, regardless of their sight ranges. We comprehensively evaluate TACTIC on the SMACv2 benchmark across various scenarios with a broad set of sight ranges. The results demonstrate that TACTIC consistently outperforms state-of-the-art MARL techniques, both with and without communication, in generalizing to sight ranges different from those seen in training, particularly in cases of extremely limited or extensive observability.
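To make the idea of grounding representations in the global state concrete, the following is a minimal sketch of a generic InfoNCE-style contrastive loss. It is an illustration only: the abstract does not specify the form of TACTIC's losses, and the function name `info_nce_loss`, the `temperature` value, and the use of cosine similarity are all assumptions rather than the authors' implementation. Each row of `fused` stands for an embedding of an agent's local observation combined with received messages, and the matching row of `targets` stands for an embedding of the corresponding global state.

```python
import math


def _normalize(vector):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce_loss(fused, targets, temperature=0.1):
    """Generic InfoNCE loss (hypothetical helper, not the paper's exact objective).

    fused[i] and targets[i] form a positive pair; every other targets[j]
    in the batch serves as a negative for fused[i].
    """
    fused = [_normalize(v) for v in fused]
    targets = [_normalize(v) for v in targets]
    total = 0.0
    for i, f in enumerate(fused):
        # Similarity of this fused embedding to every target in the batch.
        logits = [sum(a * b for a, b in zip(f, t)) / temperature for t in targets]
        # Numerically stable log-sum-exp over the batch.
        largest = max(logits)
        log_denominator = largest + math.log(sum(math.exp(l - largest) for l in logits))
        # Negative log-probability assigned to the true (positive) pair.
        total += log_denominator - logits[i]
    return total / len(fused)


# Toy usage: correctly paired embeddings should score a lower loss
# than the same embeddings paired with the wrong global state.
fused_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(fused_batch, [[1.0, 0.1], [0.1, 1.0]])
shuffled = info_nce_loss(fused_batch, [[0.1, 1.0], [1.0, 0.1]])
print(aligned < shuffled)
```

Minimizing a loss of this shape pulls each fused embedding toward its own global-state embedding and pushes it away from the others in the batch, which is one common way to realize the kind of alignment the abstract describes.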
