TACTIC: Task-Agnostic Contrastive pre-Training for Inter-Agent Communication

Peihong Yu; Manav Mishra; Syed Zaidi; Pratap Tokekar

TL;DR

TACTIC addresses the challenge of generalizing inter-agent coordination across varying sight ranges in cooperative MARL by offline contrastive pretraining of a communication mechanism that aligns local observations and messages with an egocentric global state. It introduces Global Information Alignment and Feature Integration Alignment losses, plus reconstruction and dynamics objectives, and then integrates the pretrained modules into a QMIX-based online policy with frozen communication components. Through extensive SMACv2 experiments, TACTIC demonstrates superior generalization to unseen sight ranges and faster online training compared to state-of-the-art baselines, even when offline data are collected via random exploration. The approach offers a practical, task-agnostic route to robust multi-agent coordination under diverse observability conditions, with potential impact on real-world deployments where visibility fluctuates.

Abstract

The "sight range dilemma" in cooperative Multi-Agent Reinforcement Learning (MARL) presents a significant challenge: limited observability hinders team coordination, while extensive sight ranges lead to distracted attention and reduced performance. Communication can potentially address this issue, but existing methods often struggle to generalize across different sight ranges, which limits their effectiveness. We propose TACTIC, a Task-Agnostic Contrastive pre-Training strategy for Inter-Agent Communication. TACTIC is an adaptive communication mechanism that enhances agent coordination even when the sight range during execution is vastly different from that during training. The communication mechanism encodes messages and integrates them with local observations, using contrastive learning to generate representations grounded in the global state. By learning to generate and interpret messages that capture important information about the whole environment, TACTIC enables agents to effectively "see" more through communication, regardless of their sight ranges. We comprehensively evaluate TACTIC on the SMACv2 benchmark across various scenarios with broad sight ranges. The results demonstrate that TACTIC consistently outperforms state-of-the-art MARL techniques, both with and without communication, at generalizing to sight ranges different from those seen in training, particularly in cases of extremely limited or extensive observability.
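To make the contrastive alignment described in the abstract concrete, here is a minimal sketch of an InfoNCE-style loss that pulls each agent's integrated feature (local observation plus messages) toward the feature of its own egocentric state and pushes it away from the others in the batch. This page does not give the exact form of the paper's losses, so the InfoNCE formulation, the temperature value, and the names `info_nce`, `l2_normalize`, `integrated`, and `state_feats` are all illustrative assumptions rather than the authors' implementation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(anchors, targets, temperature=0.1):
    """Mean InfoNCE loss over a batch (an assumed loss form, not the paper's).

    anchors[k] and targets[k] form the positive pair; every other
    targets[j] in the batch serves as a negative for anchors[k].
    """
    a = [l2_normalize(v) for v in anchors]
    t = [l2_normalize(v) for v in targets]
    total = 0.0
    for k, anchor in enumerate(a):
        # Cosine similarity of this anchor to every target, scaled by temperature.
        logits = [sum(x * y for x, y in zip(anchor, tgt)) / temperature for tgt in t]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[k]
    return total / len(a)


# Hypothetical toy features: one row per (agent, timestep) sample.
integrated = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # message-observation features
state_feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # egocentric-state features
aligned_loss = info_nce(integrated, state_feats)
shuffled_loss = info_nce(integrated, state_feats[1:] + state_feats[:1])
```

In this toy setup the loss is close to zero when each integrated feature matches its own state feature and is much larger when the pairing is shuffled, which is the behaviour a contrastive alignment objective is meant to reward.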

Paper Structure (12 sections, 3 equations, 9 figures)

Introduction
Related Work
Preliminaries
TACTIC: Towards task-agnostic adaptive communication in MARL
Offline Training of Communication Mechanism
Online Training of Agent Policy
Experiments
Experimental Setup
Policy Generalization Across Sight Ranges
Online Policy Training Efficiency
Ablation Study
Conclusion

Figures (9)

Figure 1: TACTIC utilizes contrastive learning to align the integration of local observations $o_i$ and messages $\{m_{ji}\}$ with the full egocentric state $\hat{s}_i$ for each agent $i$, enabling agents to "see" beyond their limited sight ranges through communication.
Figure 2: Performance of QMIX-ATT and TACTIC on Protoss 10v10 from SMACv2 [ellis2024smacv2] with varying sight ranges (SRs). Different SRs are obtained by applying different sight-range ratios (SRRs) to the agents' original SRs in the implementation. Policies trained at SRR = 0.2, 1, and 5 are tested across a broader set of SRRs. QMIX-ATT struggles to generalize to unseen SRs, while TACTIC generalizes much better.
Figure 3: The offline training pipeline of the adaptive communication mechanism (see the section "Offline Training of Communication Mechanism"). It includes three key components: an egocentric state encoder, an adaptive message generator, and a message-observation integrator. The training pipeline consists of two contrastive learning processes: Global Information Alignment (GIA), which aligns the features generated by the egocentric state encoder across all agents and timesteps, and Feature Integration Alignment (FIA), which aligns features from the message-observation integrator with those from the egocentric state encoder for each individual agent. Two auxiliary terms are added to the total loss to enhance training: a reconstruction loss for learning to recover the egocentric states and a dynamics loss for learning temporally coherent representations.
Figure 4: Online policy training pipeline of TACTIC, illustrating the integration of QMIX architecture with pre-trained communication components. The pre-trained message generator (Mess Gen) and message-observation integrator (Mess-obs Integrator) remain fixed during the policy training.
Figure 5: Performance of TACTIC and baseline models on policy generalizability across various sight ranges in the Protoss map.
...and 4 more figures
