Learning Communication Skills in Multi-task Multi-agent Deep Reinforcement Learning
Changxi Zhu, Mehdi Dastani, Shihan Wang
TL;DR
This paper addresses learning to communicate in multi-task multi-agent deep reinforcement learning (MADRL). It introduces Multi-task Communication Skills (MCS), which uses a Transformer to encode task-specific observations into a shared message space, and adds a pruning mechanism and a training-time prediction network that aligns messages with sender actions. The method optimizes a joint objective under the centralized training with decentralized execution (CTDE) paradigm, including a variational mutual information term that encourages informative communication across tasks with varying agent counts and observation/action spaces. Empirical results on AliceBob, SMAC, and Football show that MCS outperforms multi-task baselines without communication and single-task baselines with and without communication, and ablations confirm the value of message pruning and the predictor. The results indicate robust cross-task coordination and interpretable latent message structures, and point toward adaptive, task-aware communication in complex, partially observable multi-agent domains.
Abstract
In multi-agent deep reinforcement learning (MADRL), agents can communicate with one another to perform a task in a coordinated manner. When multiple tasks are involved, agents can also leverage knowledge from one task to improve learning in other tasks. In this paper, we propose Multi-task Communication Skills (MCS), a communication-based MADRL method that learns and performs multiple tasks simultaneously, with agents interacting through learnable communication protocols. MCS employs a Transformer encoder to encode task-specific observations into a shared message space, capturing communication skills shared across tasks. To enhance coordination among agents, we introduce a prediction network that correlates messages with the actions of sender agents in each task. We adapt three multi-agent benchmark environments to multi-task settings, where the number of agents as well as the observation and action spaces vary across tasks. Experimental results demonstrate that MCS achieves better performance than multi-task MADRL baselines without communication, as well as single-task MADRL baselines with and without communication.
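To make the idea of a shared message space concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each agent's observation is split into entity tokens whose feature size depends on the task, that a per-task linear embedding feeds a single self-attention layer shared across tasks, and that the message is the mean of the attended tokens. The names `SharedMessageEncoder`, `action_prediction_loss`, `MSG_DIM`, and the task labels are hypothetical. The cross-entropy loss stands in for the prediction network that ties a message to its sender's action; the actual MCS objective may differ.

```python
import numpy as np

MSG_DIM = 8  # hypothetical size of the shared message space


def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


class SharedMessageEncoder:
    """Per-task token embedding followed by one shared self-attention layer (assumed design)."""

    def __init__(self, feat_dims, msg_dim=MSG_DIM, seed=0):
        rng = np.random.default_rng(seed)
        # One input projection per task, because token feature sizes differ across tasks.
        self.embed = {t: rng.normal(0.0, 0.1, (d, msg_dim)) for t, d in feat_dims.items()}
        # Attention weights are shared by all tasks.
        self.wq, self.wk, self.wv = (rng.normal(0.0, 0.1, (msg_dim, msg_dim)) for _ in range(3))

    def encode(self, task, tokens):
        """tokens: (n_entities, feat_dims[task]) -> message: (msg_dim,)."""
        h = tokens @ self.embed[task]
        q, k, v = h @ self.wq, h @ self.wk, h @ self.wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        # Residual connection, then mean-pool the tokens into one message.
        return (h + attn @ v).mean(axis=0)


def action_prediction_loss(message, action, w_pred):
    """Cross-entropy of a linear predictor that guesses the sender's action from its message."""
    probs = softmax(message @ w_pred)
    return -np.log(probs[action])


# Two hypothetical tasks with different token sizes and entity counts.
enc = SharedMessageEncoder({"task_a": 5, "task_b": 11})
msg_a = enc.encode("task_a", np.ones((3, 5)))
msg_b = enc.encode("task_b", np.ones((6, 11)))
```

Both `msg_a` and `msg_b` have the same length even though the two tasks differ in token size and entity count, which is the property that lets a single communication protocol be reused across tasks. The prediction head in this sketch is used only to compute a training loss, mirroring the training-time role the abstract gives the prediction network.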
