
Learning Communication Skills in Multi-task Multi-agent Deep Reinforcement Learning

Changxi Zhu, Mehdi Dastani, Shihan Wang

TL;DR

This paper tackles learning communication in multi-task MADRL by introducing Multi-task Communication Skills (MCS). MCS uses a Transformer to encode task-specific observations into a shared message space, and it employs a pruning mechanism plus a training-time prediction network to align messages with sender actions. The method optimizes a joint objective under the CTDE paradigm and includes a variational mutual information term that encourages informative communication across tasks with varying agent counts and observation/action spaces. Empirical results on AliceBob, SMAC, and Football show that MCS outperforms multi-task baselines without communication and single-task baselines with and without communication, and ablations confirm the value of message pruning and the predictor. The work demonstrates robust cross-task coordination, interpretable latent message structures, and a pathway toward adaptive, task-aware communication in complex, partially observable multi-agent domains.
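To make the "variational mutual information term" concrete, the sketch below uses the standard Barber-Agakov lower bound I(M; A) >= H(A) + E[log q(a | m)], where q is a predictor that guesses the sender's action a from its message m. Maximizing the bound amounts to minimizing the predictor's cross-entropy. This is a hypothetical illustration under stated assumptions, not the paper's objective: the linear softmax predictor (`W`, `b`), the plug-in entropy estimate, and all function names are assumptions introduced here.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def empirical_entropy(actions, n_actions):
    # Plug-in estimate of H(A) from the sampled sender actions.
    p = np.bincount(actions, minlength=n_actions) / len(actions)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mi_lower_bound(messages, actions, W, b, n_actions):
    """Hypothetical sketch: I(M; A) >= H(A) + E[log q(a | m)].

    q(a | m) is an assumed linear softmax predictor with weights W, b.
    Returns (bound, cross_entropy); the cross-entropy is the predictor's loss.
    """
    logq = log_softmax(messages @ W + b)                   # (batch, n_actions)
    ce = -logq[np.arange(len(actions)), actions].mean()   # E[-log q(a | m)]
    return empirical_entropy(actions, n_actions) - ce, ce

rng = np.random.default_rng(0)
n_actions, d, batch = 4, 4, 256
actions = rng.integers(0, n_actions, size=batch)
# Toy messages that carry the sender's action (one-hot plus noise).
messages = np.eye(d)[actions] + 0.1 * rng.standard_normal((batch, d))

# An uninformed predictor (all-zero weights) versus one that reads the action off the message.
uninformed, ce0 = mi_lower_bound(messages, actions, np.zeros((d, n_actions)), np.zeros(n_actions), n_actions)
informed, ce1 = mi_lower_bound(messages, actions, 5.0 * np.eye(d, n_actions), np.zeros(n_actions), n_actions)
print(uninformed, informed)
```

The uninformed predictor assigns probability 1/4 to every action, so its cross-entropy is log 4 and the bound drops to H(A) - log 4. A predictor that can recover the sender's action from the message pushes the bound toward H(A), which is the intuition behind correlating messages with sender actions.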

Abstract

In multi-agent deep reinforcement learning (MADRL), agents can communicate with one another to perform a task in a coordinated manner. When multiple tasks are involved, agents can also leverage knowledge from one task to improve learning in other tasks. In this paper, we propose Multi-task Communication Skills (MCS), a communication-based MADRL method that learns and performs multiple tasks simultaneously, with agents interacting through learnable communication protocols. MCS employs a Transformer encoder to encode task-specific observations into a shared message space, capturing communication skills shared among agents. To enhance coordination among agents, we introduce a prediction network that correlates messages with the actions of sender agents in each task. We adapt three multi-agent benchmark environments to multi-task settings, where the number of agents as well as the observation and action spaces vary across tasks. Experimental results demonstrate that MCS achieves better performance than multi-task MADRL baselines without communication, as well as single-task MADRL baselines with and without communication.
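One way a Transformer encoder can map observations from tasks with different entity counts and feature sizes into one shared message space is sketched below. The assumptions are: each task has its own linear embedding of entity features, a single shared self-attention layer (one head, no positional encoding) follows, and mean pooling over entities yields a fixed-size message. The class name, widths, and task names are hypothetical; this is a minimal illustration, not the MCS architecture.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class SharedMessageEncoder:
    """Hypothetical sketch: task-specific entity embeddings feeding one shared
    self-attention layer, pooled into a fixed-size message."""

    def __init__(self, task_feat_dims, d_model=16, d_msg=8, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d_model)
        # Task-specific projections map each task's entity features to d_model.
        self.embed = {t: rng.standard_normal((f, d_model)) / np.sqrt(f)
                      for t, f in task_feat_dims.items()}
        # Attention and output weights are shared by every task.
        self.Wq, self.Wk, self.Wv = (rng.standard_normal((d_model, d_model)) * s
                                     for _ in range(3))
        self.Wo = rng.standard_normal((d_model, d_msg)) * s
        self.d_model = d_model

    def __call__(self, task, entities):
        x = entities @ self.embed[task]                   # (n_entities, d_model)
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(self.d_model))    # (n_entities, n_entities)
        h = x + attn @ v                                   # residual connection
        return h.mean(axis=0) @ self.Wo                    # pool over entities -> (d_msg,)

enc = SharedMessageEncoder({"task_a": 5, "task_b": 9})
rng = np.random.default_rng(1)
obs_a = rng.standard_normal((3, 5))   # 3 entities, 5 features each
obs_b = rng.standard_normal((7, 9))   # 7 entities, 9 features each
m_a, m_b = enc("task_a", obs_a), enc("task_b", obs_b)
print(m_a.shape, m_b.shape)           # (8,) (8,)
```

Both tasks produce messages of the same size even though their observations differ in entity count and feature width. Without positional encodings, self-attention followed by mean pooling is also invariant to the order in which entities are listed, which suits entity-based observations.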

Paper Structure

This paper contains 20 sections, 11 equations, 10 figures, 4 tables, 1 algorithm.

Figures (10)

  • Figure 1: An overview of the MCS architecture. Agents communicate and act in each task, and data from multiple tasks are used to train a shared model across tasks.
  • Figure 2: The network structure of MCS. Task-specific observations $\boldsymbol{o}^k$ are represented in an entity-based form and then encoded into messages $\boldsymbol{m}^k$. During communication, messages are pruned using masks $\boldsymbol{C}^k$, applied through column-wise multiplication. Then, messages are aggregated and integrated into the policy network.
  • Figure 3: The network structure diagram of the predictor.
  • Figure 4:
  • Figure 5: Ablation studies of MCS on AliceBob (a–b), SMAC (c–d), and Football (e).
  • ...and 5 more figures
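The Figure 2 caption says that messages are pruned with masks C^k through column-wise multiplication and then aggregated. A minimal sketch of that step follows, assuming that column j of the message matrix holds agent j's message, that C^k[i, j] = 1 means receiver i keeps sender j's message, and that aggregation is a mean over the kept messages. The function name and the mean aggregator are hypothetical, and how the masks are produced or learned is not shown.

```python
import numpy as np

def prune_and_aggregate(messages, mask):
    """Hypothetical sketch of mask-based message pruning.

    messages: (d_msg, n_agents) array; column j is agent j's message.
    mask:     (n_agents, n_agents) binary array; mask[i, j] = 1 keeps
              sender j's message for receiver i.
    Returns an (n_agents, d_msg) array: per receiver, the mean of the kept
    messages (zeros if every message is pruned).
    """
    out = np.zeros((mask.shape[0], messages.shape[0]))
    for i in range(mask.shape[0]):
        pruned = messages * mask[i][None, :]   # column-wise multiplication
        kept = mask[i].sum()
        if kept > 0:
            out[i] = pruned.sum(axis=1) / kept
    return out

messages = np.array([[1.0, 0.0, 2.0],
                     [0.0, 1.0, 2.0]])   # d_msg = 2, three agents
mask = np.array([[0, 1, 1],              # receiver 0 keeps agents 1 and 2
                 [1, 0, 0],              # receiver 1 keeps agent 0
                 [0, 0, 0]])             # receiver 2 prunes everything
print(prune_and_aggregate(messages, mask))
# [[1.  1.5]
#  [1.  0. ]
#  [0.  0. ]]
```

A mean is used here only because it keeps the aggregated message on the same scale regardless of how many senders survive pruning; an attention-weighted sum would be an equally plausible choice.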

Theorems & Definitions (1)

  • Definition 1