
MACTAS: Self-Attention-Based Module for Inter-Agent Communication in Multi-Agent Reinforcement Learning

Maciej Wojtala, Bogusz Stefańczyk, Dominik Bogucki, Łukasz Lepak, Jakub Strykowski, Paweł Wawrzyński

TL;DR

MACTAS presents a differentiable, Transformer-based inter-agent communication module for multi-agent reinforcement learning that can be plugged into any action-value decomposition. MACTAS processes the set of per-agent hidden states with a self-attention-based module $g$. This enables reward-driven message exchange while keeping the parameter count independent of the number of agents. Empirical results on SMAC and SMACv2 show MACTAS achieving state-of-the-art or competitive performance across multiple maps and mixers, with ablations highlighting the importance of residual connections and the proposed exploration strategy. The approach offers scalable communication in MARL and broad compatibility with existing value-decomposition frameworks, providing a practical path toward better-coordinated multi-agent policies.
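The core mechanism can be illustrated with a minimal NumPy sketch. It assumes a single attention head with no layer normalization or feed-forward sublayer. The function names (`init_comm_params`, `communicate`, `param_count`) and the sizes are hypothetical, and this is not the authors' implementation of $g$.

Each agent's hidden state is one row of `h`. Attention mixes information across rows, and a residual connection adds the aggregated message back to each agent's own state. The weight matrices are sized by the hidden dimension only, so the same parameters serve any number of agents.

```python
import numpy as np


def init_comm_params(d_model, rng):
    """Hypothetical single-head self-attention weights.

    Every shape depends only on d_model, never on the number of agents.
    """
    scale = 1.0 / np.sqrt(d_model)
    return {
        "W_q": rng.normal(0.0, scale, (d_model, d_model)),
        "W_k": rng.normal(0.0, scale, (d_model, d_model)),
        "W_v": rng.normal(0.0, scale, (d_model, d_model)),
    }


def communicate(h, params):
    """One self-attention communication step over the set of agent states.

    h: array of shape (n_agents, d_model), one row per agent.
    Returns an array of the same shape: each agent's own state plus a
    residual update aggregated from all agents' value vectors.
    """
    q = h @ params["W_q"]
    k = h @ params["W_k"]
    v = h @ params["W_v"]
    scores = q @ k.T / np.sqrt(h.shape[1])        # (n_agents, n_agents)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # row-wise softmax
    return h + weights @ v                        # residual connection


def param_count(params):
    """Total number of trainable scalars in the module."""
    return sum(p.size for p in params.values())


rng = np.random.default_rng(0)
params = init_comm_params(d_model=16, rng=rng)
for n_agents in (3, 8, 27):
    h = rng.normal(size=(n_agents, 16))
    out = communicate(h, params)
    # Output shape follows n_agents; the parameter count stays 3 * 16 * 16.
    print(n_agents, out.shape, param_count(params))
```

The module has no positional encodings, so it treats the agents as a set: permuting the rows of `h` permutes the output rows in the same way.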

Abstract

Communication is essential for the collective execution of complex tasks by human agents, motivating interest in communication mechanisms for multi-agent reinforcement learning (MARL). However, existing communication protocols in MARL are often complex and non-differentiable. In this work, we introduce a self-attention-based communication module that exchanges information between agents in MARL. Our proposed approach is fully differentiable, allowing agents to learn to generate messages in a reward-driven manner. The module can be seamlessly integrated with any action-value function decomposition method and can be viewed as an extension of such decompositions. Notably, its number of trainable parameters is fixed and independent of the number of agents. Experimental results on the SMAC and SMACv2 benchmarks demonstrate the effectiveness of our approach, which achieves state-of-the-art performance on several maps.
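Differentiability is what makes the messages reward-driven. In a setup like the one sketched here, gradients of one agent's Q-value, and hence of the mixed $Q_{tot}$ used in the TD loss, reach the other agents' hidden states through the communication module.

The sketch below is a hedged illustration under several assumptions:

- a single-head attention step with a residual connection;
- a shared linear Q head `W_out`;
- VDN (a plain sum) as the mixer.

All names and sizes are hypothetical. To stay dependency-free, the sketch estimates gradients with central finite differences instead of automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(1)
N_AGENTS, D_MODEL, N_ACTIONS = 4, 8, 5  # hypothetical sizes

# Hypothetical parameters shared by all agents (sizes independent of N_AGENTS).
W_q, W_k, W_v = (rng.normal(0.0, D_MODEL ** -0.5, (D_MODEL, D_MODEL)) for _ in range(3))
W_out = rng.normal(0.0, D_MODEL ** -0.5, (D_MODEL, N_ACTIONS))  # shared Q head


def communicate(h):
    """Single-head self-attention over agents with a residual connection."""
    scores = (h @ W_q) @ (h @ W_k).T / np.sqrt(D_MODEL)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return h + weights @ (h @ W_v)


def agent_q(h, use_comm):
    """Per-agent Q-values, shape (n_agents, N_ACTIONS)."""
    return (communicate(h) if use_comm else h) @ W_out


def vdn_q_tot(h, actions, use_comm):
    """VDN mixing: Q_tot is the sum of each agent's Q-value for its chosen action."""
    q = agent_q(h, use_comm)
    return q[np.arange(len(actions)), actions].sum()


def sensitivity(h, target, source, action, use_comm, eps=1e-5):
    """Central-difference estimate of dQ_target(action) / dh_source."""
    grad = np.zeros(h.shape[1])
    for k in range(h.shape[1]):
        step = np.zeros_like(h)
        step[source, k] = eps  # perturb one coordinate of the source agent
        diff = agent_q(h + step, use_comm) - agent_q(h - step, use_comm)
        grad[k] = diff[target, action] / (2 * eps)
    return grad


h = rng.normal(size=(N_AGENTS, D_MODEL))
print("Q_tot with communication:", vdn_q_tot(h, [0, 1, 2, 3], use_comm=True))
print("max |dQ_0/dh_1| with communication:", np.abs(sensitivity(h, 0, 1, 0, True)).max())
print("max |dQ_0/dh_1| without communication:", np.abs(sensitivity(h, 0, 1, 0, False)).max())
```

With communication, the sensitivity of agent 0's Q-value to agent 1's hidden state is nonzero. Without communication, the estimate is exactly zero, because agent 0's Q-value never reads agent 1's state. Swapping `vdn_q_tot` for another mixer such as QMIX would leave the communication step unchanged.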

Paper Structure

This paper contains 30 sections, 1 equation, 20 figures, 4 tables.

Figures (20)

  • Figure 1: 3s5z_vs_3s6z scenario in the StarCraft Multi-Agent Challenge (SMAC).
  • Figure 2: The proposed MACTAS architecture.
  • Figure 3: Percentage of wins in test games in training time for the connection protocols MACTAS and MAIC, and the $Q$ architectures QMIX, QPLEX, and VDN.
  • Figure 4: Percentage of wins and standard deviations in test games in training time for the connection protocols MACTAS, MAIC, and the bare mixer for the $Q$ architecture QMIX.
  • Figure 5: The impact of our exploration scheme on MACTAS+QMIX.
  • ...and 15 more figures