MACTAS: Self-Attention-Based Module for Inter-Agent Communication in Multi-Agent Reinforcement Learning
Maciej Wojtala, Bogusz Stefańczyk, Dominik Bogucki, Łukasz Lepak, Jakub Strykowski, Paweł Wawrzyński
TL;DR
MACTAS is a differentiable, Transformer-based inter-agent communication module for multi-agent reinforcement learning that can be combined with any action-value function decomposition method. A self-attention-based module (denoted g) processes the set of per-agent hidden states, enabling reward-driven message exchange with a number of trainable parameters that does not grow with the number of agents. On SMAC and SMACv2, MACTAS achieves state-of-the-art or competitive performance across multiple maps and mixers, and ablations highlight the importance of residual connections and the proposed exploration strategy. The approach offers scalable communication in MARL and broad compatibility with existing value-decomposition frameworks.
Abstract
Communication is essential for the collective execution of complex tasks by human agents, motivating interest in communication mechanisms for multi-agent reinforcement learning (MARL). However, existing communication protocols in MARL are often complex and non-differentiable. In this work, we introduce a self-attention-based communication module that exchanges information between the agents in MARL. Our proposed approach is fully differentiable, allowing agents to learn to generate messages in a reward-driven manner. The module can be seamlessly integrated with any action-value function decomposition method and can be viewed as an extension of such decompositions. Notably, it includes a fixed number of trainable parameters, independent of the number of agents. Experimental results on the SMAC and SMACv2 benchmarks demonstrate the effectiveness of our approach, which achieves state-of-the-art performance on a number of maps.
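To illustrate why a self-attention-based communication step can keep a fixed number of trainable parameters regardless of team size, the following is a minimal NumPy sketch. It is not the authors' architecture: the single attention head, the hidden size of 8, and the names `init_params` and `communicate` are assumptions made purely for illustration. The residual connection is included because the ablations mention its importance, but its exact placement here is also an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(d, seed=0):
    # Hypothetical parameter set: four d x d projection matrices.
    # Their shapes depend only on the hidden size d, not on the number of agents.
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(d)
    return {name: rng.normal(0.0, scale, size=(d, d))
            for name in ("Wq", "Wk", "Wv", "Wo")}

def communicate(h, params):
    # h: array of shape (n_agents, d), one hidden state per agent.
    # Returns an array of the same shape in which each agent's state
    # has been updated with an attention-weighted mix of all agents' states.
    d = h.shape[-1]
    q = h @ params["Wq"]
    k = h @ params["Wk"]
    v = h @ params["Wv"]
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (n_agents, n_agents)
    messages = attn @ v @ params["Wo"]             # (n_agents, d)
    return h + messages                            # residual connection (assumed placement)

params = init_params(d=8)
n_params = sum(w.size for w in params.values())

rng = np.random.default_rng(1)
h3 = rng.normal(size=(3, 8))   # a team of 3 agents
h7 = rng.normal(size=(7, 8))   # a team of 7 agents, same parameters
out3 = communicate(h3, params)
out7 = communicate(h7, params)

print(n_params, out3.shape, out7.shape)
```

The same `params` dictionary serves both team sizes, because the projections act on each agent's hidden state independently and only the attention matrix scales with the number of agents. Without positional encodings, the operation is also permutation-equivariant: reordering the agents reorders the outputs in the same way.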
