Multi-Agent Reinforcement Learning with Selective State-Space Models

Jemma Daniel, Ruan de Kock, Louay Ben Nessir, Sasha Abramowitz, Omayma Mahjoub, Wiem Khlifi, Claude Formanek, Arnu Pretorius

TL;DR

This work introduces Multi-Agent Mamba (MAM), which replaces attention in MAT with causal, bi-directional, and cross-attentional Mamba blocks, suggesting SSMs can replace attention-based architectures in MARL for better scalability.

Abstract

The Transformer model has demonstrated success across a wide range of domains, including in Multi-Agent Reinforcement Learning (MARL) where the Multi-Agent Transformer (MAT) has emerged as a leading algorithm in the field. However, a significant drawback of Transformer models is their quadratic computational complexity relative to input size, making them computationally expensive when scaling to larger inputs. This limitation restricts MAT's scalability in environments with many agents. Recently, State-Space Models (SSMs) have gained attention due to their computational efficiency, but their application in MARL remains unexplored. In this work, we investigate the use of Mamba, a recent SSM, in MARL and assess whether it can match the performance of MAT while providing significant improvements in efficiency. We introduce a modified version of MAT that incorporates standard and bi-directional Mamba blocks, as well as a novel "cross-attention" Mamba block. Extensive testing shows that our Multi-Agent Mamba (MAM) matches the performance of MAT across multiple standard multi-agent environments, while offering superior scalability to larger agent scenarios. This is significant for the MARL community, because it indicates that SSMs could replace Transformers without compromising performance, whilst also supporting more effective scaling to higher numbers of agents. Our project page is available at https://sites.google.com/view/multi-agent-mamba .

Paper Structure

This paper contains 31 sections, 14 equations, 6 figures, 12 tables.

Figures (6)

  • Figure 1: Normalised mean episode returns aggregated over all tasks and environments with 95% confidence intervals for MAM, MAT, and MAPPO. Results are obtained using ten independent seeds. MAM matches the final performance of MAT, currently state-of-the-art in MARL, and exhibits greater sample efficiency.
  • Figure 2: Mean time (seconds) per evaluation step in SMACv2 tasks with increasing numbers of agents for MAM, MAT, and MAPPO. The mean time per evaluation step for MAT increases approximately quadratically, while MAM and MAPPO scale linearly in the number of agents.
  • Figure 4: The vanilla and bi-directional Mamba modules used to replace causal and non-causal attention, respectively. Left: The original left-to-right causal Mamba module proposed by Gu and Dao (2024). Right: A bi-directional extension of the Mamba module proposed by Schiff et al. (2024).
  • Figure 5: The dependency chart of the SSM layer for a vanilla Mamba module (top) and a CrossMamba module (bottom) in a single timestep $t$. We use colour to highlight which input sequence each block's selective parameters depend upon.
  • Figure 6: Mean episode return over ten seeds for MAPPO, MAT and MAM with 95% confidence intervals (Agarwal et al., 2022) for RWARE, SMAX and LBF tasks.
  • ...and 1 more figure
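
The scaling contrast described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the MAM or MAT implementation: `attention_pairwise_ops` and `ssm_scan` are hypothetical helper names, and the recurrence uses fixed scalar parameters `a`, `b`, `c`, whereas Mamba makes its parameters input-dependent (selective). The sketch only shows why full self-attention over N agents needs on the order of N² score computations, while a state-space recurrence visits each agent once.

```python
from typing import List, Sequence


def attention_pairwise_ops(num_agents: int) -> int:
    """Count query-key score computations in full self-attention.

    Every agent attends to every agent, so the count grows quadratically.
    """
    return num_agents * num_agents


def ssm_scan(inputs: Sequence[float], a: float, b: float, c: float) -> List[float]:
    """Run a scalar linear state-space recurrence over a sequence of agents.

    Illustrative assumption: a, b, c are fixed scalars. The recurrence is
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t
    The loop body runs once per input, so the cost is linear in sequence length.
    """
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x  # update the hidden state with the current input
        outputs.append(c * h)  # read out from the hidden state
    return outputs


if __name__ == "__main__":
    for n in (10, 20, 40):
        # Doubling the number of agents quadruples the attention score count,
        # but only doubles the number of recurrence steps.
        print(n, attention_pairwise_ops(n), len(ssm_scan([1.0] * n, 0.5, 1.0, 2.0)))
```

Doubling the number of agents from 10 to 20 raises the attention score count from 100 to 400, while the recurrence goes from 10 steps to 20. This is the qualitative trend the caption reports, under the simplifying assumptions stated above.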