Decision Mamba: Reinforcement Learning via Sequence Modeling with Selective State Spaces

Toshihiro Ota

TL;DR

The paper asks whether Transformer-based reinforcement learning improves when self-attention is replaced with Mamba's selective state-space modeling, with the aim of better capturing long-range dependencies in sequential decision tasks. It introduces Decision Mamba (DMamba), a Decision Transformer (DT)-type model that passes the recent trajectory through a Mamba block and generates actions autoregressively, conditioned on the return-to-go. On OpenAI Gym and Atari benchmarks, DMamba performs competitively with DT, DS4, and DC, and ablations examine the role of the Mamba components and of the context length. The work highlights how much architectural choices matter for RL sequence modeling and lays groundwork for future efficiency-oriented adaptations and data-structure-aware designs.
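To make "return-to-go conditioned" concrete, here is a minimal Python sketch of how the return-to-go can be computed from a reward sequence and interleaved with states and actions into a DT-style token stream. It assumes the undiscounted return-to-go and the (return-to-go, state, action) ordering of the original Decision Transformer. The function names `returns_to_go` and `interleave` are hypothetical and are not taken from the paper's code.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted return-to-go: rtg[t] = sum of rewards[t:]."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each entry is a suffix sum of the rewards.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def interleave(
    rtg: Sequence[float], states: Sequence[Any], actions: Sequence[Any]
) -> List[Tuple[str, Any]]:
    """Build the token stream (rtg_1, s_1, a_1, rtg_2, s_2, a_2, ...).

    Assumption: this follows the Decision Transformer ordering; the
    tuple tags are for illustration only.
    """
    tokens: List[Tuple[str, Any]] = []
    for r, s, a in zip(rtg, states, actions):
        tokens.extend([("rtg", r), ("state", s), ("action", a)])
    return tokens


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    rtg = returns_to_go(rewards)
    print(rtg)
    print(interleave(rtg, ["s1", "s2", "s3"], ["a1", "a2", "a3"]))
```

At inference time a sequence model of this type is prompted with a target return, and the return-to-go is decremented by each observed reward before the next action is predicted.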

Abstract

Decision Transformer, a promising approach that applies Transformer architectures to reinforcement learning, relies on causal self-attention to model sequences of states, actions, and rewards. While this method has shown competitive results, this paper investigates the integration of the Mamba framework, known for its advanced capabilities in efficient and effective sequence modeling, into the Decision Transformer architecture, focusing on the potential performance enhancements in sequential decision-making tasks. Our study systematically evaluates this integration by conducting a series of experiments across various decision-making environments, comparing the modified Decision Transformer, Decision Mamba, with its traditional counterpart. This work contributes to the advancement of sequential decision-making models, suggesting that the architecture and training methodology of neural networks can significantly impact their performance in complex tasks, and highlighting the potential of Mamba as a valuable tool for improving the efficacy of Transformer-based models in reinforcement learning scenarios.

Paper Structure

This paper contains 15 sections, 9 equations, 1 figure, 7 tables, and 2 algorithms.

Figures (1)

  • Figure 1: Overview of the Mamba layer. $\sigma$ is an activation function, for which we use the $\mathrm{SiLU}$ function, and $\odot$ denotes the element-wise product.
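
The two symbols in the Figure 1 caption can be illustrated with a small Python sketch: $\mathrm{SiLU}(x) = x \cdot \mathrm{sigmoid}(x)$, and $\odot$ multiplies two vectors entry by entry. The sketch assumes the two-branch gating of the standard Mamba block, where one branch's output is multiplied element-wise by the SiLU of a second (gate) branch. The caption suggests this but does not spell it out, and the name `gated_output` is hypothetical. The selective state-space computation itself is not shown.

```python
import math
from typing import List, Sequence


def silu(x: float) -> float:
    """SiLU activation: x * sigmoid(x), written as x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))


def gated_output(branch: Sequence[float], gate: Sequence[float]) -> List[float]:
    """Element-wise product of `branch` with SiLU(`gate`).

    Assumption: `branch` stands in for the output of the state-space
    path and `gate` for the parallel gating path of a Mamba-style layer.
    """
    if len(branch) != len(gate):
        raise ValueError("branch and gate must have the same length")
    return [b * silu(g) for b, g in zip(branch, gate)]


if __name__ == "__main__":
    print(gated_output([2.0, 3.0], [0.0, 1.0]))
```

A gate value of zero suppresses the corresponding channel entirely, while large positive gate values pass the branch output scaled roughly by the gate value itself.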