Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing

Wall Kim; Chaeyoung Song; Hanul Kim

TL;DR

Decision MetaMamba (DMM) is a simple yet effective structure that replaces Mamba's token mixer with a dense layer-based sequence mixer and modifies the positional structure to preserve local information.

Abstract

Mamba-based models have drawn much attention in offline RL. However, their selective mechanism is often detrimental when key steps in RL sequences are omitted. To address this issue, we propose a simple yet effective structure, called Decision MetaMamba (DMM), which replaces Mamba's token mixer with a dense layer-based sequence mixer and modifies the positional structure to preserve local information. By performing sequence mixing that considers all channels simultaneously before Mamba, DMM prevents the information loss caused by selective scanning and residual gating. Extensive experiments demonstrate that DMM delivers state-of-the-art performance across diverse RL tasks. Furthermore, DMM achieves these results with a compact parameter footprint, demonstrating strong potential for real-world applications.
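To make the idea of "sequence mixing that considers all channels simultaneously" concrete, here is a minimal sketch of a dense layer-based sequence mixer. It is an illustration under stated assumptions, not the authors' implementation: the window size, the zero left-padding, the single linear layer, and the names dense_sequence_mixer and make_params are all hypothetical choices made for this example. Each output step is computed from the flattened window of the most recent token embeddings, so every channel of every token in the window contributes to every output channel.

```python
import random


def make_params(d, window, seed=0):
    """Hypothetical helper: random weights for one dense layer (window*d -> d)."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(window * d)] for _ in range(d)]
    bias = [0.0] * d
    return weight, bias


def dense_sequence_mixer(tokens, weight, bias, window):
    """Mix a local window of tokens with one dense layer (assumed design).

    tokens: list of T embedding vectors, each of length d
    weight: d rows, each of length window * d
    bias:   length d
    """
    d = len(tokens[0])
    zero = [0.0] * d
    out = []
    for t in range(len(tokens)):
        # Flatten the last `window` tokens ending at step t.
        # Steps before the start of the sequence are zero-padded (causal).
        ctx = []
        for k in range(t - window + 1, t + 1):
            ctx.extend(tokens[k] if k >= 0 else zero)
        # A single dense layer sees all channels of all tokens in the window.
        out.append([sum(w * x for w, x in zip(row, ctx)) + b
                    for row, b in zip(weight, bias)])
    return out


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # e.g. state, action, rtg embeddings
    weight, bias = make_params(d=2, window=2)
    for step in dense_sequence_mixer(tokens, weight, bias, window=2):
        print(step)
```

In this sketch the mixed sequence would then be passed to the Mamba block. Because the window only looks backward, changing a later token never alters an earlier output.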

Paper Structure (29 sections, 4 equations, 8 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Offline RL Decision Models
Information loss in Mamba and Transformer
Methodology
Preliminaries
Motivations
Decision MetaMamba
Dense Sequence Mixer
Decision MetaMamba Block
Experiments
Results on Dense Reward Environment
Datasets
Results
Results on Sparse Reward Environment
...and 14 more sections

Figures (8)

Figure 1: Comparison of model size and performance with recent transformer- and SSM-based methods: DC [kim2024decision], DT [chen2021decision], EDT [wu2024elastic], and DM [ota2024decision]. For hopper-md (medium), the x-axis is logarithmic; others use a linear scale.
Figure 2: Detailed model structure: (a) Mamba [gu2023mamba], (b) Decision Mamba, and (c) Dense Sequence Mixer (DSM).
Figure 3: Heatmap visualization of the output tensor activation. The y-axis denotes step components in the repeating order of state, action, and $rtg$, starting from index 0. The x-axis corresponds to the embedding dimension. Left: output from Mamba [gu2023mamba]. Right: output from DSM + modified Mamba.
Figure 4: Average gradient norms of 24 input features on hopper-md (left) and antmaze-um (right). As the x index increases, the distance from the current step becomes greater, and the sequence is shown in the order of state, action, and $rtg$. Red dots indicate results from DMM, and blue dots from Mamba [gu2023mamba]. Each dataset includes two batches of 64 samples.
Figure 5: Evaluation score (y-axis) versus context length (x-axis). DMM achieves the highest performance among the compared models when using a shorter input context length.
...and 3 more figures
