
Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing

Wall Kim, Chaeyoung Song, Hanul Kim

TL;DR

The paper proposes Decision MetaMamba (DMM), a simple yet effective structure that replaces Mamba's token mixer with a dense layer-based sequence mixer and modifies the positional structure to preserve local information.
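The exact DSM layer appears only in Figure 2(c) and is not specified here, so the following is a minimal sketch, under stated assumptions, of what a dense layer-based sequence mixer can look like. Each token is replaced by a dense (fully connected) projection of the flattened window of the `window` most recent tokens, so every output channel depends on all channels of all tokens in that window. The function name `dense_sequence_mixer`, the window size, the causal zero left-padding, and the residual connection are hypothetical choices, not the authors' implementation.

```python
import random

def dense_sequence_mixer(x, weight, bias, window):
    """Hypothetical causal dense sequence mixer (sketch, not the DMM code).

    x:      T tokens, each a list of d floats
    weight: d rows, each of length window * d
    bias:   d floats
    Returns T tokens of dimension d.
    """
    d = len(x[0])
    zero = [0.0] * d
    out = []
    for t in range(len(x)):
        # Concatenate tokens t-window+1 .. t; positions before the start
        # are zero-padded so no future token leaks into step t.
        ctx = []
        for k in range(t - window + 1, t + 1):
            ctx.extend(x[k] if k >= 0 else zero)
        # One dense layer over the flattened window mixes all channels
        # of all tokens in the window at once.
        mixed = [sum(w * v for w, v in zip(row, ctx)) + b
                 for row, b in zip(weight, bias)]
        # Assumed residual connection: the original token is kept.
        out.append([xi + mi for xi, mi in zip(x[t], mixed)])
    return out

# Usage: 6 tokens of width 4, mixed over a window of 3 tokens.
random.seed(0)
T, d, window = 6, 4, 3
x = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(T)]
weight = [[random.gauss(0.0, 0.1) for _ in range(window * d)] for _ in range(d)]
bias = [0.0] * d
y = dense_sequence_mixer(x, weight, bias, window)
print(len(y), len(y[0]))  # 6 4
```

Unlike a depthwise convolution, which mixes each channel only with itself across time, the single weight matrix of shape `d × (window·d)` mixes time and channels together, at the cost of `window·d²` weights.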

Abstract

Mamba-based models have drawn much attention in offline RL. However, their selective mechanism is often detrimental when key steps in RL sequences are omitted. To address this issue, we propose a simple yet effective structure, called Decision MetaMamba (DMM), which replaces Mamba's token mixer with a dense layer-based sequence mixer and modifies the positional structure to preserve local information. By performing sequence mixing that considers all channels simultaneously before Mamba, DMM prevents information loss due to selective scanning and residual gating. Extensive experiments demonstrate that DMM delivers state-of-the-art performance across diverse RL tasks. Furthermore, DMM achieves these results with a compact parameter footprint, demonstrating strong potential for real-world applications.
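As background for the point about selective scanning and residual gating, here is the selective SSM of Mamba gu2023mamba in a simplified form for one input channel with diagonal $A$. The projection names $s_\Delta$, $s_B$, $s_C$ and the gate input $z_t$ are notation chosen here, and $\bar{B}_t$ uses the first-order approximation rather than the exact zero-order hold. Because $\Delta_t$ depends on the input, a small $\Delta_t$ drives $\bar{A}_t \to I$ and $\bar{B}_t \to 0$, so token $x_t$ barely enters the state; this is one concrete way a key step in an RL sequence can be skipped.

```latex
\begin{aligned}
\Delta_t &= \operatorname{softplus}\big(s_\Delta(x_t)\big), \qquad B_t = s_B(x_t), \qquad C_t = s_C(x_t) \\
\bar{A}_t &= \exp(\Delta_t A), \qquad \bar{B}_t \approx \Delta_t B_t \\
h_t &= \bar{A}_t\, h_{t-1} + \bar{B}_t\, x_t, \qquad y_t = C_t\, h_t \\
o_t &= y_t \odot \operatorname{SiLU}(z_t) \\
\Delta_t \to 0 &\;\Longrightarrow\; \bar{A}_t \to I,\quad \bar{B}_t \to 0,\quad h_t \to h_{t-1}
\end{aligned}
```

The gate $\operatorname{SiLU}(z_t)$ can likewise suppress $y_t$ when $z_t$ is near zero or strongly negative. Mixing neighboring tokens before this block plausibly spreads each token's content over several positions, so skipping one position loses less; this matches the intuition the abstract gives for placing the sequence mixer before Mamba.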

Paper Structure (29 sections, 4 equations, 8 figures, 9 tables, 1 algorithm)

Figures (8)

  • Figure 1: Comparison of model size and performance with recent transformer- and SSM-based methods: DC kim2024decision, DT chen2021decision, EDT wu2024elastic and DM ota2024decision. For hopper-md (medium), the x-axis is logarithmic; the other panels use a linear scale.
  • Figure 2: Detailed model structure: (a) Mamba gu2023mamba, (b) Decision Mamba, and (c) Dense Sequence Mixer (DSM).
  • Figure 3: Heatmap of output tensor activations. The y-axis indexes sequence tokens in the repeating order state, action, $rtg$ (return-to-go), starting from index 0; the x-axis corresponds to the embedding dimension. Left: output of Mamba gu2023mamba. Right: output of DSM + modified Mamba.
  • Figure 4: Average gradient norms of the 24 input features on hopper-md (left) and antmaze-um (right). Larger x indices lie farther from the current step, and features are ordered state, action, $rtg$. Red dots indicate DMM and blue dots Mamba gu2023mamba. Each dataset includes two batches of 64 samples.
  • Figure 5: Evaluation score (y-axis) versus input context length (x-axis). DMM achieves the highest score among the compared models at shorter context lengths.
  • ...and 3 more figures
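Figures 3 and 4 index inputs as a flat sequence that repeats state, action, $rtg$ at every timestep. The sketch below, with hypothetical helpers `interleave` and `describe_index`, shows that layout and maps a flat index back to its component and step. It assumes three tokens per step and, following the Figure 4 caption, that index 0 belongs to the current step, so the 24 features in Figure 4 would span 8 steps; these are inferences from the captions, not details confirmed by the paper.

```python
# Per-step token order stated in the Figure 3 and Figure 4 captions.
COMPONENTS = ("state", "action", "rtg")

def interleave(states, actions, rtgs):
    """Flatten per-step (state, action, rtg) triples into one token list.

    Inputs are assumed ordered from the current step backward (newest
    first), matching the x-axis convention described for Figure 4.
    """
    if not (len(states) == len(actions) == len(rtgs)):
        raise ValueError("states, actions and rtgs must have equal length")
    tokens = []
    for s, a, r in zip(states, actions, rtgs):
        tokens.extend([("state", s), ("action", a), ("rtg", r)])
    return tokens

def describe_index(i):
    """Return (component, steps back from the current step) for flat index i."""
    return COMPONENTS[i % len(COMPONENTS)], i // len(COMPONENTS)

n_features = 24
n_steps = n_features // len(COMPONENTS)
print(n_steps)             # 8
print(describe_index(0))   # ('state', 0)
print(describe_index(23))  # ('rtg', 7)
```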