Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models

Jiahang Cao, Qiang Zhang, Jingkai Sun, Jiaxu Wang, Hao Cheng, Yulin Li, Jun Ma, Kun Wu, Zhiyuan Xu, Yecheng Shao, Wen Zhao, Gang Han, Yijie Guo, Renjing Xu

TL;DR

Diffusion-based policies for 3D manipulation are powerful but computationally expensive because they rely on large backbone networks. The authors propose the Mamba Policy, a lightweight diffusion-based policy built on XMamba blocks, which combine Selective State Space Models with attention to improve long-horizon performance while greatly reducing the parameter count. They validate it on Adroit, DexArt, and MetaWorld, as well as in real-world experiments, reporting higher success rates (SR) and substantial efficiency gains, supported by comprehensive ablations and an analysis of horizon length. The work makes diffusion-based manipulation easier to deploy on resource-constrained hardware and releases the project as open source for the community.

Abstract

Diffusion models have been widely employed in 3D manipulation because they learn distributions efficiently, which allows precise prediction of action trajectories. However, diffusion models typically rely on UNet backbones with large parameter counts as policy networks, which can be challenging to deploy on resource-constrained devices. Recently, the Mamba model has emerged as a promising solution for efficient modeling, offering low computational complexity and strong performance in sequence modeling. In this work, we propose the Mamba Policy, a lighter but stronger policy that reduces the parameter count by over 80% compared to the original policy network while achieving superior performance. Specifically, we introduce the XMamba Block, which effectively integrates input information with conditional features and combines Mamba and Attention mechanisms for deep feature extraction. Extensive experiments demonstrate that the Mamba Policy excels on the Adroit, DexArt, and MetaWorld datasets while requiring significantly fewer computational resources. Additionally, we highlight the Mamba Policy's enhanced robustness in long-horizon scenarios compared to baseline methods and explore the performance of various Mamba variants within the Mamba Policy framework. Real-world experiments are also conducted to further validate its effectiveness. Our open-source project page can be found at https://andycao1125.github.io/mamba_policy/.
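To illustrate why Mamba is described as offering low computational complexity in sequence modeling, here is a minimal NumPy sketch of a generic, Mamba-style selective state space recurrence. It is not the authors' XMamba code: the projection names (`W_delta`, `W_B`, `W_C`), the diagonal per-channel state matrix, and the plain Python loop are illustrative assumptions, and the attention branch and conditional-feature fusion of the XMamba Block are omitted because they are not specified here.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps step sizes positive.
    return np.logaddexp(0.0, x)


def selective_scan(x, A, W_delta, W_B, W_C):
    """Sequential selective SSM over one sequence (illustrative sketch).

    x:       (L, D) input tokens (e.g. an action or feature sequence)
    A:       (D, N) diagonal state matrix per channel, negative for stability
    W_delta: (D, D) hypothetical projection giving a per-channel step size
    W_B/W_C: (D, N) hypothetical projections giving input-dependent B_t, C_t
    returns  (L, D) outputs
    """
    L, D = x.shape
    h = np.zeros((D, A.shape[1]))                 # hidden state, one row per channel
    y = np.zeros((L, D))
    for t in range(L):
        xt = x[t]
        delta = softplus(xt @ W_delta)            # (D,) selective step size
        B = xt @ W_B                              # (N,) selective input matrix
        C = xt @ W_C                              # (N,) selective output matrix
        A_bar = np.exp(delta[:, None] * A)        # (D, N) zero-order-hold discretization of A
        B_bar = delta[:, None] * B[None, :]       # (D, N) simplified discretization of B
        h = A_bar * h + B_bar * xt[:, None]       # linear-time state update
        y[t] = h @ C                              # read out each channel
    return y


rng = np.random.default_rng(0)
L, D, N = 16, 4, 8
x = rng.standard_normal((L, D))
A = -np.exp(rng.standard_normal((D, N)))
y = selective_scan(x, A, 0.1 * rng.standard_normal((D, D)),
                   rng.standard_normal((D, N)), rng.standard_normal((D, N)))
print(y.shape)  # (16, 4)
```

Because the step size and the matrices B and C are computed from each token, the recurrence can choose per step what to keep in its state, and its cost grows linearly with sequence length (O(L·D·N) in this sketch) rather than quadratically as in full self-attention. Production Mamba implementations replace the Python loop with a hardware-aware parallel scan.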

Paper Structure (17 sections, 11 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Comparison with the SOTA baselines regarding accuracy and computational usage. Our proposed Mamba Policy (a) achieves superior success rates and (b) offers up to $90\%$ computational savings in terms of floating point operations (FLOPs).
  • Figure 2: Overview of Mamba Policy. Our proposed model takes the noised action and the condition as inputs, the latter of which is composed of three parts: point cloud perception embedding, robot state embedding, and time embedding. Each of these components is processed through its respective encoder $\Phi_{\text{type}}$. The X-Mamba UNet then processes these inputs and returns the predicted noise, with XMamba blocks playing a key role. During training, the model is updated using an MSE loss between the predicted noise and the label noise. For validation, the model uses DDIM to reconstruct the original action, which is then used to interact with the environment and execute different tasks.
  • Figure 3: Visualization of our manipulation results. We conduct experiments on three datasets: Adroit, MetaWorld, and DexArt. Here we illustrate the results in Adroit Door (top), DexArt Bucket (middle), and MetaWorld Assembly (bottom). During the interaction, our proposed Mamba Policy outputs future execution actions until the task is successfully completed.
  • Figure 4: Visualization of Success Rates and Training Curves. We compare the average of the top-$K$ success rates for different values of $K$, where our proposed Mamba Policy achieves superior results. The stable training curves also demonstrate the effectiveness of our model.
  • Figure 5: Ablation study on different horizon lengths. To validate the ability to process longer historical dependencies, we conduct experiments with various horizon lengths; our Mamba Policy achieves robust accuracy and reduced GPU usage compared with DP3, demonstrating the effectiveness and efficiency of our method in long-horizon scenarios.
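The Figure 2 caption follows the standard noise-prediction diffusion recipe: an MSE loss on the predicted noise during training and DDIM to reconstruct the action at evaluation time. The NumPy sketch below shows both steps under stated assumptions: a linear beta schedule with T = 100, deterministic DDIM (eta = 0), and a hypothetical `eps_model(a_t, t, cond)` standing in for the X-Mamba UNet, where `cond` represents the point cloud and robot state embeddings and the time step `t` is passed separately. These settings are illustrative, not the paper's reported hyperparameters.

```python
import numpy as np


def make_alpha_bars(T=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def noise_prediction_loss(eps_model, a0, cond, alpha_bars, rng):
    """One-sample MSE between predicted and true noise (epsilon prediction)."""
    t = int(rng.integers(0, len(alpha_bars)))            # random diffusion step
    eps = rng.standard_normal(a0.shape)                  # label noise
    ab = alpha_bars[t]
    a_t = np.sqrt(ab) * a0 + np.sqrt(1.0 - ab) * eps     # noised action
    eps_hat = eps_model(a_t, t, cond)
    return float(np.mean((eps_hat - eps) ** 2))


def ddim_sample(eps_model, shape, cond, alpha_bars, num_steps, rng):
    """Deterministic DDIM (eta = 0) from pure noise to a clean action sequence."""
    T = len(alpha_bars)
    timesteps = np.linspace(T - 1, 0, num_steps).round().astype(int)
    a = rng.standard_normal(shape)
    for i, t in enumerate(timesteps):
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        eps_hat = eps_model(a, t, cond)
        # Estimate the clean action, then move it to the previous noise level.
        a0_hat = (a - np.sqrt(1.0 - ab_t) * eps_hat) / np.sqrt(ab_t)
        a = np.sqrt(ab_prev) * a0_hat + np.sqrt(1.0 - ab_prev) * eps_hat
    return a


rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
target = rng.standard_normal((8, 4))  # arbitrary example: horizon 8, 4-D actions


def oracle_eps(a_t, t, cond):
    # Hypothetical perfect noise predictor that knows the target action.
    return (a_t - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])


print(noise_prediction_loss(oracle_eps, target, None, alpha_bars, rng) < 1e-12)  # True
print(np.allclose(ddim_sample(oracle_eps, target.shape, None, alpha_bars, 10, rng), target))  # True
```

With a perfect noise predictor the loss is zero and DDIM returns the clean action exactly, which makes a convenient sanity check; a trained network only approximates this, and DDIM's deterministic updates let it use far fewer denoising steps than the training schedule.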