MambaBEV: An efficient 3D detection model with Mamba2

Zihan You; Ni Wang; Hao Wang; Qichao Zhao; Jinxiang Wang

TL;DR

MambaBEV introduces a BEV-based 3D detection framework that leverages the Mamba2 structured state-space model for efficient long-range temporal fusion. The TemporalMamba block enables global BEV context integration by discrete BEV feature rearrangement and four-direction sequence processing, complemented by a Mamba-based DETR head for robust multi-object detection. On nuScenes, the base version achieves strong performance (NDS ≈ 51.7% and mAP ≈ 42.7%), with notable improvements in large-object detection and velocity estimation, and shows promise in end-to-end autonomous driving planning and forecasting tasks. Overall, the work demonstrates the viability of state-space models for autonomous driving perception, offering improved global context understanding and efficiency relative to traditional transformer-based temporal fusion methods.

Abstract

Accurate 3D object detection in autonomous driving relies on Bird's Eye View (BEV) perception and effective temporal fusion. However, existing fusion strategies based on convolutional layers or deformable self-attention struggle with global context modeling in BEV space, leading to lower accuracy for large objects. To address this, we introduce MambaBEV, a novel BEV-based 3D object detection model that leverages Mamba2, an advanced state-space model (SSM) optimized for long-sequence processing. Our key contribution is TemporalMamba, a temporal fusion module that enhances global awareness by introducing a discrete rearrangement mechanism for BEV features, tailored to Mamba's sequential processing. Additionally, we propose a Mamba-based DETR as the detection head to improve multi-object representation. Evaluations on the nuScenes dataset demonstrate that MambaBEV base achieves an NDS of 51.7% and an mAP of 42.7%. Furthermore, an end-to-end autonomous driving paradigm validates its effectiveness in motion forecasting and planning. Our results highlight the potential of SSMs in autonomous driving perception, particularly in enhancing global context understanding and large-object detection.
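
To make the idea of feeding a 2D BEV grid to a sequence model more concrete, the sketch below flattens a small grid into four 1D scan orders and then maps the four sequences back onto the grid. It is only an illustration under stated assumptions: the abstract does not specify the exact rearrangement, so the four directions used here (row-major forward and backward, column-major forward and backward) are an assumption borrowed from common multi-directional scanning schemes, and the helper names `four_direction_sequences` and `restore_and_average` are hypothetical. No Mamba2 layer is applied; in a real model each sequence would pass through a sequence block before merging.

```python
from typing import List, Sequence


def four_direction_sequences(grid: Sequence[Sequence[float]]) -> List[List[float]]:
    """Flatten an H x W grid into four 1D scan orders (assumed directions)."""
    height, width = len(grid), len(grid[0])
    # Row-major scan: left to right, top to bottom.
    row_major = [grid[r][c] for r in range(height) for c in range(width)]
    # Column-major scan: top to bottom, left to right.
    col_major = [grid[r][c] for c in range(width) for r in range(height)]
    # The two backward scans are the reversed forward scans.
    return [row_major, row_major[::-1], col_major, col_major[::-1]]


def restore_and_average(
    seqs: Sequence[Sequence[float]], height: int, width: int
) -> List[List[float]]:
    """Map the four sequences back to grid cells and average them per cell."""
    row_fwd, row_bwd, col_fwd, col_bwd = seqs
    # Undo the reversal so both row scans (and both column scans) share an index.
    row_bwd = list(row_bwd)[::-1]
    col_bwd = list(col_bwd)[::-1]
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            i_row = r * width + c   # position of (r, c) in a row-major scan
            i_col = c * height + r  # position of (r, c) in a column-major scan
            out[r][c] = (
                row_fwd[i_row] + row_bwd[i_row] + col_fwd[i_col] + col_bwd[i_col]
            ) / 4.0
    return out


if __name__ == "__main__":
    bev = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # toy 2 x 3 "BEV feature map"
    sequences = four_direction_sequences(bev)
    # With no sequence model in between, merging recovers the original grid.
    print(restore_and_average(sequences, 2, 3))
```

Averaging is just one simple way to merge the directional outputs; summation or a learned projection would be equally plausible choices, and the source does not say which is used.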

Paper Structure

Table of Contents

Figures (6)