MambaTrack3D: A State Space Model Framework for LiDAR-Based Object Tracking under High Temporal Variation

Shengjing Tian; Yinan Han; Xiantong Zhao; Xuehu Liu; Qi Lang

TL;DR

This work targets 3D single object tracking in LiDAR point clouds under high temporal variation (HTV), where existing memory-based trackers suffer from quadratic complexity and temporal redundancy. It introduces MambaTrack3D, which combines a Mamba-based Inter-frame Propagation (MIP) module for near-linear, geometry-aware feature propagation across historical frames with a Grouped Feature Enhancement Module (GFEM) that separates foreground and background semantics at the channel level to reduce redundancy in the memory bank. On KITTI-HTV and nuScenes-HTV, the approach improves on HVTrack by up to 6.5 success and 9.5 precision under moderate temporal gaps, while remaining competitive with normal-scenario trackers on standard KITTI. It runs at real-time speeds thanks to the near-linear state space modeling. The results indicate a favorable accuracy-efficiency trade-off and good generalization to conventional tracking, making the method suitable for real-world autonomous perception systems.

Abstract

Dynamic outdoor environments with high temporal variation (HTV) pose significant challenges for 3D single object tracking in LiDAR point clouds. Existing memory-based trackers often suffer from quadratic computational complexity, temporal redundancy, and insufficient exploitation of geometric priors. To address these issues, we propose MambaTrack3D, a novel HTV-oriented tracking framework built upon the state space model Mamba. Specifically, we design a Mamba-based Inter-frame Propagation (MIP) module that replaces conventional single-frame feature extraction with efficient inter-frame propagation, achieving near-linear complexity while explicitly modeling spatial relations across historical frames. Furthermore, a Grouped Feature Enhancement Module (GFEM) is introduced to separate foreground and background semantics at the channel level, thereby mitigating temporal redundancy in the memory bank. Extensive experiments on KITTI-HTV and nuScenes-HTV benchmarks demonstrate that MambaTrack3D consistently outperforms both HTV-oriented and normal-scenario trackers, achieving improvements of up to 6.5 success and 9.5 precision over HVTrack under moderate temporal gaps. On the standard KITTI dataset, MambaTrack3D remains highly competitive with state-of-the-art normal-scenario trackers, confirming its strong generalization ability. Overall, MambaTrack3D achieves a superior accuracy-efficiency trade-off, delivering robust performance across both specialized HTV and conventional tracking scenarios.
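To illustrate why a state space model can propagate features in near-linear time, the following is a minimal sketch of a plain discrete linear state space recurrence. It is not the authors' MIP module: the function name `ssm_scan`, the matrices `A`, `B`, `C`, and the tensor shapes are assumptions chosen for illustration, and Mamba additionally makes its parameters input-dependent and uses a hardware-aware parallel scan, neither of which is shown here.

```python
import numpy as np


def ssm_scan(x, A, B, C):
    """Run a discrete linear state space recurrence over a sequence.

    Hypothetical illustration only (not the paper's implementation):
        h_t = A h_{t-1} + B x_t
        y_t = C h_t
    Shapes: x is (T, d_in), A is (n, n), B is (n, d_in), C is (d_out, n).
    """
    T = x.shape[0]
    h = np.zeros(A.shape[0])           # hidden state, starts at zero
    ys = np.empty((T, C.shape[0]))
    for t in range(T):                 # one fixed-cost update per token
        h = A @ h + B @ x[t]
        ys[t] = C @ h
    return ys


# Toy usage: 3 tokens with 2 features each, a decaying 2-dimensional state.
x = np.ones((3, 2))
A = 0.5 * np.eye(2)
B = np.eye(2)
C = np.eye(2)
y = ssm_scan(x, A, B, C)
```

Each token triggers one fixed-size state update, so the cost grows linearly with the sequence length T. A mechanism that compares every token with every other token instead scales with T squared, which is the quadratic cost the abstract attributes to existing memory-based trackers.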
