OverlapMamba: Novel Shift State Space Model for LiDAR-based Place Recognition
Qiuchi Xiang, Jintao Cheng, Jiehao Luo, Jin Wu, Rui Fan, Xieyuanli Chen, Xiaoyu Tang
TL;DR
OverlapMamba tackles robust, real-time LiDAR-based place recognition for SLAM by converting range views (RVs) into directional sequences and applying a shift state space model with random yaw reconstruction. The architecture combines an OverlapMamba backbone, a multi-directional OverlapMamba block, and a NetVLAD-based Global Descriptor Generator to produce yaw-invariant global descriptors from raw RVs. An ImTrihard triplet loss further improves convergence and generalization. Across KITTI, Ford Campus, and NCLT, it achieves state-of-the-art loop closure detection and place recognition with significantly lower runtime than transformer-based approaches, making it suitable for real-time localization in autonomous systems.
Abstract
Place recognition is the foundation that enables autonomous systems to make independent decisions and operate safely. It is also crucial for tasks such as loop closure detection and global localization within SLAM. Previous methods take conventional point cloud representations as input, while deep learning-based LiDAR place recognition (LPR) approaches feed various point cloud image representations into convolutional neural networks (CNNs) or transformer architectures. The recently proposed Mamba model, which builds on state space models (SSMs), holds great potential for long-sequence modeling. We therefore developed OverlapMamba, a novel place recognition network that represents input range views (RVs) as sequences. We employ a stochastic reconstruction approach to build shift state space models, compressing the visual representation. Evaluated on three public datasets, our method effectively detects loop closures and remains robust even when previously visited locations are traversed from different directions. Relying on raw range view inputs, it outperforms typical LiDAR and multi-view combination methods in time complexity and speed, indicating strong place recognition capability and real-time efficiency.
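To make the sequence view of an RV more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an RV is an H x W grid whose columns correspond to azimuth bins, so that a change in sensor yaw appears as a circular shift of the columns. The function names (`yaw_shift`, `random_yaw_shift`, `to_column_sequence`, `toy_descriptor`) are hypothetical, and the per-row maximum is only a toy stand-in for an order-insensitive aggregation step; it is not NetVLAD.

```python
import random


def yaw_shift(range_view, k):
    """Circularly shift the columns of an H x W range view by k positions."""
    width = len(range_view[0])
    k %= width
    # Slicing with width - k also handles k == 0 correctly.
    return [row[width - k:] + row[:width - k] for row in range_view]


def random_yaw_shift(range_view, rng):
    """Apply a random circular column shift (assumed yaw augmentation)."""
    k = rng.randrange(len(range_view[0]))
    return yaw_shift(range_view, k), k


def to_column_sequence(range_view):
    """Turn the range view into a sequence with one token per azimuth column."""
    width = len(range_view[0])
    return [tuple(row[c] for row in range_view) for c in range(width)]


def toy_descriptor(range_view):
    """Toy descriptor: per-row maximum, unchanged by any circular column shift."""
    return [max(row) for row in range_view]


# Hypothetical 2 x 4 range view (rows: elevation, columns: azimuth).
rv = [[1, 2, 3, 4],
      [5, 6, 7, 8]]

shifted, k = random_yaw_shift(rv, random.Random(0))
forward = to_column_sequence(shifted)   # scan in increasing azimuth
backward = forward[::-1]                # scan in the opposite direction

# The token order depends on yaw, but the toy descriptor does not.
same = toy_descriptor(shifted) == toy_descriptor(rv)
```

The token order in `forward` and `backward` changes with the shift `k`, while the set of tokens and the toy descriptor stay the same. This is the property a yaw-invariant global descriptor needs when a place is revisited from a different direction.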
