LCPR: A Multi-Scale Attention-Based LiDAR-Camera Fusion Network for Place Recognition
Zijie Zhou, Jingyi Xu, Guangming Xiong, Junyi Ma
TL;DR
LCPR tackles place recognition in GPS-denied settings by fusing LiDAR range images with multi-view RGB imagery to produce discriminative, yaw-rotation-invariant global descriptors. It introduces a Vertically Compressed Transformer Fusion module that fuses features across scales and modalities, complemented by residual encoders and NetVLAD-MLP aggregation to generate compact descriptors. The approach achieves state-of-the-art performance on nuScenes, demonstrates robustness to occlusion and lighting changes, and retains real-time inference speed suitable for in-vehicle deployment. The work advances multimodal place recognition by exploiting panoramic views and cross-modal attention, with practical implications for reliable loop closure and global localization in autonomous driving.
Abstract
Place recognition is one of the most crucial modules for autonomous vehicles to identify previously visited places in GPS-denied environments. Sensor fusion is considered an effective method to overcome the weaknesses of individual sensors. In recent years, multimodal place recognition fusing information from multiple sensors has attracted increasing attention. However, most existing multimodal place recognition methods only use limited field-of-view camera images, which leads to an imbalance between features from different modalities and limits the effectiveness of sensor fusion. In this paper, we present a novel neural network named LCPR for robust multimodal place recognition, which fuses LiDAR point clouds with multi-view RGB images to generate discriminative and yaw-rotation invariant representations of the environment. A multi-scale attention-based fusion module is proposed to fully exploit the panoramic views from different modalities of the environment and their correlations. We evaluate our method on the nuScenes dataset, and the experimental results show that our method can effectively utilize multi-view camera and LiDAR data to improve the place recognition performance while maintaining strong robustness to viewpoint changes. Our open-source code and pre-trained models are available at https://github.com/ZhouZijie77/LCPR.
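
To make the notion of a yaw-rotation invariant descriptor concrete, the following is a minimal sketch and not the LCPR architecture. It assumes that a yaw rotation of a spinning LiDAR shows up as a circular shift of the columns of its range image, and it uses hypothetical helper names (`yaw_rotate`, `global_descriptor`). Any aggregation that ignores column order, here a per-row max and mean, then yields the same descriptor before and after the rotation.

```python
import math
import random


def yaw_rotate(range_image, shift):
    """Circularly shift columns (assumed stand-in for a sensor yaw rotation)."""
    width = len(range_image[0])
    k = shift % width
    return [row[-k:] + row[:-k] if k else list(row) for row in range_image]


def global_descriptor(range_image):
    """Pool every row over all columns with order-independent operations."""
    width = len(range_image[0])
    desc = []
    for row in range_image:
        desc.append(max(row))                 # max does not depend on column order
        desc.append(math.fsum(row) / width)   # fsum is exact, so order does not matter
    return desc


random.seed(0)
# Toy 4 x 16 "range image" with random depths in metres (illustrative data only).
image = [[random.uniform(0.0, 50.0) for _ in range(16)] for _ in range(4)]
rotated = yaw_rotate(image, 5)

# The pooled descriptor is identical although the images themselves differ.
same = global_descriptor(image) == global_descriptor(rotated)
print(same)
```

Learned aggregators such as NetVLAD rely on the same idea of summing over spatial positions, though how LCPR combines this with its fusion module is described in the paper itself rather than in this sketch.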
