PRFusion: Toward Effective and Robust Multi-Modal Place Recognition with Image and Point Cloud Fusion

Sijie Wang, Qiyu Kang, Rui She, Kai Zhao, Yang Song, Wee Peng Tay

TL;DR

This work proposes two multi-modal place recognition models, PRFusion and PRFusion++. Both outperform existing models by a substantial margin (+3.0 AR@1 on the Boreas dataset) and incorporate neural diffusion layers for reliable operation in challenging environments.

Abstract

Place recognition plays a crucial role in robotics and computer vision, with applications in autonomous driving, mapping, and localization. It identifies a place by matching query sensor data against a known database. One of the main challenges is to develop a model that delivers accurate results while remaining robust to environmental variations. We propose two multi-modal place recognition models, PRFusion and PRFusion++. PRFusion uses global fusion with manifold metric attention, enabling effective interaction between features without requiring camera-LiDAR extrinsic calibration. In contrast, PRFusion++ assumes that extrinsic calibration is available and leverages pixel-point correspondences to enhance feature learning on local windows. Both models also incorporate neural diffusion layers, which enable reliable operation even in challenging environments. We verify the state-of-the-art performance of both models on three large-scale benchmarks. Notably, they outperform existing models by a substantial margin of +3.0 AR@1 on the demanding Boreas dataset. We also conduct ablation studies to validate the effectiveness of the proposed methods. The code is available at: https://github.com/sijieaaa/PRFusion
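
The retrieval step described in the abstract (matching a query against a known database) can be illustrated with a minimal sketch. It assumes each place is already summarized by a fixed-length descriptor vector and that Euclidean distance is used for comparison; the function name `retrieve` and the toy vectors are hypothetical and are not taken from the PRFusion code.

```python
import math

def retrieve(query, database, top_n=1):
    """Return indices of the top_n database descriptors closest to the query.

    Assumption: descriptors are plain lists of floats compared by
    Euclidean distance (the paper's actual metric is not stated here).
    """
    ranked = sorted(range(len(database)),
                    key=lambda i: math.dist(query, database[i]))
    return ranked[:top_n]

# Toy 3-D descriptors standing in for real scene descriptors.
database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.1, 0.9, 0.0]

best = retrieve(query, database, top_n=2)
print(best)  # indices of the two nearest database entries
```

In a real system the descriptors would come from the fused image and point cloud model, and the sorted scan would typically be replaced by a nearest-neighbor index.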

Paper Structure (38 sections, 15 equations, 8 figures, 10 tables)

Figures (8)

  • Figure 1: Our multi-modal PR pipeline. The query place is recognized by computing a scene descriptor from both 2D and 3D features using our proposed PR model and then comparing it with the database descriptors. Our proposed model consists of global and local feature fusion and neural Beltrami diffusion.
  • Figure 2: The overall architecture of our proposed PRFusion and PRFusion++. The multi-modal fusion is conducted in both the GFM and the LFM. The image features are additionally passed through the NDM to enhance the feature robustness.
  • Figure 3: Examples from the Oxford, KITTI, and Boreas datasets.
  • Figure 4: Average Recall@$N$ curve on the KITTI dataset.
  • Figure 5: AR@1 under different positive retrieval thresholds on the KITTI dataset.
  • ...and 3 more figures
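
The Recall@$N$ metric and the positive retrieval threshold mentioned in Figures 4 and 5 can be sketched as follows. This uses the commonly adopted definition, assumed here rather than quoted from the paper: a query counts as recognized if at least one of its top-$N$ retrieved database entries lies within a distance threshold of the query's true position. The function name `recall_at_n`, the default threshold of 25.0, and the toy data are all hypothetical.

```python
import math

def recall_at_n(query_desc, query_pos, db_desc, db_pos, n=1, pos_threshold=25.0):
    """Fraction of queries with a true positive among the top-n retrievals.

    Assumptions: descriptors are compared by Euclidean distance, positions
    are 2-D coordinates in metres, and pos_threshold is illustrative only.
    """
    hits = 0
    for q_d, q_p in zip(query_desc, query_pos):
        # Rank database entries by descriptor distance to the query.
        ranked = sorted(range(len(db_desc)),
                        key=lambda i: math.dist(q_d, db_desc[i]))
        # A hit requires a retrieved entry physically close to the query.
        if any(math.dist(q_p, db_pos[i]) <= pos_threshold for i in ranked[:n]):
            hits += 1
    return hits / len(query_desc)

# Toy data: three database places 100 m apart, two queries.
db_desc = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
db_pos = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]
query_desc = [[0.9, 0.1], [0.6, 0.8]]
query_pos = [(5.0, 0.0), (105.0, 0.0)]

r1 = recall_at_n(query_desc, query_pos, db_desc, db_pos, n=1)
r2 = recall_at_n(query_desc, query_pos, db_desc, db_pos, n=2)
r1_loose = recall_at_n(query_desc, query_pos, db_desc, db_pos, n=1, pos_threshold=100.0)
print(r1, r2, r1_loose)
```

Increasing `n` or loosening `pos_threshold` can only keep recall the same or raise it, which is why curves over $N$ and over the threshold are non-decreasing.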