A Coarse-to-Fine Place Recognition Approach using Attention-guided Descriptors and Overlap Estimation

Chencan Fu, Lin Li, Jianbiao Mei, Yukai Ma, Linpeng Peng, Xiangrui Zhao, Yong Liu

TL;DR

This work tackles LiDAR-based place recognition by mitigating two core issues: descriptor expressiveness and the computational burden of exhaustive pairwise comparison. It introduces a coarse-to-fine pipeline that first builds BEV-based features and attention-guided global descriptors to rapidly shortlist Top-K loop candidates, then applies a cross-attention overlap estimator to select the best match among them. The approach leverages a shared BEV feature space to unify coarse matching and fine verification, achieving state-of-the-art or competitive results on KITTI and KITTI-360 while significantly reducing the number of expensive overlap estimations. The findings demonstrate strong robustness to challenging conditions and reverse loops, with practical runtime advantages for loop-closure in SLAM systems. Future work targets reducing BEV memory overhead and further optimizing end-to-end efficiency while preserving recognition accuracy.

Abstract

Place recognition is a challenging but crucial task in robotics. Current description-based methods may be limited by representation capabilities, while pairwise similarity-based methods require exhaustive searches, which are time-consuming. In this paper, we present a novel coarse-to-fine approach to address these problems, which combines BEV (Bird's Eye View) feature extraction, coarse-grained matching, and fine-grained verification. In the coarse stage, our approach uses an attention-guided network to generate global descriptors. We then employ a fast affinity-based candidate selection process to identify the Top-K most similar candidates. In the fine stage, we estimate the pairwise overlap between the query and each of the narrowed-down place candidates to determine the final match. Experimental results on the KITTI and KITTI-360 datasets demonstrate that our approach outperforms state-of-the-art methods. The code will be released publicly soon.
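
The two-stage retrieval described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain Euclidean distance between descriptors stands in for the affinity-based selection, and `overlap_fn` is a hypothetical callable standing in for the learned overlap estimator. The names `top_k_candidates` and `coarse_to_fine_match` are likewise assumptions made for illustration.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple


def top_k_candidates(query_desc: Sequence[float],
                     db_descs: Sequence[Sequence[float]],
                     k: int) -> List[int]:
    """Coarse stage: indices of the k database descriptors nearest to the query."""
    # Assumption: Euclidean distance replaces the paper's affinity-based selection.
    scored = ((math.dist(query_desc, d), i) for i, d in enumerate(db_descs))
    return [i for _, i in heapq.nsmallest(k, scored)]


def coarse_to_fine_match(query_desc: Sequence[float],
                         db_descs: Sequence[Sequence[float]],
                         overlap_fn: Callable[[int], float],
                         k: int = 5) -> Tuple[int, float, List[int]]:
    """Return (best index, its overlap score, the Top-K shortlist)."""
    candidates = top_k_candidates(query_desc, db_descs, k)
    if not candidates:
        raise ValueError("database is empty or k < 1")
    # Fine stage: overlap is estimated for the k shortlisted scans only,
    # not for every scan in the database.
    best_overlap, best_idx = max((overlap_fn(i), i) for i in candidates)
    return best_idx, best_overlap, candidates


if __name__ == "__main__":
    # Toy 2-D descriptors and made-up overlap scores (hypothetical data).
    db = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.9, 0.1]]
    toy_overlap = {0: 0.2, 1: 0.4, 2: 0.9, 3: 0.7}
    print(coarse_to_fine_match([1.0, 0.0], db, toy_overlap.__getitem__, k=2))
```

In the toy data, scan 2 has the highest overlap score but is never evaluated because its descriptor is far from the query. This is the trade-off of the design: the coarse stage bounds the number of expensive overlap estimates at K, so the final match can only be as good as the shortlist.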

Paper Structure (14 sections, 8 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Visualization of results. The trajectory represents sequence 08 of the KITTI dataset (Geiger et al., 2013), and the color indicates the Euclidean distance of the descriptors to the query. The circle mark represents a query place, and the cross mark represents the matching place found by our approach.
  • Figure 2: The pipeline of our approach. The point clouds are first converted to voxel representations and fed into the encoder network to extract BEV features. Then, these features are used to generate global descriptors. In the coarse phase, the Top-K place candidates are selected by affinity-based selection. Then, in the fine phase, the corresponding BEV features are used pairwise to estimate the overlap region between the query scan and the candidates to find the final match.
  • Figure 3: The overlap estimation process. The top row corresponds to the query scan and the bottom row to the candidate. Panel (a) shows the input pair of point clouds, the yellow region in panel (b) marks the ground-truth overlap region, and panel (c) shows the predicted overlap region.
  • Figure 4: Recall@N on KITTI and KITTI-360 datasets.
  • Figure 5: AR@N of ours-CF on KITTI dataset.
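
For readers unfamiliar with the metric in Figure 4, the sketch below computes Recall@N under its common definition: the fraction of queries for which at least one true match appears among the first N retrieved candidates. The evaluation protocol used in the paper (for example, how true matches are defined) is not given in this section, so the function `recall_at_n`, its inputs, and the choice to skip queries without any true match are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set


def recall_at_n(ranked: Dict[Hashable, List[int]],
                ground_truth: Dict[Hashable, Set[int]],
                n: int) -> float:
    """Fraction of queries with at least one true match in their top-n results."""
    # Assumption: queries that have no true match at all are excluded.
    queries = [q for q in ranked if ground_truth.get(q)]
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if set(ranked[q][:n]) & ground_truth[q])
    return hits / len(queries)


if __name__ == "__main__":
    # Hypothetical retrieval results: query id -> candidate ids, best first.
    ranked = {"q0": [3, 7, 1], "q1": [4, 2, 9], "q2": [5, 6, 8]}
    truth = {"q0": {7}, "q1": {9}, "q2": {0}}
    for n in (1, 2, 3):
        print(n, recall_at_n(ranked, truth, n))
```

Recall@N is non-decreasing in N, which is why curves such as those in Figure 4 rise toward a plateau as more candidates are admitted.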