Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition

Changwei Wang, Shunpeng Chen, Yukun Song, Rongtao Xu, Zherui Zhang, Jiguang Zhang, Haoran Yang, Yu Zhang, Kexue Fu, Shide Du, Zhiwei Xu, Longxiang Gao, Li Guo, Shibiao Xu

TL;DR

This work tackles Visual Place Recognition by shifting focus toward reliable discriminative local regions. It introduces FoL, a two-stage VPR framework that learns to mine and exploit these regions via Extraction-Aggregation Spatial Alignment (SAL) and Foreground-Background Contrast Enhancement (CEL) losses, alongside a weakly supervised pseudo-correspondence strategy. A discriminative region mask guides an efficient re-ranking pipeline, enabling accurate local matching with reduced computation. Empirical results on diverse benchmarks demonstrate state-of-the-art performance for both retrieval and re-ranking, with substantial efficiency gains compared to prior two-stage methods, highlighting FoL's practical impact for scalable and robust VPR.

Abstract

Visual Place Recognition (VPR) aims to predict the location of a query image by referencing a database of geotagged images. In VPR, a small number of discriminative local regions in an image often matter most, while mundane background regions contribute little or even cause perceptual aliasing because they overlap easily. However, existing methods lack precise modeling and full exploitation of these discriminative regions. In this paper, we propose the Focus on Local (FoL) approach, which improves both image retrieval and re-ranking in VPR by mining and exploiting reliable discriminative local regions in images and by introducing pseudo-correlation supervision. First, we design two losses, Extraction-Aggregation Spatial Alignment Loss (SAL) and Foreground-Background Contrast Enhancement Loss (CEL), to explicitly model reliable discriminative local regions and use them to guide the generation of global representations and efficient re-ranking. Second, we introduce a weakly-supervised local feature training strategy based on pseudo-correspondences obtained from aggregating global features, which alleviates the lack of ground-truth local correspondences for the VPR task. Third, we propose a re-ranking pipeline that is both efficient and precise because it is guided by the discriminative regions. Finally, experimental results show that FoL achieves state-of-the-art performance on multiple VPR benchmarks in both the image retrieval and re-ranking stages and also significantly outperforms existing two-stage VPR methods in computational efficiency. Code and models are available at https://github.com/chenshunpeng/FoL

Paper Structure

This paper contains 19 sections, 17 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: (a) FoL achieves state-of-the-art performance in the image retrieval phase alone and significantly outperforms recent methods after re-ranking. (b) FoL performs local matching only in the discriminative region, which not only improves the accuracy of re-ranking but also greatly improves its efficiency.
  • Figure 2: Illustration of FoL's training pipeline. In addition to the common feature extraction and global feature aggregation steps, FoL includes reliable discriminative region modeling and weakly supervised local feature learning, which introduce spatially localized information to improve both the image retrieval and re-ranking stages of the two-stage VPR method.
  • Figure 3: Visualization of local feature matching in the re-ranking stage. (a) shows matching without Discriminative Region Guidance, while (b) shows matching with Discriminative Region Guidance. Red $\color{red}\bullet$, Blue $\color{blue}\bullet$, and Green $\color{green}\bullet$ represent low, medium, and high similarity matching points, respectively (${ \color{red}\bullet} \rightarrow { \color{blue}\bullet} \rightarrow { \color{green}\bullet}$ indicates increasing similarity).
  • Figure 4: Qualitative VPR comparison results. FoL accurately matches the query images, while other methods such as SALAD, CricaVPR, and EigenPlaces produce erroneous matches under complex lighting and viewpoint variations.
  • Figure 5: Ablation study of the parameter $k$. The best performance is achieved when $k$ is set to the top 40% of the values in $\mathbf{M}$.
  • ...and 1 more figure
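
The captions of Figures 1(b) and 5 describe restricting local matching to the discriminative region, with the region chosen as the top 40% of the values in $\mathbf{M}$. The sketch below illustrates that idea only and is not the authors' implementation. The function names (`top_fraction_mask`, `mutual_nn_count`, `rerank`), the use of mutual nearest-neighbour matching as the matching rule, the match count as the re-ranking score, and the array shapes (descriptors of shape `(N, D)`, one mask score per descriptor) are all assumptions made for illustration.

```python
import numpy as np


def top_fraction_mask(mask, keep_ratio=0.4):
    """Return a flat boolean selector for the highest-scoring entries of `mask`.

    keep_ratio=0.4 mirrors the "top 40%" setting in the Figure 5 caption.
    """
    flat = np.asarray(mask, dtype=float).ravel()
    k = max(1, int(round(keep_ratio * flat.size)))
    idx = np.argpartition(flat, -k)[-k:]  # indices of the k largest scores
    keep = np.zeros(flat.size, dtype=bool)
    keep[idx] = True
    return keep


def mutual_nn_count(desc_q, desc_d):
    """Count mutual nearest-neighbour pairs (assumed matching rule).

    Both inputs are (N, D) arrays of L2-normalised local descriptors.
    """
    if len(desc_q) == 0 or len(desc_d) == 0:
        return 0
    sim = desc_q @ desc_d.T        # cosine similarity matrix
    nn_q = sim.argmax(axis=1)      # best database feature per query feature
    nn_d = sim.argmax(axis=0)      # best query feature per database feature
    return int(np.sum(nn_d[nn_q] == np.arange(len(desc_q))))


def rerank(query_desc, query_mask, candidates, keep_ratio=0.4):
    """Order candidates by match count computed inside the masked regions only.

    `candidates` is a list of (descriptors, mask) pairs for the retrieved images.
    """
    q = query_desc[top_fraction_mask(query_mask, keep_ratio)]
    scores = []
    for desc, mask in candidates:
        d = desc[top_fraction_mask(mask, keep_ratio)]
        scores.append(mutual_nn_count(q, d))
    # Stable sort keeps the original retrieval order among tied candidates.
    return np.argsort(-np.asarray(scores), kind="stable")
```

Because only a fraction of the local features enters the similarity matrix, the matching cost per image pair shrinks roughly with the square of `keep_ratio` in this sketch, which is one plausible reading of the efficiency gain the captions describe.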