Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition

Sergio Izquierdo, Javier Civera

TL;DR

This work identifies Geographic Distance Sensitivity (GDS) as a key weakness of modern VPR embeddings: descriptor distances do not consistently reflect small geographic distances, which leads to incorrectly ordered top-k retrievals. It introduces CliqueMining, a graph-based batch mining strategy that builds dense batches of visually similar places separated by small geographic distances by sampling cliques in a distance-threshold graph, and combines it with the Multi-Similarity loss to explicitly boost GDS. The approach yields substantial recall gains on densely sampled benchmarks such as Nordland (recall@1 exceeding 90%) and MSLS Challenge (+7.7 percentage points at recall@1), showing that targeted hard-batch mining in a single-stage pipeline can outperform two-stage re-ranking approaches while adding only modest training overhead. The method is most effective on densely sampled data, which points to its potential for real-world VPR systems in urban mapping and autonomous navigation; its benefits depend on the presence of GDS issues and may be smaller on datasets with limited viewpoint diversity.
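The clique-sampling idea can be illustrated with a minimal sketch. It assumes that each candidate image has a planar position in metres (for example UTM) and links two images whenever their geographic distance is below a hypothetical `max_dist` threshold; greedy cliques of that graph are then taken as groups of mutually close images. The helper names `build_threshold_graph`, `greedy_clique` and `sample_cliques` are illustrative only. The sketch omits the visual-similarity sequence sampling and the Multi-Similarity loss, so it is not the authors' implementation.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]  # (easting, northing) in metres; an assumed coordinate frame


def build_threshold_graph(coords: Sequence[Point], max_dist: float) -> Dict[int, Set[int]]:
    """Adjacency sets: i and j are linked if their geographic distance is below max_dist."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < max_dist:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def greedy_clique(adj: Dict[int, Set[int]], seed: int, blocked: Set[int],
                  rng: random.Random) -> List[int]:
    """Grow a clique from `seed`, only adding nodes adjacent to every current member."""
    clique = [seed]
    # Candidates are the common neighbours of all clique members, minus blocked nodes.
    candidates = adj[seed] - blocked
    while candidates:
        node = rng.choice(sorted(candidates))
        clique.append(node)
        candidates &= adj[node]  # keep only nodes that are also adjacent to the new member
    return clique


def sample_cliques(adj: Dict[int, Set[int]], num_cliques: int,
                   rng: random.Random) -> List[List[int]]:
    """Sample up to `num_cliques` disjoint cliques (hypothetical 'places' for one batch)."""
    used: Set[int] = set()
    cliques: List[List[int]] = []
    seeds = sorted(adj)
    rng.shuffle(seeds)
    for seed in seeds:
        if len(cliques) == num_cliques:
            break
        if seed in used:
            continue
        clique = greedy_clique(adj, seed, used, rng)
        used.update(clique)
        cliques.append(clique)
    return cliques


if __name__ == "__main__":
    # Two groups of images along a road, roughly 10 m apart from each other.
    coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
    graph = build_threshold_graph(coords, max_dist=5.0)
    print(sample_cliques(graph, num_cliques=2, rng=random.Random(0)))
```

In this reading (an assumption on our part), images inside one clique would serve as positives for each other, while images from other nearby cliques in the same batch would act as geographically close hard negatives.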

Abstract

Visual Place Recognition (VPR) plays a critical role in many localization and mapping pipelines. It consists of retrieving, from a database of geotagged references, the sample closest to a query image in a learned embedding space. The image embedding is trained to describe a place robustly despite changes in visual appearance, viewpoint, and geometry. In this work, we formalize how limited Geographic Distance Sensitivity (GDS) in current VPR embeddings results in a high probability of incorrectly sorting the top-k retrievals, which lowers recall. To address this issue in single-stage VPR, we propose a novel mining strategy, CliqueMining, that selects positive and negative examples by sampling cliques from a graph of visually similar images. Our approach boosts the sensitivity of VPR embeddings at small distance ranges and significantly improves the state of the art on relevant benchmarks. In particular, we raise recall@1 from 75% to 82% in MSLS Challenge, and from 76% to 90% in Nordland. Models and code are available at https://github.com/serizba/cliquemining.
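To make the retrieval setting and the recall metric concrete, the following minimal sketch ranks a database by L2 descriptor distance and counts a query as correct if any of its top-k references lies within the standard 25 m decision threshold. It assumes planar positions in metres and plain Python lists as descriptors; `recall_at_k` is a hypothetical helper, not code from the released repository.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def recall_at_k(query_desc: Sequence[Vector], db_desc: Sequence[Vector],
                query_pos: Sequence[Vector], db_pos: Sequence[Vector],
                k: int, threshold_m: float = 25.0) -> float:
    """Fraction of queries with at least one top-k retrieval within threshold_m metres."""
    hits = 0
    for q_d, q_p in zip(query_desc, query_pos):
        # Single-stage retrieval: rank the database by L2 distance in descriptor space.
        ranked = sorted(range(len(db_desc)), key=lambda i: math.dist(q_d, db_desc[i]))
        # A query is recalled if any of its top-k references is geographically close enough.
        if any(math.dist(q_p, db_pos[i]) <= threshold_m for i in ranked[:k]):
            hits += 1
    return hits / len(query_desc)


if __name__ == "__main__":
    # The nearest descriptor belongs to a reference 30 m away ("close, but not there"),
    # while the second-nearest descriptor belongs to a reference only 5 m away.
    query_desc, query_pos = [[0.0, 0.0]], [[0.0, 0.0]]
    db_desc = [[0.1, 0.0], [1.0, 0.0]]
    db_pos = [[30.0, 0.0], [5.0, 0.0]]
    print(recall_at_k(query_desc, db_desc, query_pos, db_pos, k=1))  # 0.0
    print(recall_at_k(query_desc, db_desc, query_pos, db_pos, k=2))  # 1.0
```

The toy case shows the failure mode the abstract describes: the correct reference is retrieved, but it is ranked below a slightly farther one, so recall@1 drops even though recall@2 succeeds. Whether the threshold is inclusive (`<=`) is an assumption here.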

Paper Structure (13 sections, 4 equations, 7 figures, 2 tables, 2 algorithms)

Figures (7)

  • Figure 1: Geographic Distance Sensitivity (GDS). We illustrate a typical case of top-$5$ retrieval without (left) and with (right) our proposed CliqueMining. Note how the retrievals on the left are not correctly sorted by geographic distance, which lowers the recall for the selected threshold (green circle). We conceptualize this effect as GDS in the central plot, which shows the distribution of descriptor distances against geographic distances. A low slope of the mean (orange line) and a high dispersion (orange area), indicative of low GDS, raise the probability of an incorrect ordering. To address this, we present CliqueMining, a novel batch selection pipeline that increases the GDS of a model (blue line and area) and produces more correct retrievals.
  • Figure 2: Top-$5$ retrievals for DINOv2-SALAD [izquierdo2024optimal] without and with our CliqueMining in MSLS [warburg2020mapillary] and Nordland [nordland]. Green frames represent correct retrievals and red frames incorrect ones, under the standard $25$-meter decision threshold (1 frame for Nordland). Our CliqueMining sorts the retrievals better with respect to their geographic distance to the query, which improves the recall.
  • Figure 3: Recall@K vs. decision threshold on MSLS Train (val) and Nordland for DINOv2-SALAD [izquierdo2024optimal] without CliqueMining. The steep curve around the decision threshold (green dashed line) indicates that a significant number of retrievals lie at geographic distances close to the threshold. Boosting the GDS of a model would alleviate this and increase its recall.
  • Figure 4: Overview of CliqueMining. First, we create a graph of candidates by sampling a set of sequences $\{s_1, \dots, s_S\}$ that are similar to a reference one $s_{ref}$ (left). We then sample places by finding cliques within the graph (center). Observe that the resulting batches contain very similar-looking places, which boosts the GDS (right).
  • Figure 5: Mean $\pm$ standard deviation of descriptor distances against geographic distances, without and with CliqueMining. CliqueMining boosts the geographic distance sensitivity for small geographic distances and flattens it for large distances. This yields higher discriminability around the decision threshold and better metrics. Note the break in the distance axis: values for large distances are aggregated at the right end of the plot.
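Figures 1 and 5 characterize GDS through the mean and spread of descriptor distances as a function of geographic distance. A minimal diagnostic along those lines is sketched below. It assumes planar positions in metres, L2 descriptor distances, and a hypothetical bin width `bin_m` and cutoff `max_m`; the least-squares slope of the binned means is only a rough proxy of our own choosing, not the paper's formal definition of GDS.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def gds_profile(descs: Sequence[Vector], positions: Sequence[Vector],
                bin_m: float = 5.0, max_m: float = 50.0) -> List[Tuple[float, float, float]]:
    """(bin centre, mean, population std) of descriptor distances per geographic-distance bin."""
    bins: Dict[int, List[float]] = {}
    for i in range(len(descs)):
        for j in range(i + 1, len(descs)):
            geo = math.dist(positions[i], positions[j])
            if geo >= max_m:
                continue  # only profile the small-distance regime around the decision threshold
            bins.setdefault(int(geo // bin_m), []).append(math.dist(descs[i], descs[j]))
    return [((b + 0.5) * bin_m, statistics.fmean(v), statistics.pstdev(v))
            for b, v in sorted(bins.items())]


def mean_slope(profile: List[Tuple[float, float, float]]) -> float:
    """Least-squares slope of the per-bin mean descriptor distance vs. geographic distance."""
    xs = [centre for centre, _, _ in profile]
    ys = [mean for _, mean, _ in profile]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    if den == 0:
        raise ValueError("need at least two distance bins to fit a slope")
    return num / den


if __name__ == "__main__":
    # Toy data: descriptor distances grow linearly with geographic distance.
    positions = [[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]]
    descs = [[0.0], [1.0], [2.0]]
    profile = gds_profile(descs, positions)
    print(profile)              # [(12.5, 1.0, 0.0), (22.5, 2.0, 0.0)]
    print(mean_slope(profile))  # 0.1
```

Under this proxy, a steeper slope of the mean and smaller standard deviations at small geographic distances would correspond to the higher-GDS (blue) curve in Figure 1.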