Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition
Sergio Izquierdo, Javier Civera
TL;DR
This work identifies Geographic Distance Sensitivity (GDS) as a key weakness in modern VPR embeddings, where close geographic distances are not consistently reflected in descriptor distances, leading to incorrect top-k rankings. It introduces CliqueMining, a graph-based batch mining strategy that constructs dense batches of visually similar places within small geographic distances by sampling cliques in a distance-threshold graph, and integrates it with the Multi-Similarity loss to explicitly boost GDS. The approach yields substantial recall gains on dense benchmarks like Nordland (recall@1 exceeding 90%) and MSLS Challenge (notably +7.7% at recall@1), demonstrating that targeted, hard-batch mining can outperform more traditional two-stage re-ranking, while adding only modest training overhead. The method is particularly effective in densely sampled data, highlighting its potential to improve real-world VPR systems used in urban mapping and autonomous navigation, though its benefits depend on the presence of GDS issues and may be less impactful on datasets with limited viewpoint diversity.
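To make the clique-sampling idea concrete, here is a minimal sketch, not the authors' implementation. It assumes an edge joins two images when they are both geographically close and similar in descriptor space, and it grows a clique greedily from a seed image. The function names, the thresholds, and the toy data are all hypothetical.

```python
import math
import random


def build_graph(positions, descriptors, max_geo_dist, min_similarity):
    """Connect images that are geographically close AND visually similar.

    Assumption: similarity is the dot product of unit-length descriptors.
    """
    n = len(positions)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            geo = math.dist(positions[i], positions[j])
            sim = sum(a * b for a, b in zip(descriptors[i], descriptors[j]))
            if geo <= max_geo_dist and sim >= min_similarity:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def sample_clique(adj, seed, max_size, rng):
    """Greedily grow a clique: every added node is adjacent to all members."""
    clique = [seed]
    candidates = set(adj[seed])
    while candidates and len(clique) < max_size:
        node = rng.choice(sorted(candidates))
        clique.append(node)
        # Keep only nodes that are also neighbours of the new member.
        candidates &= adj[node]
    return clique


# Toy data: three nearby, similar-looking images and one distant image.
positions = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (100.0, 0.0)]
descriptors = [(1.0, 0.0), (0.99, 0.14), (0.98, 0.2), (1.0, 0.0)]

adj = build_graph(positions, descriptors, max_geo_dist=15.0, min_similarity=0.9)
batch = sample_clique(adj, seed=0, max_size=3, rng=random.Random(0))
isolated = sample_clique(adj, seed=3, max_size=3, rng=random.Random(0))
```

In this toy setup the three nearby images form one clique, while the distant image stays alone even though its descriptor matches the seed. A batch built from such a clique contains places that look alike but sit a few metres apart, which is the situation a metric loss such as Multi-Similarity needs to see in order to separate them.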
Abstract
Visual Place Recognition (VPR) plays a critical role in many localization and mapping pipelines. It consists of retrieving the closest sample to a query image, in a certain embedding space, from a database of geotagged references. The image embedding is learned to effectively describe a place despite variations in visual appearance and viewpoint, as well as geometric changes. In this work, we formulate how limitations in the Geographic Distance Sensitivity of current VPR embeddings result in a high probability of incorrectly sorting the top-k retrievals, negatively impacting the recall. To address this issue in single-stage VPR, we propose a novel mining strategy, CliqueMining, that selects positive and negative examples by sampling cliques from a graph of visually similar images. Our approach boosts the sensitivity of VPR embeddings at small distance ranges, significantly improving the state of the art on relevant benchmarks. In particular, we raise recall@1 from 75% to 82% on MSLS Challenge, and from 76% to 90% on Nordland. Models and code are available at https://github.com/serizba/cliquemining.
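The retrieval setting and the effect of a mis-sorted top-k list on recall can be illustrated with a small sketch. This is illustrative only: the 2-D descriptors, the positions, and the 25 m positive radius are assumptions chosen for the example, and `top_k` and `recall_at_k` are hypothetical helper names.

```python
import math


def top_k(query_desc, db_descs, k):
    """Indices of the k database descriptors closest to the query (L2)."""
    order = sorted(
        range(len(db_descs)),
        key=lambda i: math.dist(query_desc, db_descs[i]),
    )
    return order[:k]


def recall_at_k(queries, database, k, pos_radius):
    """Fraction of queries with at least one true positive in the top k.

    A retrieved reference counts as positive if it lies within
    pos_radius metres of the query position.
    """
    db_descs = [desc for desc, _ in database]
    hits = 0
    for q_desc, q_pos in queries:
        retrieved = top_k(q_desc, db_descs, k)
        if any(math.dist(q_pos, database[i][1]) <= pos_radius for i in retrieved):
            hits += 1
    return hits / len(queries)


# Each entry is (descriptor, position in metres). References lie 20 m apart.
database = [
    ((0.0, 1.0), (0.0, 0.0)),
    ((0.1, 1.0), (20.0, 0.0)),
    ((0.3, 1.0), (40.0, 0.0)),
]
queries = [
    # Taken at 2 m, but its descriptor is closest to the 40 m reference.
    ((0.28, 1.0), (2.0, 0.0)),
    # Taken at 1 m and correctly matched to the 0 m reference.
    ((0.01, 1.0), (1.0, 0.0)),
]

r1 = recall_at_k(queries, database, k=1, pos_radius=25.0)
r2 = recall_at_k(queries, database, k=2, pos_radius=25.0)
```

The first query fails at k=1 because its nearest descriptor belongs to a reference 38 m away, yet it succeeds at k=2, where a reference 18 m away appears. A correct match sits near the top of the list but is ranked below a slightly too distant one, which is the kind of sorting error the abstract attributes to limited Geographic Distance Sensitivity.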
