The Overlooked Value of Test-time Reference Sets in Visual Place Recognition
Mubariz Zaffar, Liangliang Nan, Sebastian Scherer, Julian F. P. Kooij
TL;DR
This work addresses the train-test domain gap in Visual Place Recognition (VPR) by exploiting the test-time reference set (the "map"), which contains images and poses from the target domain. It introduces Reference-Set-Finetuning (RSF), a simple self-supervised strategy that finetunes a VPR model on a dataset D_ft built from the map using augmentations and pose-aware triplet mining, optimizing the embedding space with a triplet loss L_triplet. RSF requires no new data and no backbone changes. It improves Recall@1 by about 2.3% on average on challenging datasets, preserves generalization to other test sets, and benefits multiple SOTA backbones and aggregators (e.g., BoQ, SALAD). The results indicate that test-time maps are a practical and effective domain-adaptation resource for VPR, with room for further gains through alternative augmentation strategies and formulation variants.
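To make the idea of pose-aware triplet mining more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes 2D metric positions for the reference images, and the names `mine_triplets`, `triplet_loss`, `pos_radius`, `neg_radius`, and `margin`, together with their default values, are hypothetical choices made for illustration. Positives are taken as reference images within `pos_radius` of the anchor and negatives as images farther than `neg_radius`; the summary above does not specify the exact mining rule or how augmentations enter D_ft.

```python
import numpy as np


def mine_triplets(positions, pos_radius=10.0, neg_radius=25.0, seed=0):
    """Mine (anchor, positive, negative) index triplets from reference poses.

    positions: (N, 2) array of metric coordinates, one row per reference image.
    pos_radius / neg_radius are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n = len(positions)
    # Pairwise geographic distances between all reference images.
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    triplets = []
    for a in range(n):
        # Positives: other images close to the anchor; negatives: images far away.
        pos = np.flatnonzero((dists[a] <= pos_radius) & (np.arange(n) != a))
        neg = np.flatnonzero(dists[a] > neg_radius)
        if len(pos) == 0 or len(neg) == 0:
            continue  # skip anchors without a valid positive or negative
        triplets.append((a, int(rng.choice(pos)), int(rng.choice(neg))))
    return triplets


def triplet_loss(embeddings, triplets, margin=0.1):
    """Mean hinge triplet loss: max(d(a, p) - d(a, n) + margin, 0)."""
    if not triplets:
        return 0.0
    a, p, n = (np.array(ix) for ix in zip(*triplets))
    d_ap = np.linalg.norm(embeddings[a] - embeddings[p], axis=1)
    d_an = np.linalg.norm(embeddings[a] - embeddings[n], axis=1)
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))
```

In an actual finetuning loop the embeddings would come from the VPR model applied to (augmented) reference images, and the loss would be backpropagated in a deep learning framework; this NumPy version only illustrates the computation.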
Abstract
Given a query image, Visual Place Recognition (VPR) is the task of retrieving an image of the same place from a reference database with robustness to viewpoint and appearance changes. Recent works show that some VPR benchmarks are solved by methods that use Vision-Foundation-Model backbones and are trained on large-scale, diverse VPR-specific datasets. Several benchmarks remain challenging, particularly when the test environments differ significantly from the usual VPR training datasets. We propose a complementary, unexplored source of information to bridge the train-test domain gap, which can further improve the performance of State-of-the-Art (SOTA) VPR methods on such challenging benchmarks. Concretely, we identify that the test-time reference set, the "map", contains images and poses of the target domain and, in several VPR applications, must be available before the test-time query is received. Therefore, we propose to perform simple Reference-Set-Finetuning (RSF) of VPR models on the map, boosting the SOTA on these challenging datasets (~2.3% increase in Recall@1 on average). Finetuned models retain generalization, and RSF works across diverse test datasets.
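For readers unfamiliar with the Recall@1 metric reported above, the sketch below shows one common way to compute Recall@K for retrieval-based VPR. It is an illustration under stated assumptions, not the paper's evaluation code: the function name `recall_at_k`, the use of 2D metric positions as ground truth, and the 25 m localization threshold are assumptions, and the benchmarks in the paper may define a correct match differently.

```python
import numpy as np


def recall_at_k(query_emb, ref_emb, query_pos, ref_pos, k=1, threshold=25.0):
    """Fraction of queries with at least one correct match among the top-k retrievals.

    A retrieved reference counts as correct if it lies within `threshold`
    (assumed metres) of the query's ground-truth position.
    """
    # Descriptor distances between every query and every reference image.
    desc_d = np.linalg.norm(query_emb[:, None, :] - ref_emb[None, :, :], axis=-1)
    # Indices of the k nearest references in descriptor space, per query.
    topk = np.argsort(desc_d, axis=1)[:, :k]
    # Geographic distances, used only to decide whether a retrieval is correct.
    geo_d = np.linalg.norm(query_pos[:, None, :] - ref_pos[None, :, :], axis=-1)
    hits = np.take_along_axis(geo_d, topk, axis=1) <= threshold
    return float(hits.any(axis=1).mean())
```

With this definition, Recall@1 is simply `recall_at_k(..., k=1)`, so a finetuned model and its baseline can be compared by evaluating both sets of embeddings against the same reference map.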
