CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition
Feng Lu, Xiangyuan Lan, Lijun Zhang, Dongmei Jiang, Yaowei Wang, Chun Yuan
TL;DR
CricaVPR addresses robust visual place recognition by introducing cross-image correlation-aware representation learning. It uses a cross-image encoder to propagate information across all images in a batch, enabling condition- and viewpoint-invariant global descriptors, and employs a MulConv adapter for parameter-efficient, multi-scale adaptation of a pre-trained backbone. The approach achieves state-of-the-art results across major VPR benchmarks (e.g., Pitts30k, MSLS, Tokyo24/7, Nordland, SVOX) with significantly reduced training time and parameter overhead. This work demonstrates the value of cross-image cues and lightweight foundation-model adaptation for robust VPR in diverse and challenging environments.
Abstract
Over the past decade, most methods in visual place recognition (VPR) have used neural networks to produce feature representations. These networks typically produce a global representation of a place image using only the image itself and neglect cross-image variations (e.g., in viewpoint and illumination), which limits their robustness in challenging scenes. In this paper, we propose a robust global representation method with cross-image correlation awareness for VPR, named CricaVPR. Our method uses the attention mechanism to correlate multiple images within a batch. These images can be taken at the same place under different conditions or from different viewpoints, or even captured at different places. Our method can therefore use cross-image variations as a cue to guide representation learning, which yields more robust features. To further improve robustness, we propose a multi-scale convolution-enhanced adaptation method to adapt pre-trained visual foundation models to the VPR task, which introduces multi-scale local information to further enhance the cross-image correlation-aware representation. Experimental results show that our method outperforms state-of-the-art methods by a large margin with significantly less training time. The code is released at https://github.com/Lu-Feng/CricaVPR.
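The idea of correlating images within a batch through attention can be illustrated with a small sketch. The code below is not the released CricaVPR implementation. It is a minimal, single-head illustration that assumes each image has already been reduced to one descriptor vector, uses identity query/key/value projections instead of learned ones, and adds a residual connection followed by L2 normalization. The function names (`cross_image_attention`, `l2_normalize`, `softmax`) are hypothetical and chosen only for this example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_image_attention(batch):
    """Let every descriptor in the batch attend to every descriptor in the batch.

    batch: list of B descriptors, each a list of D floats.
    Assumption: identity projections stand in for learned Q/K/V weights.
    Returns B unit-norm descriptors that mix in information from other images.
    """
    dim = len(batch[0])
    scale = 1.0 / math.sqrt(dim)  # standard scaled dot-product factor
    refined = []
    for query in batch:
        # Similarity of this image's descriptor to every descriptor in the batch.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in batch]
        weights = softmax(scores)
        # Weighted sum of all descriptors (the "values").
        mixed = [sum(w * value[j] for w, value in zip(weights, batch))
                 for j in range(dim)]
        # Residual connection, then L2 normalization for retrieval.
        refined.append(l2_normalize([q + m for q, m in zip(query, mixed)]))
    return refined


if __name__ == "__main__":
    # Two views of one place plus one image of a different place (toy values).
    toy_batch = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
    for descriptor in cross_image_attention(toy_batch):
        print([round(x, 3) for x in descriptor])
```

In this toy setting, each output descriptor depends on the whole batch rather than on its own image alone, which is the property the abstract refers to as cross-image correlation awareness. A trained model would learn the projections and typically stack several such layers.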
