Distillation Improves Visual Place Recognition for Low Quality Images
Anbang Yang, Ge Jin, Junjie Huang, Yao Wang, John-Ross Rizzo, Chen Feng
TL;DR
The paper addresses the drop in visual place recognition (VPR) performance that occurs when query images are transmitted at low quality because of bandwidth constraints. It introduces a model-agnostic, two-branch knowledge-distillation framework that transfers information from a teacher operating on high-quality images to a student processing low-quality inputs. Training combines Inter-Channel Correlation Knowledge Distillation (ICKD), Mean Squared Error (MSE), and a weakly supervised triplet ranking loss in the weighted composite objective $L = L_{\theta_1} + \alpha L_{\theta_2} + \beta L_{\theta_3}$. The authors validate the approach across multiple VPR methods and datasets under JPEG compression, resolution reduction, and video quantization, and also curate a video-based VPR dataset to address data scarcity. Results show significant recall improvements in most settings, demonstrating that the method generalizes across degradation types and input modalities and offering a practical path to reliable VPR in resource-constrained environments. By pairing distillation with multi-loss supervision, the work preserves the discriminative structure of global descriptors when input quality is compromised.
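The composite objective can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation and rests on several assumptions. Feature maps are C x N lists (N = H·W flattened positions) with the same channel count for teacher and student; the original ICKD formulation instead passes student features through a learned embedding. The correlation matrix is normalized by N. The terms are assumed to map to L_{θ1} = ICKD, L_{θ2} = MSE, and L_{θ3} = triplet, following the order in which the losses are listed. The triplet term uses the NetVLAD-style weakly supervised form, which keeps only the closest positive. All function names and the margin value are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # C x N feature map (channels x flattened positions)
Vector = List[float]        # global image descriptor


def channel_correlation(feat: Matrix) -> Matrix:
    """Inter-channel correlation G = F F^T / N (normalization by N is an assumption)."""
    n = len(feat[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in feat] for ci in feat]


def ickd_loss(f_teacher: Matrix, f_student: Matrix) -> float:
    """Mean squared difference between teacher and student correlation matrices.

    Assumes both maps already have the same channel count (no learned embedding).
    """
    g_t = channel_correlation(f_teacher)
    g_s = channel_correlation(f_student)
    c = len(g_t)
    return sum((g_t[i][j] - g_s[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)


def mse_loss(d_teacher: Vector, d_student: Vector) -> float:
    """Mean squared error between teacher and student global descriptors."""
    return sum((t - s) ** 2 for t, s in zip(d_teacher, d_student)) / len(d_teacher)


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weak_triplet_loss(query: Vector, positives: List[Vector],
                      negatives: List[Vector], margin: float = 0.1) -> float:
    """Weakly supervised triplet ranking loss: only the closest positive counts,
    and every negative closer than (best positive + margin) is penalized."""
    best_pos = min(sq_dist(query, p) for p in positives)
    return sum(max(0.0, best_pos + margin - sq_dist(query, n)) for n in negatives)


def total_loss(f_teacher: Matrix, f_student: Matrix,
               d_teacher: Vector, d_student: Vector,
               positives: List[Vector], negatives: List[Vector],
               alpha: float = 1.0, beta: float = 1.0, margin: float = 0.1) -> float:
    """L = L_theta1 + alpha * L_theta2 + beta * L_theta3 (term mapping assumed).

    The student's low-quality query descriptor acts as the triplet anchor (assumption).
    """
    return (ickd_loss(f_teacher, f_student)
            + alpha * mse_loss(d_teacher, d_student)
            + beta * weak_triplet_loss(d_student, positives, negatives, margin))


# Usage with toy values: 2 channels, 2 spatial positions, 2-D descriptors.
f_teacher = [[1.0, 0.0], [0.0, 1.0]]
f_student = [[2.0, 0.0], [0.0, 1.0]]
loss = total_loss(f_teacher, f_student,
                  d_teacher=[1.0, 2.0], d_student=[0.0, 0.0],
                  positives=[[1.0, 0.0]], negatives=[[0.5, 0.0]],
                  alpha=2.0, beta=0.5)
print(loss)
```

In a real training loop these terms would be computed on framework tensors so that gradients flow into the student branch. The sketch only shows how the three losses are combined.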
Abstract
Real-time visual localization often relies on online computing, in which query images or videos are transmitted to remote servers for visual place recognition (VPR). However, limited network bandwidth forces a reduction in image quality, which degrades global image descriptors and in turn reduces VPR accuracy. We address this issue at the descriptor-extraction level with a knowledge-distillation methodology that learns feature representations from high-quality images in order to extract more discriminative descriptors from low-quality images. Our approach combines the Inter-Channel Correlation Knowledge Distillation (ICKD) loss, a Mean Squared Error (MSE) loss, and a triplet loss. We validate the proposed losses on multiple VPR methods and datasets subjected to JPEG compression, resolution reduction, and video quantization, and obtain significant improvements in VPR recall rates under all three tested modalities of lowered image quality. Furthermore, we fill a gap in the VPR literature concerning video-based data and its influence on VPR performance. This work contributes to more reliable place recognition in resource-constrained environments.
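The evaluation side can be made concrete with a hypothetical sketch of two pieces. The first simulates resolution reduction with box-filter downsampling of a grayscale image stored as a list of rows. The second computes Recall@N from nearest-neighbor retrieval over global descriptors. The helper names and squared-Euclidean ranking are assumptions, as are the ground-truth sets; in VPR benchmarks these are usually the database images within a distance threshold of the query. JPEG compression and video quantization are not modeled here.

```python
from typing import List, Sequence, Set

Image = List[List[float]]  # grayscale image as rows of pixel values
Vector = List[float]       # a global image descriptor


def downsample(img: Image, factor: int) -> Image:
    """Simulate resolution reduction by averaging each factor x factor block.

    Hypothetical helper; assumes height and width are divisible by `factor`.
    """
    h, w = len(img), len(img[0])
    out = []
    for r in range(0, h, factor):
        row = []
        for c in range(0, w, factor):
            block = [img[r + i][c + j] for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def rank_database(query: Vector, database: Sequence[Vector]) -> List[int]:
    """Return database indices sorted by squared Euclidean distance to the query."""
    def dist(i: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(query, database[i]))
    return sorted(range(len(database)), key=dist)


def recall_at_n(rankings: Sequence[Sequence[int]],
                ground_truth: Sequence[Set[int]], n: int) -> float:
    """Fraction of queries with at least one true match among their top-n results."""
    hits = sum(
        1 for ranked, positives in zip(rankings, ground_truth)
        if any(idx in positives for idx in ranked[:n])
    )
    return hits / len(rankings)


# Usage: degrade a 4x4 image to 2x2, then score two toy queries.
small = downsample([[0, 2, 4, 6], [2, 4, 6, 8], [1, 1, 1, 1], [3, 3, 3, 3]], 2)
db = [[3.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
rankings = [rank_database([0.0, 0.0], db), rank_database([3.0, 0.5], db)]
print(small, rankings, recall_at_n(rankings, [{1}, {2}], 1))
```

Recall@N rewards a query if any correct place appears in the top N. Improving the student's descriptors therefore matters most when degraded queries would otherwise push true matches out of the shortlist.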
