TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation
Yehui Shen, Mingmin Liu, Huimin Lu, Xieyuanli Chen
TL;DR
The paper tackles robust visual place recognition under environmental variations by introducing TSCM, a teacher-student framework that uses cross-metric knowledge distillation to bridge the gap between a high-capacity teacher and a lightweight student. The approach integrates a ResNet-ViT-InterTransformer teacher with a compact ResNet-based student and a novel cross-metric loss $L_{\text{total}} = L_{\text{hard}} + L_{\text{soft}} + L_{\text{cm}}$, where $L_{\text{cm}} = \sum_i \left[ d(S(a_i) - T(p_i)) + d(S(p_i) - T(a_i)) \right]$, to enforce cross-model descriptor relationships. Experiments on Pittsburgh30k and Pittsburgh250k demonstrate that the student not only approaches but can exceed the teacher's VPR accuracy while offering substantially reduced parameters and faster inference, achieving descriptor generation in about $1.3$ ms and matching in under $0.6$ ms per query for a 10k database. The work shows strong ablations confirming the efficacy of cross-metric KD over traditional KD methods and underscores its potential for real-time robotic navigation on resource-constrained platforms. The code is released to facilitate adoption and reproducibility.
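To make the cross-metric term concrete, the following is a minimal sketch of how $L_{\text{cm}}$ and $L_{\text{total}}$ could be computed from precomputed descriptors. It is not the released implementation. It assumes that $S(\cdot)$ and $T(\cdot)$ are the student and teacher descriptors, that $a_i$ and $p_i$ are the anchor and positive images of the $i$-th training pair, and that $d$ is the Euclidean norm; the summary does not state which distance is used, and the function names `cross_metric_loss` and `total_loss` are hypothetical.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def cross_metric_loss(student_anchor, student_pos, teacher_anchor, teacher_pos):
    """Sketch of L_cm = sum_i [ d(S(a_i) - T(p_i)) + d(S(p_i) - T(a_i)) ].

    Each argument is a list of descriptors (one list of floats per pair).
    Assumption: d is the Euclidean norm of the descriptor difference.
    """
    total = 0.0
    for s_a, s_p, t_a, t_p in zip(student_anchor, student_pos,
                                  teacher_anchor, teacher_pos):
        # Student anchor compared against the teacher positive.
        diff_ap = [x - y for x, y in zip(s_a, t_p)]
        # Student positive compared against the teacher anchor.
        diff_pa = [x - y for x, y in zip(s_p, t_a)]
        total += l2_norm(diff_ap) + l2_norm(diff_pa)
    return total


def total_loss(l_hard, l_soft, l_cm):
    """L_total = L_hard + L_soft + L_cm, with unit weights as in the summary."""
    return l_hard + l_soft + l_cm


# Toy example with one anchor/positive pair of 2-D descriptors.
S_a, S_p = [[0.0, 0.0]], [[1.0, 1.0]]
T_a, T_p = [[1.0, 1.0]], [[3.0, 4.0]]
l_cm = cross_metric_loss(S_a, S_p, T_a, T_p)
```

In this toy example the student positive coincides with the teacher anchor, so only the first term contributes. In an actual training loop the descriptors would be framework tensors so that gradients can flow back to the student; the plain-list version here only illustrates the arithmetic.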
Abstract
Visual place recognition (VPR) plays a pivotal role in autonomous exploration and navigation of mobile robots within complex outdoor environments. While cost-effective and easily deployed, camera sensors are sensitive to lighting and weather changes, and even slight image alterations can greatly affect VPR efficiency and precision. Existing methods overcome this by exploiting powerful yet large networks, leading to significant consumption of computational resources. In this paper, we propose a high-performance teacher and lightweight student distillation framework called TSCM. It exploits our devised cross-metric knowledge distillation to narrow the performance gap between the teacher and student models, maintaining superior performance while enabling minimal computational load during deployment. We conduct comprehensive evaluations on large-scale datasets, namely Pittsburgh30k and Pittsburgh250k. Experimental results demonstrate the superiority of our method over baseline models in terms of recognition accuracy and model parameter efficiency. Moreover, our ablation studies show that the proposed knowledge distillation technique surpasses other counterparts. The code of our method has been released at https://github.com/nubot-nudt/TSCM.
