TAT-VPR: Ternary Adaptive Transformer for Dynamic and Efficient Visual Place Recognition
Oliver Grainge, Michael Milford, Indu Bodala, Sarvapali D. Ramchurn, Shoaib Ehsan
TL;DR
The paper addresses the resource bottleneck of transformer-based visual place recognition (VPR) for real-time SLAM loop closure on low-power platforms. It introduces TAT-VPR, a ternary-quantized ViT backbone with an adaptive activation-sparsity gate, trained with a two-stage distillation pipeline from a full-precision BoQ teacher and then fine-tuned for retrieval. The approach allows inference cost to be adjusted dynamically, delivering up to 40% TOps savings and about 5× memory reduction while maintaining near state-of-the-art Recall@1 on standard VPR benchmarks. The model remains robust under appearance changes and is suited to micro-UAV and embedded SLAM stacks, offering a practical, adaptive VPR solution.
Abstract
TAT-VPR is a ternary-quantized transformer that brings dynamic accuracy-efficiency trade-offs to visual SLAM loop closure. By fusing ternary weights with a learned activation-sparsity gate, the model can reduce computation by up to 40% at run-time without degrading Recall@1. The proposed two-stage distillation pipeline preserves descriptor quality, letting the model run on micro-UAV and embedded SLAM stacks while matching state-of-the-art localization accuracy.
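To make the combination of ternary weights and activation sparsity concrete, the following is a minimal NumPy sketch of a single linear layer. It is not the authors' implementation. The abstract does not specify the quantizer or the gate, so the sketch assumes an absmean ternarization rule (weights mapped to {-1, 0, +1} with one per-tensor scale) and substitutes a fixed magnitude-based top-k mask for the learned gate. The names `ternarize`, `topk_gate`, `ternary_linear`, and `keep_ratio` are hypothetical, and a `keep_ratio` of 0.6 is only an illustration that need not correspond to the reported 40% TOps savings.

```python
import numpy as np

def ternarize(w, eps=1e-8):
    """Assumed absmean rule: return codes in {-1, 0, +1} and a per-tensor scale."""
    scale = np.mean(np.abs(w)) + eps
    codes = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return codes, scale

def topk_gate(x, keep_ratio):
    """Stand-in for a learned gate: keep the largest-magnitude activations per row."""
    d = x.shape[-1]
    k = max(1, int(round(keep_ratio * d)))
    # Indices of the k largest |x| values in each row (order within the k is arbitrary).
    idx = np.argpartition(np.abs(x), d - k, axis=-1)[..., d - k:]
    mask = np.zeros_like(x, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return np.where(mask, x, 0.0)

def ternary_linear(x, codes, scale, keep_ratio=1.0):
    """Sparsify the input activations, then multiply by the ternary weight codes."""
    x = topk_gate(x, keep_ratio)
    return (x @ codes.T.astype(x.dtype)) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 10))   # (out_features, in_features)
x = rng.normal(size=(3, 10))   # (batch, in_features)
codes, scale = ternarize(w)
y_dense = ternary_linear(x, codes, scale, keep_ratio=1.0)
y_sparse = ternary_linear(x, codes, scale, keep_ratio=0.6)
```

Because `keep_ratio` is an argument of the forward pass rather than a property of the stored weights, the same ternary weights can serve several compute budgets. That is the sense in which a sparsity gate allows cost to be adjusted at run-time. In this sketch the zeroed activations are still multiplied, so any actual savings would depend on a kernel that skips them.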
