VIPeR: Visual Incremental Place Recognition with Adaptive Mining and Continual Learning

Yuhang Ming, Minyang Xu, Xingrui Yang, Weicai Ye, Weihan Wang, Yong Peng, Weichen Dai, Wanzeng Kong

TL;DR

VIPeR tackles visual place recognition (VPR) under continual learning, where models lose accuracy when deployed in unseen environments. It combines adaptive mining for metric learning, a multi-stage memory bank for rehearsal, relational memory aware synapses (RMAS) for regularization, and probabilistic knowledge distillation to preserve past knowledge while learning new environments. Experiments on Oxford RobotCar, Nordland, and TartanAir show that VIPeR outperforms naive finetuning and existing continual learning approaches in almost all aspects, with strong generalization across datasets and backbones. The work advances practical continual VPR by enabling robust cross-environment recognition and resilience to forgetting in real-world deployments.

Abstract

Visual place recognition (VPR) is an essential component of many autonomous and augmented/virtual reality systems. It enables these systems to robustly localize themselves in large-scale environments. Existing VPR methods demonstrate attractive performance at the cost of heavy pre-training and limited generalizability. When deployed in unseen environments, these methods exhibit significant performance drops. Targeting this issue, we present VIPeR, a novel approach for visual incremental place recognition with the ability to adapt to new environments while retaining its performance on previous environments. We first introduce an adaptive mining strategy that balances performance within a single environment against generalizability across multiple environments. Then, to prevent catastrophic forgetting in lifelong learning, we draw inspiration from human memory systems and design a novel memory bank for VIPeR. The memory bank contains a sensory memory, a working memory, and a long-term memory, with the first two focusing on the current environment and the last one covering all previously visited environments. Additionally, we propose probabilistic knowledge distillation to explicitly safeguard previously learned knowledge. We evaluate VIPeR on three large-scale datasets, namely Oxford RobotCar, Nordland, and TartanAir. For comparison, we first set a baseline with naive finetuning and then compare against several more recent lifelong learning methods. VIPeR achieves better performance in almost all aspects, with the biggest improvement being 13.65% in average performance.
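
As a rough illustration of the three-stage memory bank described in the abstract, the sketch below keeps a sensory and a working memory for the current environment and a long-term memory for everything seen before. The abstract does not specify capacities, how samples move between stages, or how they are selected, so the class name `MultiStageMemory`, the buffer sizes, the overflow rule, and the random-replacement policy are all assumptions made for this example rather than the authors' design.

```python
import random
from collections import deque


class MultiStageMemory:
    """Hypothetical three-stage rehearsal buffer (sizes and policies are assumed)."""

    def __init__(self, sensory_size=4, working_size=8, long_term_size=16, seed=0):
        # Sensory memory: the most recent samples of the current environment.
        self.sensory = deque(maxlen=sensory_size)
        # Working memory: a bounded subset of the current environment.
        self.working = []
        # Long-term memory: a bounded subset of all previous environments.
        self.long_term = []
        self.working_size = working_size
        self.long_term_size = long_term_size
        self.rng = random.Random(seed)

    def _push(self, store, sample, capacity):
        # Assumed policy: fill up first, then overwrite a random slot.
        if len(store) < capacity:
            store.append(sample)
        else:
            store[self.rng.randrange(capacity)] = sample

    def observe(self, sample):
        # When sensory memory is full, its oldest sample moves to working memory.
        if len(self.sensory) == self.sensory.maxlen:
            self._push(self.working, self.sensory[0], self.working_size)
        self.sensory.append(sample)

    def end_environment(self):
        # Consolidate the current environment into long-term memory, then reset.
        for sample in list(self.sensory) + self.working:
            self._push(self.long_term, sample, self.long_term_size)
        self.sensory.clear()
        self.working.clear()

    def rehearsal_batch(self, k):
        # Draw up to k stored samples from earlier environments for replay.
        return self.rng.sample(self.long_term, min(k, len(self.long_term)))


memory = MultiStageMemory(sensory_size=2, working_size=3, long_term_size=4)
for i in range(10):
    memory.observe(("env0", i))
memory.end_environment()
replay = memory.rehearsal_batch(2)
```

In a training loop, `rehearsal_batch` would be mixed with samples from the new environment so that the loss still sees earlier places; how VIPeR actually composes such batches is not stated in this summary.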

Paper Structure

This paper contains 16 sections, 6 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Visual Incremental Place Recognition (VIPeR). Within a single environment, VIPeR aims to bring descriptors from the same place closer together and push those from different places farther apart in a learned feature space. When adapting to multiple environments, VIPeR is geared towards acquiring knowledge about new places while retaining information about previously encountered ones.
  • Figure 2: Overview of the main components and data flow within our proposed VIPeR. To alleviate catastrophic forgetting in continual learning, VIPeR pairs the place recognition model with a multi-stage memory bank for rehearsal, and with adaptive mining, relational memory aware synapses (RMAS), and probabilistic knowledge distillation (PKD) for regularization.
  • Figure 3: Top-$1$ retrieval of the model trained on $T$ environments and evaluated on the first environment. Our VIPeR can better discriminate similar places and exhibits better resilience to catastrophic forgetting. The number on each image indicates its distance from the query image: the timestamp distance$\downarrow$ in Nordland, the Euclidean distance$\downarrow$ in RobotCar, and the sIoU$\uparrow$ in TartanAir.
  • Figure 4: Performance evaluated in the first environment after training in all environments with different mining strategies, highlighting forgetting. "hard5" and "hard10" denote the hard mining strategy with 5 and 10 negative samples, respectively.
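
The captions of Figures 1 and 4 refer to metric learning with hard negative mining. A minimal sketch of that general idea follows: pick the `k` negatives closest to the anchor descriptor and apply a margin-based triplet loss, which pulls same-place descriptors together and pushes different-place descriptors apart. This shows only the fixed-`k` hard mining that "hard5" and "hard10" suggest, not VIPeR's adaptive mining, whose rule is not given in this summary. The function names, the margin value, and the toy 2-D descriptors are assumptions for illustration.

```python
import math


def l2(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_negatives(anchor, negatives, k):
    """Return the k negatives closest to the anchor (the hardest ones)."""
    return sorted(negatives, key=lambda n: l2(anchor, n))[:k]


def triplet_loss(anchor, positive, negatives, margin=1.0):
    """Mean hinge loss over negatives: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = l2(anchor, positive)
    losses = [max(0.0, d_pos - l2(anchor, n) + margin) for n in negatives]
    return sum(losses) / len(losses)


# Toy descriptors; in practice these come from the place recognition network.
anchor = (0.0, 0.0)
positive = (0.0, 1.0)                      # same place, distance 1
negatives = [(0.0, 5.0), (0.0, 1.0), (0.0, 2.0)]

hardest = hard_negatives(anchor, negatives, k=2)
loss_hard = triplet_loss(anchor, positive, hardest)
loss_all = triplet_loss(anchor, positive, negatives)
```

With these toy values only the negative at distance 1 violates the margin, so averaging over the two hardest negatives gives a larger loss than averaging over all three. This is why the number of mined negatives changes how strongly training focuses on confusable places.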