UGNA-VPR: A Novel Training Paradigm for Visual Place Recognition Based on Uncertainty-Guided NeRF Augmentation

Yehui Shen, Lei Zhang, Qingqiu Li, Xiongwei Zhao, Yue Wang, Huimin Lu, Xieyuanli Chen

TL;DR

The paper tackles the challenge of robust visual place recognition under limited viewpoints by introducing UGNA-VPR, a training paradigm that uses uncertainty-guided NeRF augmentation to generate informative synthetic views from existing data. A self-supervised uncertainty estimation network identifies high-uncertainty candidate poses near VPR failure locations, enabling selective NeRF rendering to enrich VPR training; a data organization strategy maximizes the utility of real and synthetic samples. Extensive experiments across three public/self-recorded datasets and three VPR backbones show consistent improvements in Recall@1, with notable gains in hard scenarios and with NeRF-H rendering. The approach preserves existing VPR architectures and avoids additional data collection, offering a practical, scalable path to improved multi-view VPR in indoor and outdoor environments. The authors release their dataset and code to support reproducibility and further research.
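
The selection step described above can be sketched as follows. This is a hypothetical illustration, not the authors' code. The `Pose` type, the disc-shaped sampling region, the random headings, and the `uncertainty` callable are all assumptions. `uncertainty` stands in for the UE network's score of the descriptor at a pose, and the sketch simply keeps the k highest-scoring candidates for NeRF rendering.

```python
import math
import random
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Hypothetical planar camera pose: position in metres, heading in radians."""
    x: float
    y: float
    yaw: float


def candidate_poses(failure: Pose, n: int, radius: float, seed: int = 0) -> list[Pose]:
    """Sample n candidate rendering poses around a pose where VPR failed.

    Assumption: positions are uniform over a disc of `radius` metres and
    headings are uniform in [-pi, pi], so candidates include unseen viewpoints.
    """
    rng = random.Random(seed)
    poses = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())  # sqrt -> uniform density over the disc
        theta = rng.uniform(0.0, 2.0 * math.pi)
        poses.append(Pose(failure.x + r * math.cos(theta),
                          failure.y + r * math.sin(theta),
                          rng.uniform(-math.pi, math.pi)))
    return poses


def select_for_rendering(poses: list[Pose],
                         uncertainty: Callable[[Pose], float],
                         k: int) -> list[Pose]:
    """Return the k candidates with the highest estimated uncertainty.

    `uncertainty` is a hypothetical stand-in for the UE network's score.
    """
    return sorted(poses, key=uncertainty, reverse=True)[:k]


if __name__ == "__main__":
    failure = Pose(0.0, 0.0, 0.0)
    candidates = candidate_poses(failure, n=20, radius=2.0)
    # Toy stand-in for the UE network: larger heading change -> more uncertain.
    chosen = select_for_rendering(candidates, lambda p: abs(p.yaw - failure.yaw), k=5)
    for pose in chosen:
        print(pose)  # these poses would be passed to NeRF for rendering
```

Rendering only the top-ranked candidates keeps the NeRF cost proportional to k rather than to the full candidate set, which is the motivation for selective rendering.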

Abstract

Visual place recognition (VPR) is crucial for robots to identify previously visited locations, playing an important role in autonomous navigation in both indoor and outdoor environments. However, most existing VPR datasets are limited to single-viewpoint scenarios, leading to reduced recognition accuracy, particularly in multi-directional driving or feature-sparse scenes. Moreover, obtaining additional data to mitigate these limitations is often expensive. This paper introduces a novel training paradigm to improve the performance of existing VPR networks by enhancing multi-view diversity within current datasets through uncertainty estimation and NeRF-based data augmentation. Specifically, we first train a NeRF on the existing VPR dataset. Then, our self-supervised uncertainty estimation network identifies places with high uncertainty. The poses of these uncertain places are fed to the NeRF to render new synthetic observations, which are used to further train the VPR networks. Additionally, we propose an improved storage method for efficiently organizing augmented and original training data. We conducted extensive experiments on three datasets and tested three different VPR backbone networks. The results demonstrate that our proposed training paradigm significantly improves VPR performance by fully utilizing existing data, outperforming other training approaches. We further validated the effectiveness of our approach on self-recorded indoor and outdoor datasets, consistently demonstrating superior results. Our dataset and code have been released at https://github.com/nubot-nudt/UGNA-VPR.
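
The abstract does not state the self-supervised objective of the uncertainty estimation network. One common formulation, assumed here and not confirmed by the source, is a heteroscedastic Gaussian negative log-likelihood: the network predicts a descriptor mean and a per-dimension log-variance for a pose, and the descriptor the VPR network extracts from the real image at that pose serves as a free label. The function names below are hypothetical, and treating the log-variance vector as the "uncertainty descriptor" is likewise an assumption.

```python
import math


def gaussian_nll(target, mean, log_var):
    """Per-dimension heteroscedastic Gaussian NLL, averaged, constant term dropped.

    target:  descriptor the VPR network extracts from the real image (label)
    mean:    descriptor the uncertainty network predicts from the pose
    log_var: predicted per-dimension log-variance (assumed uncertainty descriptor)
    """
    assert len(target) == len(mean) == len(log_var)
    total = 0.0
    for t, m, s in zip(target, mean, log_var):
        # exp(-s) down-weights the squared error where variance is high;
        # the +s term penalises claiming high variance everywhere.
        total += 0.5 * (math.exp(-s) * (t - m) ** 2 + s)
    return total / len(target)


def pose_uncertainty(log_var):
    """Scalar score for ranking candidate poses: mean predicted variance."""
    return sum(math.exp(s) for s in log_var) / len(log_var)
```

For a fixed residual r, setting the derivative of ½(e^(-s)·r² + s) to zero gives e^s = r², so the predicted variance grows where the descriptor is hard to predict. Under this assumption, poses with large descriptor errors receive high uncertainty scores.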

Paper Structure

This paper contains 22 sections, 5 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Overview of our novel VPR training paradigm, which consists of two main components: NeRF-based data augmentation and uncertainty estimation for selecting candidate rendering poses. We first use existing datasets to train both the NeRF and VPR networks. Based on the performance of the VPR network, a set of candidate rendering poses is generated. The uncertainty estimation network then assesses the uncertainty in descriptors of these candidate poses. The candidates with high uncertainties are selected and rendered by NeRF. The generated synthetic images are added to the VPR network training to enhance VPR performance.
  • Figure 2: The training pipeline for our uncertainty estimation (UE) network. The network is trained using real data and estimates the uncertainty of VPR descriptors generated by each candidate pose, which is used for the synthetic data selection.
  • Figure 3: Examples of uncertainty estimation. Given the same reference images, our UE network estimates the uncertainty for two candidate poses. Brighter regions in the uncertainty descriptors indicate higher uncertainty, so the descriptors for pose 2 are more uncertain than those for pose 1. Consequently, the VPR descriptor generated from pose 2 differs more from the descriptor the VPR network produces for the corresponding image than it does for pose 1.
  • Figure 4: Uncertainty estimation during VPR training. Using the same reference information, we employ an uncertainty estimation network to evaluate the uncertainty in descriptors of candidate poses and select those with high uncertainty to render for subsequent training.
  • Figure 5: Recall@N on different datasets with MixVPR. (a) Cambridge. (b) LIB. (c) CON.
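
Figure 5 reports Recall@N. Under the usual VPR definition, a query counts as correct at N if at least one of its N nearest database entries is a true match. The sketch below illustrates that definition only. It assumes Euclidean distance between descriptors and a precomputed set of positive database indices per query. The paper's exact distance metric and positive threshold are not given in this summary, and the function name is hypothetical.

```python
import math


def recall_at_n(query_desc, db_desc, positives, ns=(1, 5, 10)):
    """Fraction of queries with at least one true match among their top-N retrievals.

    query_desc: list of query descriptors (lists of floats)
    db_desc:    list of database descriptors
    positives:  positives[i] is the set of database indices that truly match
                query i (e.g. within a distance threshold of its pose)
    """
    hits = {n: 0 for n in ns}
    for q, pos in zip(query_desc, positives):
        # Rank all database entries by Euclidean distance to the query descriptor.
        order = sorted(range(len(db_desc)), key=lambda j: math.dist(q, db_desc[j]))
        for n in ns:
            if any(j in pos for j in order[:n]):
                hits[n] += 1
    return {n: hits[n] / len(query_desc) for n in ns}
```

For large databases, a k-nearest-neighbour index would replace the full sort. The brute-force ranking is kept here so that the definition stays explicit.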