Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression
Huy-Hoang Bui, Bach-Thuan Bui, Dinh-Tuan Tran, Joo-Ho Lee
TL;DR
The paper tackles data-scarce visual localization by augmenting descriptor-based KSCR (D2S) with a NeRF-driven data synthesis pipeline. It trains a Nerfacto NeRF, synthesizes novel views through pose interpolation and view rendering, and uses robust feature matching to integrate synthetic data into KSCR training. Empirical results on 7Scenes and 12Scenes show improved translation and rotation accuracy, outperforming several SCR and few-shot baselines while requiring fewer real-world images. The approach is modular and scalable, with potential for incorporating multiple NeRFs, though outdoor and dynamic environments remain a challenge for NeRF-based rendering.
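As a rough illustration of the pose-interpolation step mentioned above, the sketch below generates intermediate camera poses between two training poses. The summary does not state which interpolation scheme the authors use, so spherical linear interpolation (SLERP) for rotation and linear interpolation for translation are assumptions here, and the function names `slerp`, `interpolate_pose`, and `synthesize_poses` are hypothetical.

```python
import math

def slerp(q0, q1, u):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to follow the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: linear blend avoids dividing by sin(theta) ~ 0.
        q = tuple(a + u * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - u) * theta) / math.sin(theta)
        s1 = math.sin(u * theta) / math.sin(theta)
        q = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)

def interpolate_pose(pose0, pose1, u):
    """Blend two poses given as (quaternion, translation); u lies in [0, 1]."""
    (q0, t0), (q1, t1) = pose0, pose1
    t = tuple(a + u * (b - a) for a, b in zip(t0, t1))
    return slerp(q0, q1, u), t

def synthesize_poses(pose0, pose1, n):
    """Return n evenly spaced poses strictly between pose0 and pose1."""
    return [interpolate_pose(pose0, pose1, k / (n + 1)) for k in range(1, n + 1)]
```

Each synthesized pose would then be handed to the trained NeRF to render a novel view; that rendering call is specific to the NeRF framework and is not shown here.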
Abstract
Classical structure-based visual localization methods offer high accuracy but face trade-offs in storage, speed, and privacy. D2S, a recent keypoint scene coordinate regression (KSCR) method, addresses these issues by leveraging graph attention networks to enhance keypoint relationships and by predicting the keypoints' 3D coordinates with a simple multilayer perceptron (MLP). Camera pose is then estimated via PnP+RANSAC from the resulting 2D-3D correspondences. While KSCR achieves competitive results, rivaling state-of-the-art image-retrieval methods like HLoc across multiple benchmarks, its performance degrades when training samples are limited, because the deep learning model relies on extensive data. This paper addresses this challenge by introducing a pipeline for keypoint descriptor synthesis using Neural Radiance Field (NeRF). By generating novel poses and feeding them into a trained NeRF model to render new views, our approach enhances KSCR's generalization capabilities in data-scarce environments. The proposed system can improve localization accuracy by up to 50% while requiring only a fraction of the time for data synthesis. Furthermore, its modular design allows for the integration of multiple NeRFs, offering a versatile and efficient solution for visual localization. The implementation is publicly available at: https://github.com/ais-lab/DescriptorSynthesis4Feat2Map.
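To make the role of 2D-3D correspondences in PnP+RANSAC more concrete, the sketch below shows only the hypothesis-scoring step of RANSAC: given a candidate camera pose, it reprojects the predicted 3D keypoint coordinates and counts the correspondences whose reprojection error falls under a pixel threshold. It is not a full PnP solver and is not taken from the authors' implementation; the pinhole model without distortion, the world-to-camera pose convention, the 3-pixel threshold, and the names `project` and `count_inliers` are all assumptions made for illustration.

```python
import math

def project(point_3d, rotation, translation, fx, fy, cx, cy):
    """Project a world point to pixels with a pinhole camera.

    rotation (3x3 nested lists) and translation map world to camera coordinates.
    Returns None when the point lies behind the camera.
    """
    xc = [sum(rotation[i][j] * point_3d[j] for j in range(3)) + translation[i]
          for i in range(3)]
    if xc[2] <= 0.0:
        return None
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)

def count_inliers(points_2d, points_3d, rotation, translation,
                  intrinsics, threshold_px=3.0):
    """Return indices of correspondences consistent with the candidate pose."""
    fx, fy, cx, cy = intrinsics
    inliers = []
    for idx, (uv, xyz) in enumerate(zip(points_2d, points_3d)):
        proj = project(xyz, rotation, translation, fx, fy, cx, cy)
        if proj is None:
            continue
        # Reprojection error: pixel distance between detected and projected keypoint.
        err = math.hypot(proj[0] - uv[0], proj[1] - uv[1])
        if err < threshold_px:
            inliers.append(idx)
    return inliers
```

In a complete RANSAC loop, candidate poses would come from a minimal PnP solver run on random subsets of the correspondences, and the pose with the most inliers would be kept and refined.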
