NVINS: Robust Visual Inertial Navigation Fused with NeRF-augmented Camera Pose Regressor and Uncertainty Quantification
Juyeop Han, Lukas Lao Beyer, Guilherme V. Cavalheiro, Sertac Karaman
TL;DR
This work addresses drift and reliability issues in visual-inertial odometry (VIO) by fusing it with a CNN-based absolute pose regressor that is trained on NeRF-rendered views and outputs both a pose and its uncertainty. It establishes a Bayesian maximum a posteriori (MAP) framework for integrating these uncertain pose estimates into a VIO factor graph, using either deep ensembles or Monte Carlo (MC) dropout to quantify epistemic and aleatoric uncertainty. The pipeline has an offline and an online stage: offline, a NeRF is trained on a small dataset and used to render augmented views for pose-regressor training; online, the MAP optimization uses uncertainty-aware residuals and outlier rejection. Experiments in the photorealistic FlightGoggles simulation environment show notably better pose accuracy and less drift than conventional VIO, with ensemble-based uncertainty quantification performing best. The results point to a practical path toward robust, real-time navigation on embedded hardware; future work targets larger spaces and time-varying NeRF representations.
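The TL;DR does not spell out how several network outputs become one pose estimate with one uncertainty. A common recipe for deep ensembles, which applies equally to repeated MC-dropout passes, is to moment-match the members' Gaussian predictions: the spread of the predicted means captures epistemic uncertainty, and the average of the predicted variances captures aleatoric uncertainty. The sketch below assumes that recipe, restricts it to 3D position with per-axis (diagonal) variances, and uses hypothetical names (`MemberOutput`, `combine_predictions`); it is not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class MemberOutput:
    """Hypothetical output of one ensemble member (or one MC-dropout pass):
    a predicted 3D position and per-axis aleatoric variance."""
    mean: Vec3
    var: Vec3


def combine_predictions(outputs: Sequence[MemberOutput]) -> Tuple[Vec3, Vec3, Vec3]:
    """Moment-match the member Gaussians into a single Gaussian per axis.

    Returns (mean, aleatoric_var, epistemic_var). Under this assumption the
    total variance used as measurement noise is aleatoric_var + epistemic_var.
    """
    m = len(outputs)
    if m == 0:
        raise ValueError("need at least one prediction")
    # Ensemble mean: average of the member means.
    mean = tuple(sum(o.mean[i] for o in outputs) / m for i in range(3))
    # Aleatoric part: average of the variances each member predicts.
    aleatoric = tuple(sum(o.var[i] for o in outputs) / m for i in range(3))
    # Epistemic part: spread of the member means around the ensemble mean.
    epistemic = tuple(
        sum((o.mean[i] - mean[i]) ** 2 for o in outputs) / m for i in range(3)
    )
    return mean, aleatoric, epistemic


if __name__ == "__main__":
    members = [
        MemberOutput((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
        MemberOutput((2.0, 0.0, 0.0), (3.0, 1.0, 1.0)),
    ]
    print(combine_predictions(members))
```

In the usage example the two members disagree only along x, so only the x axis picks up epistemic variance; that extra variance is what lets the downstream estimator trust the regressor less where the ensemble is inconsistent.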
Abstract
In recent years, Neural Radiance Fields (NeRF) have emerged as a powerful tool for 3D reconstruction and novel view synthesis. However, the computational cost of NeRF rendering and the quality degradation caused by rendering artifacts pose significant challenges for applying NeRF to real-time, robust robotic tasks, especially on embedded systems. This paper introduces a novel framework that integrates NeRF-derived localization information with Visual-Inertial Odometry (VIO) to provide a robust solution for real-time robotic navigation. By training an absolute pose regression network on augmented image data rendered from a NeRF and quantifying its uncertainty, our approach effectively counters positional drift and enhances system reliability. We also establish, within a Bayesian framework, a mathematically sound foundation for combining visual-inertial navigation with camera-localization neural networks while accounting for their uncertainty. Experimental validation in a photorealistic simulation environment demonstrates significant improvements in accuracy over a conventional VIO approach.
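To make the Bayesian combination concrete, consider its simplest linear-Gaussian case: a VIO position estimate acts as a Gaussian prior, and one pose-regressor output acts as a Gaussian measurement whose variance comes from the uncertainty quantification. The MAP estimate then minimizes the sum of the two variance-weighted squared residuals, which has a closed-form information-weighted average. Outlier rejection is modeled here as a chi-square gate on the squared Mahalanobis distance of the innovation. This is a minimal sketch under assumed simplifications (position only, diagonal covariances, a single update, a hypothetical function `fuse_position` and a 95% gate); the paper's factor graph also involves orientation, velocity, and IMU terms, and its exact residuals and rejection rule may differ.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]

# 95% quantile of the chi-square distribution with 3 degrees of freedom.
# The choice of gate is an assumption for this sketch.
CHI2_3DOF_95 = 7.815


def fuse_position(
    x_vio: Vec3, p_vio: Vec3, z_apr: Vec3, r_apr: Vec3, gate: float = CHI2_3DOF_95
) -> Tuple[Vec3, Vec3, bool]:
    """MAP estimate of a 3D position from a VIO prior and one regressor measurement.

    Both are assumed Gaussian with diagonal covariances (per-axis variances
    p_vio and r_apr). Returns (fused_mean, fused_var, accepted). A measurement
    that fails the chi-square gate is treated as an outlier, and the VIO
    prior is returned unchanged.
    """
    # Squared Mahalanobis distance of the innovation z - x under covariance P + R.
    d2 = sum((z_apr[i] - x_vio[i]) ** 2 / (p_vio[i] + r_apr[i]) for i in range(3))
    if d2 > gate:
        return x_vio, p_vio, False
    fused_mean = []
    fused_var = []
    for i in range(3):
        # Information (inverse variance) of independent Gaussians adds.
        info = 1.0 / p_vio[i] + 1.0 / r_apr[i]
        fused_var.append(1.0 / info)
        # Minimizer of (x - x_vio)^2 / P + (x - z)^2 / R along this axis.
        fused_mean.append((x_vio[i] / p_vio[i] + z_apr[i] / r_apr[i]) / info)
    return tuple(fused_mean), tuple(fused_var), True


if __name__ == "__main__":
    # A drifted VIO estimate pulled back by a consistent regressor output.
    print(fuse_position((1.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)))
    # A regressor output far outside the combined uncertainty is rejected.
    print(fuse_position((1.0, 0.0, 0.0), (1.0, 1.0, 1.0), (10.0, 0.0, 0.0), (1.0, 1.0, 1.0)))
```

The design point the sketch illustrates is that the regressor's predicted variance directly sets how hard it pulls the estimate: a large variance both shrinks its weight in the average and widens the gate, so uncertain but plausible predictions are down-weighted rather than discarded.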
