NVINS: Robust Visual Inertial Navigation Fused with NeRF-augmented Camera Pose Regressor and Uncertainty Quantification

Juyeop Han, Lukas Lao Beyer, Guilherme V. Cavalheiro, Sertac Karaman

TL;DR

This work addresses drift and reliability issues in visual-inertial navigation by fusing visual-inertial odometry (VIO) with a CNN-based absolute pose regressor that is trained on NeRF-rendered views and outputs both a pose and its uncertainty. It establishes a Bayesian MAP framework for integrating uncertain pose estimates into a VIO factor graph, using either deep ensembles or MC dropout to quantify epistemic and aleatoric uncertainty. The pipeline has an offline and an online phase: offline, a NeRF is trained on a small dataset and used to render augmented views for pose-regressor training; online, the MAP optimization uses uncertainty-aware residuals and outlier rejection. Experimental validation in the photorealistic FlightGoggles environment shows improved pose accuracy and reduced drift relative to VIO alone, with ensemble-based uncertainty handling delivering the strongest performance. The results point to a practical path toward robust, real-time navigation on embedded hardware, with future work aimed at scaling to larger spaces and handling time-varying NeRF representations.
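
The summary above mentions deep ensembles as one way to obtain epistemic and aleatoric uncertainty. The exact formulation used by the authors is not given here, so the following is only a minimal sketch of one common decomposition, assuming each ensemble member predicts a per-axis mean and variance for the camera position. The function name `fuse_ensemble` and the sample numbers are hypothetical.

```python
from statistics import fmean


def fuse_ensemble(means, variances):
    """Combine per-member Gaussian predictions (assumed, not the paper's code).

    means, variances: one list per ensemble member, each holding one value
    per output dimension (e.g. x, y, z).
    """
    dims = range(len(means[0]))
    # Ensemble mean: average of the member means.
    mu = [fmean([m[d] for m in means]) for d in dims]
    # Aleatoric part: average of the variances each member predicts.
    aleatoric = [fmean([v[d] for v in variances]) for d in dims]
    # Epistemic part: spread (population variance) of the member means.
    epistemic = [fmean([(m[d] - mu[d]) ** 2 for m in means]) for d in dims]
    total = [a + e for a, e in zip(aleatoric, epistemic)]
    return mu, aleatoric, epistemic, total


if __name__ == "__main__":
    # Two hypothetical members that disagree along x and z.
    member_means = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
    member_vars = [[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]]
    print(fuse_ensemble(member_means, member_vars))
```

Under this assumption, disagreement between members raises the epistemic term, so views that are poorly covered by the training data would contribute less to the fused estimate.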

Abstract

In recent years, Neural Radiance Fields (NeRF) have emerged as a powerful tool for 3D reconstruction and novel view synthesis. However, the computational cost of NeRF rendering and degradation in quality due to the presence of artifacts pose significant challenges for its application in real-time and robust robotic tasks, especially on embedded systems. This paper introduces a novel framework that integrates NeRF-derived localization information with Visual-Inertial Odometry (VIO) to provide a robust solution for real-time robotic navigation. By training an absolute pose regression network with augmented image data rendered from a NeRF and quantifying its uncertainty, our approach effectively counters positional drift and enhances system reliability. We also establish a mathematically sound foundation for combining visual inertial navigation with camera localization neural networks, considering uncertainty under a Bayesian framework. Experimental validation in a photorealistic simulation environment demonstrates significant improvements in accuracy compared to a conventional VIO approach.
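
To illustrate what combining a VIO estimate with an uncertain pose-regressor output under a Bayesian framework can look like, here is a deliberately simplified sketch. It is not the authors' factor-graph optimization: it assumes independent Gaussian errors with diagonal covariance on position only, in which case the MAP estimate reduces to an inverse-variance weighted mean. The chi-square gate used for outlier rejection (7.815, the 95% quantile for 3 degrees of freedom) and the names `map_fuse` and `mahalanobis_sq` are assumptions made for illustration.

```python
CHI2_3DOF_95 = 7.815  # 95% chi-square quantile for 3 degrees of freedom


def mahalanobis_sq(residual, variance):
    """Squared Mahalanobis distance for a diagonal covariance."""
    return sum(r * r / v for r, v in zip(residual, variance))


def map_fuse(prior_mean, prior_var, meas_mean, meas_var, gate=CHI2_3DOF_95):
    """Fuse a prior (e.g. VIO) with a measurement (e.g. pose regressor).

    Returns (mean, variance, accepted). The measurement is rejected as an
    outlier when its residual is too large relative to the combined
    uncertainty; the prior is then returned unchanged.
    """
    residual = [z - x for z, x in zip(meas_mean, prior_mean)]
    # Variance of the residual, assuming independent prior and measurement.
    residual_var = [p + m for p, m in zip(prior_var, meas_var)]
    if mahalanobis_sq(residual, residual_var) > gate:
        return list(prior_mean), list(prior_var), False
    # Gaussian MAP estimate: inverse-variance weighted combination.
    fused_var = [1.0 / (1.0 / p + 1.0 / m) for p, m in zip(prior_var, meas_var)]
    fused_mean = [
        fv * (x / p + z / m)
        for fv, x, p, z, m in zip(fused_var, prior_mean, prior_var, meas_mean, meas_var)
    ]
    return fused_mean, fused_var, True


if __name__ == "__main__":
    vio_pos, vio_var = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
    print(map_fuse(vio_pos, vio_var, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]))
    print(map_fuse(vio_pos, vio_var, [10.0, 10.0, 10.0], [1.0, 1.0, 1.0]))
```

In this toy setting a measurement with a large predicted variance pulls the fused estimate only slightly, which is the intuition behind weighting the regressor's output by its quantified uncertainty.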

Paper Structure

This paper contains 13 sections, 16 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Outline of the proposed VIO framework, NVINS: (a) In the offline phase, a NeRF is trained using a dataset gathered in the target environment. Novel views synthesized using this NeRF are used to train the camera pose regressor. (b) A pose predicted by the trained camera pose regressor, along with its associated uncertainty, is integrated with other sensor measurements via pose graph optimization.
  • Figure 2: (a) Images captured in the FlightGoggles simulation environment [Guerra2019Flightgoggles]. (b) Images rendered through nerfstudio [nerfstudio]. The rendered images differ significantly from the originals, primarily due to artifacts caused by the sparse density of training data.
  • Figure 3: Positional errors (m) and rotational errors (degrees) of camera pose regressors trained with different loss functions, evaluated across entire trajectories.
  • Figure 4: Comparison of the ground-truth trajectories (blue) for Trajectories 1, 3, 5, and 7 with estimates from the VIO framework without the camera pose regressor (orange) and with the ensemble camera pose regressor augmented by uncertainty quantification and outlier rejection (green).
  • Figure 5: Absolute Trajectory Error (ATE) for VIO with and without camera pose regressors over the travel distance of 'Trajectory 1'. 'd' (resp. 'e') denotes 'dropout' (resp. 'ensemble'), '+u' (resp. '+r') indicates uncertainty prediction (resp. outlier rejection) is applied, and '-' signifies the absence of the respective feature.
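
Figure 5 reports Absolute Trajectory Error (ATE). The evaluation protocol is not described in this listing, so the following is only a sketch of one common way to compute ATE as a root-mean-square position error, assuming the estimated and ground-truth positions are already time-associated and expressed in the same frame (any alignment step is omitted). The function name `ate_rmse` and the sample points are hypothetical.

```python
import math


def ate_rmse(estimated, ground_truth):
    """Root-mean-square Euclidean error between matched 3D positions."""
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [
        math.dist(est, gt) ** 2 for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)]
    print(ate_rmse(est, gt))
```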