NeRF-VIO: Map-Based Visual-Inertial Odometry with Initialization Leveraging Neural Radiance Fields
Yanyu Zhang, Dongming Wang, Jie Xu, Mengyuan Liu, Pengxiang Zhu, Wei Ren
TL;DR
The paper tackles drift and relocalization challenges in map-based visual-inertial odometry for AR by introducing NeRF-VIO. It uses a pose-initialization MLP to relocalize the first frame within a pre-trained NeRF map, followed by a two-stage update in an MSCKF framework that fuses both captured and NeRF-rendered images. A left-invariant geodesic loss on \(SE(3)\), built on a left-invariant metric on \(\mathfrak{se}(3)\), keeps the initialization consistent under frame changes, while grid-based SSIM mitigates the effect of environmental alterations. Experiments on a real AR table dataset show that NeRF-VIO achieves higher initialization accuracy and lower latency than iNeRF and outperforms MSCKF in VIO accuracy, even under significant scene changes. The approach demonstrates practical value for robust, real-time AR localization using a pre-built NeRF map and online NeRF rendering.
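To make the grid-based SSIM idea concrete, here is a minimal sketch, not the paper's implementation. It assumes grayscale images of equal size, splits both into a grid, and scores each cell with a single-window SSIM computed from the cell's global statistics (standard SSIM uses sliding local windows). The names `ssim_global` and `grid_ssim`, the 4x4 grid, and the idea of thresholding cell scores to flag changed regions are all illustrative assumptions.

```python
import numpy as np

def ssim_global(a, b, data_range=255.0):
    """Single-window SSIM from the global statistics of two equal-size patches."""
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den

def grid_ssim(img_a, img_b, rows=4, cols=4):
    """Return a (rows, cols) array of per-cell SSIM scores."""
    h, w = img_a.shape
    ys = np.linspace(0, h, rows + 1, dtype=int)  # cell boundaries along height
    xs = np.linspace(0, w, cols + 1, dtype=int)  # cell boundaries along width
    scores = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            cell_a = img_a[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            cell_b = img_b[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            scores[i, j] = ssim_global(cell_a, cell_b)
    return scores
```

Under these assumptions, cells whose score falls below a chosen threshold could be treated as altered and excluded when matching a captured image against a rendered one. How NeRF-VIO actually uses the per-cell scores is not specified in this summary.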
Abstract
A prior map serves as a foundational reference for localization in context-aware applications such as augmented reality (AR). By providing contextual information about the environment, the prior map is a vital tool for mitigating drift. In this paper, we propose a map-based visual-inertial localization algorithm (NeRF-VIO) with initialization using neural radiance fields (NeRF). Our algorithm uses a multilayer perceptron model and redefines the loss function as the geodesic distance on \(SE(3)\), ensuring the invariance of the initialization model under a frame change within \(\mathfrak{se}(3)\). The evaluation demonstrates that our model outperforms an existing NeRF-based initialization solution in both accuracy and efficiency. By integrating a two-stage update mechanism within a multi-state constraint Kalman filter (MSCKF) framework, the state of NeRF-VIO is constrained by both captured images from an onboard camera and rendered images from a pre-trained NeRF model. The proposed algorithm is validated using a real-world AR dataset, and the results indicate that our two-stage update pipeline outperforms MSCKF across all data sequences.
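The invariance property mentioned above can be illustrated with a small sketch. It assumes one common choice of pose error, the norm of the logarithm of the relative pose \(T_{\text{pred}}^{-1} T_{\text{true}}\), which is unchanged when both poses are left-multiplied by the same transform. The paper's exact metric and weighting may differ. The names `se3_log`, `geodesic_loss`, `pose_z`, and the weights `w_rot` and `w_trans` are hypothetical, and the closed-form log used here is not numerically robust for rotation angles near \(\pi\).

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_log(T):
    """Log map of a 4x4 pose, returned as [rho (translation part), phi (rotation part)]."""
    R, t = T[:3, :3], T[:3, 3]
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-6:
        # Small-angle approximation: phi ~ w / 2, V ~ I + hat(phi) / 2
        phi = 0.5 * w
        V = np.eye(3) + 0.5 * hat(phi)
    else:
        phi = theta / (2.0 * np.sin(theta)) * w
        K = hat(phi)
        V = (np.eye(3)
             + (1.0 - np.cos(theta)) / theta ** 2 * K
             + (theta - np.sin(theta)) / theta ** 3 * (K @ K))
    rho = np.linalg.solve(V, t)  # rho = V^{-1} t
    return np.concatenate([rho, phi])

def geodesic_loss(T_pred, T_true, w_rot=1.0, w_trans=1.0):
    """Weighted norm of log(T_pred^{-1} T_true); the weights are assumed hyperparameters."""
    xi = se3_log(np.linalg.inv(T_pred) @ T_true)
    return np.sqrt(w_trans * xi[:3] @ xi[:3] + w_rot * xi[3:] @ xi[3:])

def pose_z(angle, t):
    """Helper for the example: rotation about z by `angle` plus translation `t`."""
    c, s = np.cos(angle), np.sin(angle)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t
    return T

# Left-multiplying both poses by the same frame change G leaves the loss unchanged.
A = pose_z(0.3, [1.0, 2.0, 3.0])
B = pose_z(-0.7, [0.0, 1.0, -1.0])
G = pose_z(1.1, [0.5, -2.0, 0.7])
loss_original = geodesic_loss(A, B)
loss_reframed = geodesic_loss(G @ A, G @ B)
```

Because the loss depends only on the relative pose, `loss_original` and `loss_reframed` agree up to floating-point error. A loss that compares translation and rotation components separately in a fixed frame would not generally have this property.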
