NeRF-VINS: A Real-time Neural Radiance Field Map-based Visual-Inertial Navigation System
Saimouli Katragadda, Woosik Lee, Yuxiang Peng, Patrick Geneva, Chuchu Chen, Chao Guo, Mingyang Li, Guoquan Huang
TL;DR
The paper addresses drift in map-based localization by introducing NeRF-VINS, a real-time, tightly-coupled visual–inertial system that fuses an a priori NeRF map with IMU and monocular imagery through an MSCKF-based filter. It leverages NeRF-rendered novel views to obtain informative measurements, enabling drift-free, centimeter-level pose estimates at over 10 Hz on edge hardware like the Jetson AGX Orin. The approach includes offline NeRF map generation, careful descriptor selection (favoring SuperPoint), and a rendering pipeline that balances speed and fidelity via half-resolution renders and FSRCNN upsampling. Experiments on the AR Table dataset show NeRF-VINS outperforms traditional map-based methods and many baselines in accuracy and robustness, while maintaining real-time performance and resilience to environmental changes.
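To illustrate why half-resolution rendering helps with speed: a NeRF render casts at least one ray per output pixel, so halving both image dimensions cuts the ray count to a quarter. The sketch below is only back-of-the-envelope arithmetic. The 640x480 image size and the `rays_per_frame` helper are illustrative assumptions, not values or code from the paper. In the described pipeline, the upsampling step (FSRCNN) would then restore the half-resolution render to full size.

```python
def rays_per_frame(width: int, height: int, scale: float = 1.0) -> int:
    """Count camera rays for one render, assuming one ray per output pixel."""
    return int(width * scale) * int(height * scale)


# Hypothetical camera resolution, chosen only for illustration.
WIDTH, HEIGHT = 640, 480

full = rays_per_frame(WIDTH, HEIGHT)             # full-resolution render
half = rays_per_frame(WIDTH, HEIGHT, scale=0.5)  # half-resolution render
ratio = full / half                              # reduction in rays to evaluate

print(f"full: {full} rays, half: {half} rays, reduction: {ratio:.1f}x")
```

The actual speedup depends on the NeRF implementation and on the cost of the upsampling network, so the ray-count ratio is best read as an upper bound on the rendering savings.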
Abstract
Achieving efficient and consistent localization within a prior map remains challenging in robotics. Conventional keyframe-based approaches often suffer from sub-optimal viewpoints due to limited field of view (FOV) and/or constrained motion, thus degrading localization performance. To address this issue, we design a real-time, tightly-coupled Neural Radiance Fields (NeRF)-aided visual-inertial navigation system (VINS). In particular, by effectively leveraging NeRF's potential to synthesize novel views, the proposed NeRF-VINS overcomes the limitations of traditional keyframe-based maps (with limited views) and optimally fuses IMU, monocular images, and synthetically rendered images within an efficient filter-based framework. This tightly-coupled fusion enables efficient 3D motion tracking with bounded errors. We extensively compare the proposed NeRF-VINS against state-of-the-art methods that use prior map information and demonstrate its ability to perform real-time localization, at over 10 Hz, on a resource-constrained Jetson AGX Orin embedded platform.
