NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields
Antoni Rosinol, John J. Leonard, Luca Carlone
TL;DR
This work tackles real-time monocular 3D reconstruction by fusing dense monocular SLAM with a probabilistic, hash-based neural radiance field (NeRF). Dense SLAM supplies poses, dense depth maps, and per-pixel depth and pose covariances; the covariances are used to weight the depth supervision in a real-time NeRF training pipeline. The result is higher geometric and photometric accuracy without requiring depth or pose as input. On Replica, the method reports state-of-the-art results against TSDF-based methods and recent NeRF-SLAM approaches while still running in real time. The main limitation is high GPU memory usage; the authors propose mitigations and point to metric-semantic SLAM and dynamic scene understanding as future directions.
Abstract
We propose a novel geometric and photometric 3D mapping pipeline for accurate and real-time scene reconstruction from monocular images. To achieve this, we leverage recent advances in dense monocular SLAM and real-time hierarchical volumetric neural radiance fields. Our insight is that dense monocular SLAM provides the right information to fit a neural radiance field of the scene in real-time, by providing accurate pose estimates and depth-maps with associated uncertainty. With our proposed uncertainty-based depth loss, we achieve not only good photometric accuracy, but also great geometric accuracy. In fact, our proposed pipeline achieves better geometric and photometric accuracy than competing approaches (up to 179% better PSNR and 86% better L1 depth), while working in real-time and using only monocular images.
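To make the idea of an uncertainty-based depth loss concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: it assumes the depth uncertainty is available as an independent per-pixel variance, and it weights each squared depth residual by the inverse of that variance so that pixels where SLAM is unsure contribute less. The function name `uncertainty_weighted_depth_loss`, its parameters, and the `eps` stabilizer are hypothetical names chosen for this example.

```python
def uncertainty_weighted_depth_loss(rendered_depth, slam_depth, slam_depth_variance, eps=1e-8):
    """Mean of squared depth residuals, each divided by its per-pixel variance.

    Assumption (not from the source): uncertainty is a per-pixel variance and
    pixels are treated as independent. All three inputs are flat sequences of
    floats with the same length.
    """
    if not (len(rendered_depth) == len(slam_depth) == len(slam_depth_variance)):
        raise ValueError("all inputs must have the same number of pixels")
    if len(rendered_depth) == 0:
        raise ValueError("inputs must contain at least one pixel")

    total = 0.0
    for d_render, d_slam, variance in zip(rendered_depth, slam_depth, slam_depth_variance):
        residual = d_slam - d_render
        # Inverse-variance weighting: uncertain pixels are down-weighted.
        total += (residual * residual) / (variance + eps)
    return total / len(rendered_depth)


# Two pixels with the same 0.5 m depth error but different SLAM confidence.
rendered = [1.0, 1.0]
slam = [1.5, 1.5]
confident_loss = uncertainty_weighted_depth_loss(rendered, slam, [0.01, 0.01])
uncertain_loss = uncertainty_weighted_depth_loss(rendered, slam, [1.0, 1.0])
```

With identical residuals, the low-variance case yields the larger loss, so the radiance field is pulled hardest toward the depths that SLAM estimates most reliably. In a full pipeline a term like this would be combined with a photometric (color) loss; the relative weighting between the two is not specified in the text above.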
