GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM
Ganlin Zhang, Erik Sandström, Youmin Zhang, Manthan Patel, Luc Van Gool, Martin R. Oswald
TL;DR
GlORIE-SLAM addresses the challenge of RGB-only dense SLAM by introducing a deformable neural point cloud map and a DSPO layer that fuses monocular depth priors into a two-stage optimization of pose, disparity, and depth scale. The system performs online loop closure and global bundle adjustment to maintain global map consistency without retraining neural grids, and it renders via depth-guided volume rendering using a proxy depth map. Empirically, GlORIE-SLAM achieves state-of-the-art or competitive rendering, reconstruction, and tracking on the Replica, TUM-RGBD, and ScanNet datasets while keeping memory use and runtime reasonable. This approach offers a scalable, RGB-only solution with robust global consistency and high-fidelity rendering suitable for real-world indoor environments.
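A monocular depth prior is only defined up to an unknown scale and shift, which is why the DSPO layer estimates a depth scale alongside pose and disparity. The sketch below isolates that one ingredient under simplifying assumptions: it fits a scale s and shift o by closed-form least squares so that s * mono + o matches depths already estimated by the SLAM back end. The helper name `fit_scale_shift` and the separate, closed-form fit are illustrative assumptions; the DSPO layer itself optimizes the scale jointly with pose and disparity inside bundle adjustment.

```python
from typing import List, Sequence, Tuple


def fit_scale_shift(mono: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: solve min_{s,o} sum_i (s * mono_i + o - target_i)^2.

    `mono` holds monocular depth predictions (affine-ambiguous), `target` holds
    depths estimated by the SLAM back end at the same pixels.
    """
    n = len(mono)
    if n != len(target) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_m = sum(mono) / n
    mean_t = sum(target) / n
    # Centered cross-covariance and variance give the normal-equation solution.
    cov = sum((m - mean_m) * (t - mean_t) for m, t in zip(mono, target))
    var = sum((m - mean_m) ** 2 for m in mono)
    if var == 0.0:
        raise ValueError("monocular depths are constant; scale is unobservable")
    scale = cov / var
    shift = mean_t - scale * mean_m
    return scale, shift


def align(mono: Sequence[float], scale: float, shift: float) -> List[float]:
    """Apply the fitted affine correction to the monocular prior."""
    return [scale * m + shift for m in mono]


if __name__ == "__main__":
    mono = [1.0, 2.0, 3.0]
    slam_depth = [2.5, 4.5, 6.5]
    s, o = fit_scale_shift(mono, slam_depth)
    print(s, o, align(mono, s, o))
```

The closed form keeps the example short; in a joint optimization the same residual, s * mono + o - depth, would simply become one more term in the bundle adjustment cost.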
Abstract
Recent advancements in RGB-only dense Simultaneous Localization and Mapping (SLAM) have predominantly utilized grid-based neural implicit encodings and/or struggle to efficiently realize global map and pose consistency. To this end, we propose an efficient RGB-only dense SLAM system using a flexible neural point cloud scene representation that adapts to keyframe poses and depth updates, without needing costly backpropagation. Another critical challenge of RGB-only SLAM is the lack of geometric priors. To alleviate this issue, with the aid of a monocular depth estimator, we introduce a novel DSPO layer for bundle adjustment which optimizes the pose and depth of keyframes along with the scale of the monocular depth. Finally, our system benefits from loop closure and online global bundle adjustment and performs better than or competitively with existing dense neural RGB SLAM methods in tracking, mapping and rendering accuracy on the Replica, TUM-RGBD and ScanNet datasets. The source code is available at https://github.com/zhangganlin/GlOIRE-SLAM
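To make "adapts to keyframe poses and depth updates, without needing costly backpropagation" concrete, the sketch below assumes that each neural point is anchored to a keyframe pixel and that its world position is recomputed by back-projecting that pixel with the keyframe's current depth, pinhole intrinsics and camera-to-world pose. The learned feature is left untouched, so a pose or depth correction (for example after loop closure) moves the point without any gradient step. The names `Pose`, `NeuralPoint`, `unproject` and `deform`, and the per-pixel anchoring itself, are hypothetical illustrations rather than the system's actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Pose:
    """Camera-to-world rigid transform: x_world = R @ x_cam + t."""
    R: Mat3
    t: Vec3


@dataclass
class NeuralPoint:
    """Hypothetical point record anchored to pixel (u, v) of one keyframe."""
    keyframe: int
    u: int
    v: int
    feature: List[float]  # learned feature; never modified by deformation
    position: Vec3 = (0.0, 0.0, 0.0)


def unproject(u: int, v: int, depth: float,
              K: Tuple[float, float, float, float], pose: Pose) -> Vec3:
    """Back-project a pixel with a pinhole model, then map it to world frame."""
    fx, fy, cx, cy = K
    x_cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    return tuple(
        sum(pose.R[r][c] * x_cam[c] for c in range(3)) + pose.t[r]
        for r in range(3)
    )


def deform(points: List[NeuralPoint], poses: Dict[int, Pose],
           depths: Dict[int, Dict[Tuple[int, int], float]],
           K: Tuple[float, float, float, float]) -> None:
    """Re-anchor every point to its keyframe's latest pose and depth in place."""
    for p in points:
        d = depths[p.keyframe][(p.u, p.v)]
        p.position = unproject(p.u, p.v, d, K, poses[p.keyframe])


if __name__ == "__main__":
    K = (1.0, 1.0, 0.0, 0.0)
    identity = Pose(R=((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
                    t=(0.0, 0.0, 0.0))
    pts = [NeuralPoint(keyframe=0, u=2, v=3, feature=[0.1, 0.2])]
    deform(pts, {0: identity}, {0: {(2, 3): 2.0}}, K)
    print(pts[0].position)
```

Because only positions are recomputed, the cost of a global pose update scales with the number of points rather than requiring the map to be re-optimized.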
