SP-SLAM: Neural Real-Time Dense SLAM With Scene Priors
Zhen Hong, Bowen Wang, Haoran Duan, Yawen Huang, Xiong Li, Zhenyu Wen, Xiang Wu, Wei Xiang, Yefeng Zheng
TL;DR
SP-SLAM tackles real-time dense SLAM by injecting scene priors into a neural implicit framework. It encodes depth-derived priors into a sparse voxel volume and stores appearance on tri-planes, enabling rapid convergence and high-fidelity geometry and texture without relying on keyframes. A pixel-database-driven optimization continuously refines all frame poses during mapping, achieving accurate tracking with fewer iterations and real-time performance. Across five benchmark datasets, SP-SLAM achieves higher tracking accuracy and reconstruction quality than existing neural SLAM methods while running significantly faster, highlighting its practical value for real-time robotics and AR/VR applications.
Abstract
Neural implicit representations have recently shown promising progress in dense Simultaneous Localization And Mapping (SLAM). However, existing works fall short in reconstruction quality and real-time performance, mainly because they use an inflexible scene representation strategy that does not leverage any prior information. In this paper, we introduce SP-SLAM, a novel neural RGB-D SLAM system that performs tracking and mapping in real time. SP-SLAM computes depth images and establishes sparse voxel-encoded scene priors near the surfaces to achieve rapid convergence of the model. Subsequently, the encoding voxels computed from a single-frame depth image are fused into a global volume, which facilitates high-fidelity surface reconstruction. Simultaneously, we employ tri-planes to store scene appearance information, striking a balance between achieving high-quality geometric texture mapping and minimizing memory consumption. Furthermore, we introduce an effective optimization strategy for mapping that allows the system to continuously optimize the poses of all historical input frames during runtime without increasing computational overhead. We conduct extensive evaluations on five benchmark datasets (Replica, ScanNet, TUM RGB-D, Synthetic RGB-D, 7-Scenes). The results demonstrate that, compared to existing methods, we achieve superior tracking accuracy and reconstruction quality while running significantly faster.
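To make the idea of fusing per-frame voxels into a global volume more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a pinhole camera, points already expressed in the world frame (identity pose), and a hypothetical per-voxel value (the mean depth of the points in a voxel) standing in for the paper's voxel encoding, which the abstract does not specify. The function names `backproject`, `frame_voxels`, and `fuse` are illustrative only, and the weighted running average is an assumed fusion rule. Plain Python is used for readability; a real system would vectorize these steps.

```python
import math
from collections import defaultdict


def backproject(depth, fx, fy, cx, cy):
    """Lift a depth image (list of rows, in metres) to 3D points (pinhole model)."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # zero depth marks a missing measurement
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def frame_voxels(points, voxel_size):
    """Group points by voxel index; return {index: (mean_z, count)} for one frame.

    Only voxels that contain at least one depth point are created, so the
    result is sparse and concentrated near observed surfaces.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        sums[key][0] += z  # assumption: mean z is a stand-in for a real encoding
        sums[key][1] += 1
    return {k: (s / n, n) for k, (s, n) in sums.items()}


def fuse(global_volume, frame):
    """Merge one frame's voxels into the global volume by weighted running average."""
    for key, (val, w) in frame.items():
        old_val, old_w = global_volume.get(key, (0.0, 0))
        new_w = old_w + w
        global_volume[key] = ((old_val * old_w + val * w) / new_w, new_w)
    return global_volume
```

A frame would be processed as `fuse(volume, frame_voxels(backproject(depth, fx, fy, cx, cy), voxel_size))`, with `volume` starting as an empty dictionary. Keying the volume by integer voxel index keeps memory proportional to the observed surface area rather than to the bounding box of the scene.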
