SLC$^2$-SLAM: Semantic-guided Loop Closure using Shared Latent Code for NeRF SLAM
Yuhang Ming, Di Ma, Weichen Dai, Han Yang, Rui Fan, Guofeng Zhang, Wanzeng Kong
TL;DR
SLC$^2$-SLAM addresses cumulative drift in NeRF SLAM by repurposing the latent codes learned on the fly as local features for loop detection, guided by semantic information decoded from the same codes. A semantic-guided stratified sampling strategy, coupled with VLAD-based global descriptors, enables robust loop retrieval, while pose-graph optimization and bundle adjustment refine the poses and the neural map. In loop detection, the approach achieves higher recall than the NetVLAD and ORB+BoW baselines, and it improves tracking and reconstruction on Replica and ScanNet, especially in large scenes. This work demonstrates that latent NeRF representations can serve a dual role in reconstruction and semantic-guided loop closure, delivering practical improvements for dense, persistent 3D mapping.
Abstract
Targeting the notorious cumulative drift errors in NeRF SLAM, we propose Semantic-guided Loop Closure using Shared Latent Code, dubbed SLC$^2$-SLAM. We argue that the latent codes stored in many NeRF SLAM systems are not fully exploited, as they are only used for better reconstruction. In this paper, we propose a simple yet effective way to detect potential loops using the same latent codes as local features. To further improve loop detection performance, we use semantic information, which is also decoded from the same latent codes, to guide the aggregation of local features. Finally, with the potential loops detected, we close them with a graph optimization followed by bundle adjustment to refine both the estimated poses and the reconstructed scene. To evaluate the performance of SLC$^2$-SLAM, we conduct extensive experiments on the Replica and ScanNet datasets. Our proposed semantic-guided loop closure significantly outperforms the pre-trained NetVLAD and ORB combined with Bag-of-Words, which are used in all other NeRF SLAM systems with loop closure. As a result, SLC$^2$-SLAM also demonstrates better tracking and reconstruction performance, especially in larger scenes with more loops, such as those in ScanNet.
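To make the retrieval idea concrete, the following is a minimal sketch of how per-point latent codes might be sampled per semantic class, aggregated into a VLAD descriptor, and matched against earlier keyframes. It is an illustration under stated assumptions, not the implementation used in SLC$^2$-SLAM: the function names (`stratified_sample`, `vlad`, `detect_loop`), the equal per-class sampling budget, the pre-computed codebook `centers`, and the cosine-similarity threshold of 0.8 are all hypothetical choices made for this example.

```python
import numpy as np

def stratified_sample(labels, n_samples, rng):
    """Pick feature indices so that every semantic class contributes.

    Assumption: an equal budget per class, capped by the class size.
    """
    classes = np.unique(labels)
    per_class = max(1, n_samples // len(classes))
    picked = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        take = min(per_class, idx.size)
        picked.append(rng.choice(idx, size=take, replace=False))
    return np.concatenate(picked)

def vlad(features, centers):
    """Aggregate local features (N, D) into a VLAD descriptor of length K*D."""
    # Hard-assign each feature to its nearest codebook center.
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    desc = np.zeros_like(centers, dtype=float)
    for k in range(centers.shape[0]):
        members = features[assign == k]
        if members.size:
            # Sum of residuals to the assigned center.
            desc[k] = (members - centers[k]).sum(axis=0)
    # Intra-normalize each cluster block, then L2-normalize the whole vector.
    desc /= np.maximum(np.linalg.norm(desc, axis=1, keepdims=True), 1e-12)
    desc = desc.ravel()
    return desc / max(np.linalg.norm(desc), 1e-12)

def detect_loop(query, database, threshold=0.8):
    """Return the index of the most similar stored descriptor, or None.

    Descriptors are unit-length, so the dot product is cosine similarity.
    The threshold value is a placeholder, not a value from the paper.
    """
    if len(database) == 0:
        return None
    sims = np.stack(database) @ query
    best = int(sims.argmax())
    return best if sims[best] >= threshold else None
```

In this sketch, a keyframe's latent codes and decoded semantic labels would be passed through `stratified_sample` and `vlad`, and the resulting descriptor compared against those of earlier keyframes with `detect_loop`. A detected match would then become a candidate loop edge for the pose-graph optimization and bundle adjustment described above.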
