SLC$^2$-SLAM: Semantic-guided Loop Closure using Shared Latent Code for NeRF SLAM

Yuhang Ming, Di Ma, Weichen Dai, Han Yang, Rui Fan, Guofeng Zhang, Wanzeng Kong

TL;DR

SLC$^2$-SLAM addresses cumulative drift in NeRF SLAM by repurposing the latent codes learned on the fly as local features for loop detection, guided by semantic information decoded from the same codes. A semantic-guided stratified sampling strategy, coupled with VLAD-based global descriptors, enables robust loop retrieval, while pose-graph optimization and bundle adjustment refine the poses and the neural map. The approach outperforms NetVLAD and ORB+BoW baselines with higher loop-detection recall, and improves tracking and reconstruction on Replica and ScanNet, especially in large scenes. This work shows that latent NeRF representations can serve both reconstruction and semantic-guided loop closure, delivering practical improvements for dense, persistent 3D mapping.
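The retrieval step pairs local features with VLAD-based global descriptors. The following is a minimal sketch of standard VLAD aggregation plus nearest-neighbour loop retrieval, assuming the local features (here, latent codes sampled from a keyframe) arrive as plain vectors and that cluster centers were learned beforehand, e.g. by k-means. The names `vlad` and `retrieve_loop`, the intra-normalization step, the 0.9 similarity threshold, and the toy data are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _l2_normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def vlad(features: Sequence[Sequence[float]], centers: Sequence[Sequence[float]]) -> Vector:
    """Aggregate local features into one global VLAD descriptor (hard assignment)."""
    k, d = len(centers), len(centers[0])
    residuals = [[0.0] * d for _ in range(k)]
    for f in features:
        # Assign each local feature to its nearest cluster center ("visual word").
        c = min(range(k), key=lambda i: _sq_dist(f, centers[i]))
        # Accumulate the residual between the feature and that center.
        for j in range(d):
            residuals[c][j] += f[j] - centers[c][j]
    # Intra-normalize each cluster block, concatenate, then L2-normalize the result.
    flat = [x for block in residuals for x in _l2_normalize(block)]
    return _l2_normalize(flat)


def retrieve_loop(query: Vector, database: Dict[str, Vector],
                  threshold: float = 0.9) -> Optional[str]:
    """Return the most similar stored keyframe id, or None if below threshold."""
    best_id, best_sim = None, -1.0
    for kf_id, desc in database.items():
        # Descriptors are unit length, so the dot product is the cosine similarity.
        sim = sum(x * y for x, y in zip(query, desc))
        if sim > best_sim:
            best_id, best_sim = kf_id, sim
    return best_id if best_sim >= threshold else None


# Toy example (hypothetical data): two 2-D cluster centers, two stored keyframes.
centers = [[0.0, 0.0], [10.0, 10.0]]
db = {
    "kf0": vlad([[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]], centers),
    "kf1": vlad([[-1.0, 0.0], [0.0, -1.0], [9.0, 10.0]], centers),
}
query = vlad([[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]], centers)
print(retrieve_loop(query, db))  # prints kf0
```

Because every descriptor is unit length, comparing a new keyframe against the database reduces to dot products, which keeps retrieval cheap as the map grows; a real system would also skip temporally adjacent keyframes before accepting a loop.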

Abstract

Targeting the notorious cumulative drift errors in NeRF SLAM, we propose a Semantic-guided Loop Closure using Shared Latent Code, dubbed SLC$^2$-SLAM. We argue that the latent codes stored in many NeRF SLAM systems are not fully exploited, since they are used only to improve reconstruction. In this paper, we propose a simple yet effective way to detect potential loops by using the same latent codes as local features. To further improve loop detection, we use semantic information, also decoded from the same latent codes, to guide the aggregation of local features. Finally, once potential loops are detected, we close them with a graph optimization followed by bundle adjustment to refine both the estimated poses and the reconstructed scene. To evaluate SLC$^2$-SLAM, we conduct extensive experiments on the Replica and ScanNet datasets. Our semantic-guided loop closure significantly outperforms the pre-trained NetVLAD and ORB combined with Bag-of-Words, which are used by all other NeRF SLAM systems with loop closure. As a result, SLC$^2$-SLAM also demonstrates better tracking and reconstruction performance, especially in larger scenes with more loops, such as those in ScanNet.
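Detected loops are closed with a graph optimization before bundle adjustment. To illustrate why a single loop edge reduces accumulated drift, here is a toy sketch: a 1-D pose graph (scalar positions instead of full 6-DoF poses) solved as weighted linear least squares through its normal equations. The 1-D simplification, the functions `optimize_pose_graph` and `solve_linear`, and the edge weights are assumptions for illustration only; bundle adjustment is not modeled.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float, float]  # (i, j, measured x_j - x_i, weight)


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def optimize_pose_graph(num_poses: int, edges: Sequence[Edge],
                        prior_weight: float = 1.0) -> List[float]:
    """Hypothetical 1-D pose graph: minimize sum of w * (x_j - x_i - z)^2."""
    H = [[0.0] * num_poses for _ in range(num_poses)]
    b = [0.0] * num_poses
    for i, j, z, w in edges:
        # Normal-equation contribution of the residual r = x_j - x_i - z.
        H[i][i] += w
        H[j][j] += w
        H[i][j] -= w
        H[j][i] -= w
        b[i] -= w * z
        b[j] += w * z
    H[0][0] += prior_weight  # prior pulling x_0 to 0 (fixes the global offset)
    return solve_linear(H, b)


# Odometry overestimates each forward step (1.1 instead of 1.0), so dead
# reckoning gives [0.0, 1.1, 2.2, 0.2] instead of returning to the start.
odometry = [(0, 1, 1.1, 1.0), (1, 2, 1.1, 1.0), (2, 3, -2.0, 1.0)]
loop = [(0, 3, 0.0, 1.0)]  # detected loop: pose 3 revisits pose 0
poses = optimize_pose_graph(4, odometry + loop)
print([round(p, 3) for p in poses])  # roughly [0.0, 1.05, 2.1, 0.05]
```

With equal weights, the 0.2 closing error is spread evenly over the four edges of the cycle, so the end-point drift shrinks from 0.2 to 0.05; raising the loop edge's weight would pull pose 3 closer to the origin at the expense of the odometry edges.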

Paper Structure (13 sections, 4 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Tracking and reconstruction results on scene0054 of ScanNet [scannet]. With semantic-guided loop closure, our SLC$^2$-SLAM achieves better tracking and reconstruction performance. In contrast, our base system, Co-SLAM [coslam], exhibits obvious misalignment, most evident in the areas within the pink bounding boxes.
  • Figure 2: System Overview. Our proposed SLC$^2$-SLAM consists of four main components. At its core is a hybrid scene representation that combines voxel hashing of latent codes with three MLPs. The tracking, mapping, and semantic-guided loop closure modules interact with this hybrid scene representation to perform tracking, mapping, and loop closure, respectively.
  • Figure 3: Examples of semantic-guided stratified sampling (S.G.S.S) versus random sampling (R.S.).
  • Figure 4: Semantic segmentation examples on ScanNet.
  • Figure 5: Reconstruction examples of Co-SLAM [coslam], Loopy-SLAM [loopyslam], and our SLC$^2$-SLAM on the ScanNet and Replica datasets. Compared to Loopy-SLAM, our reconstructions are more complete on Replica scenes and better aligned and less noisy on ScanNet scenes. Compared to Co-SLAM, ours are more complete and less noisy on both datasets. Zoomed-in views with highlights are provided for better visualization.
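Figure 3 contrasts semantic-guided stratified sampling with random sampling. The allocation rule is not given here, so the sketch below assumes one: treat each semantic class as a stratum and split the sample budget as evenly as possible across classes, handing any unused quota from small classes to larger ones. `stratified_sample`, its arguments, and the wall/chair/lamp labels are hypothetical; the point is only to show how stratifying by label keeps small objects from being drowned out by dominant classes, before the sampled features are aggregated into a global descriptor.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def stratified_sample(labels: Sequence[int], num_samples: int,
                      rng: random.Random) -> List[int]:
    """Pick up to num_samples indices, balancing the picks across semantic classes."""
    strata: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        strata[label].append(idx)  # one stratum per semantic class
    # Visit classes from smallest to largest so unused quota flows to larger classes.
    classes = sorted(strata, key=lambda c: (len(strata[c]), c))
    chosen: List[int] = []
    remaining = num_samples
    for k, c in enumerate(classes):
        quota = remaining // (len(classes) - k)  # even split of what is left
        take = min(quota, len(strata[c]))
        chosen.extend(rng.sample(strata[c], take))
        remaining -= take
    return chosen


# Toy frame: 90 "wall" pixels (class 0), 8 "chair" pixels (1), 2 "lamp" pixels (2).
labels = [0] * 90 + [1] * 8 + [2] * 2
rng = random.Random(0)
strat = stratified_sample(labels, 12, rng)
rand = rng.sample(range(len(labels)), 12)  # plain random sampling for comparison
print(sorted(Counter(labels[i] for i in strat).items()))  # [(0, 5), (1, 5), (2, 2)]
print(sorted(Counter(labels[i] for i in rand).items()))   # class 0 usually dominates
```

Random sampling draws about 12 × 0.9 = 10.8 wall pixels in expectation, leaving the chair and lamp barely represented, whereas the stratified version guarantees each class a share of the budget.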