NID-SLAM: Neural Implicit Representation-based RGB-D SLAM in dynamic environments

Ziheng Xu, Jianwei Niu, Qingfeng Li, Tao Ren, Chen Chen

TL;DR

NID-SLAM addresses the disruption that moving objects cause in RGB-D SLAM. It combines depth-guided semantic mask refinement, depth-based removal of dynamic objects, and inpainting of the occluded background with a neural implicit scene representation. It introduces a depth revision mechanism (thresholded depth gradients), depth-aware mask refinement, and a keyframe selection strategy designed for dynamic scenes, which together improve tracking robustness and map completeness. The method represents the scene with multiresolution feature grids, renders color and depth along rays, and jointly optimizes geometry, color, and camera poses. On dynamic datasets it achieves the best tracking accuracy among the neural SLAM methods compared and produces higher-quality maps than the baselines, although its speed is limited by the segmentation step. The work makes neural SLAM more practical in dynamic environments and yields reusable static maps; faster segmentation and predictive inpainting are possible routes to real-time operation.
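The keyframe strategy is only summarized above. As a rough illustration, assume (this rule is not taken from the paper) that each frame carries the fraction of its pixels covered by the dynamic-object mask, and that a keyframe is nominally due every `interval` frames. A mask-guided rule can then defer a due keyframe until a frame arrives whose dynamic fraction is below a threshold, so that frames dominated by moving objects never enter the mapping window. The function `select_keyframes` and its parameters are hypothetical.

```python
from typing import List


def select_keyframes(dynamic_ratios: List[float], interval: int,
                     max_dynamic_ratio: float) -> List[int]:
    """Return the indices of frames chosen as keyframes (hypothetical rule).

    dynamic_ratios[i] is the fraction of frame i's pixels covered by the
    dynamic-object mask. A keyframe becomes due every `interval` frames; if the
    due frame is dominated by moving objects, selection is deferred to the
    first later frame whose dynamic fraction is at most `max_dynamic_ratio`.
    """
    keyframes: List[int] = []
    due = 0  # earliest frame index at which the next keyframe may be taken
    for i, ratio in enumerate(dynamic_ratios):
        if i >= due and ratio <= max_dynamic_ratio:
            keyframes.append(i)
            due = i + interval  # restart the interval from the accepted frame
    return keyframes


if __name__ == "__main__":
    ratios = [0.1, 0.0, 0.5, 0.6, 0.2, 0.1, 0.0, 0.0]
    print(select_keyframes(ratios, interval=2, max_dynamic_ratio=0.3))  # [0, 4, 6]
```

In the example, frames 2 and 3 are due but mostly dynamic (0.5 and 0.6), so the keyframe slides to frame 4. A real system would likely also weigh view overlap and camera motion, which this sketch omits.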

Abstract

Neural implicit representations have been explored to enhance visual SLAM algorithms, especially for producing high-fidelity dense maps. Existing methods operate robustly in static scenes but struggle with the disruption caused by moving objects. In this paper, we present NID-SLAM, which significantly improves the performance of neural SLAM in dynamic environments. We propose a new approach to correcting inaccurate regions of semantic masks, particularly near object boundaries. By exploiting the geometric information in depth images, this method enables accurate removal of dynamic objects, thereby reducing the likelihood of camera drift. Additionally, we introduce a keyframe selection strategy for dynamic scenes, which makes camera tracking more robust to large moving objects and improves mapping efficiency. Experiments on publicly available RGB-D datasets demonstrate that our method outperforms competitive neural SLAM approaches in tracking accuracy and mapping quality in dynamic environments.
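The abstract does not spell out the refinement rule. The sketch below shows one plausible way to use depth geometry for it, under our own assumptions: pixels whose depth-gradient magnitude exceeds a threshold are treated as unreliable boundary pixels (the thresholded-gradient idea named in the TL;DR), and the semantic mask is grown for a bounded number of steps into 4-connected neighbors whose depth is close to that of the adjacent masked pixel. Boundary pixels reached by the growth are absorbed but do not propagate further, so the mask cannot leak across the discontinuity into the background. Depth maps are nested lists to keep the example dependency-free; `depth_gradient_edges`, `refine_mask`, and their thresholds are hypothetical, not the authors' code.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[bool]]


def depth_gradient_edges(depth: Grid, threshold: float) -> Mask:
    """Flag pixels whose depth-gradient magnitude exceeds `threshold`.

    Uses the difference between the two neighbors in x and in y (clamped at
    the image border); pixels on a depth discontinuity get large magnitudes.
    """
    h, w = len(depth), len(depth[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = depth[y][min(x + 1, w - 1)] - depth[y][max(x - 1, 0)]
            gy = depth[min(y + 1, h - 1)][x] - depth[max(y - 1, 0)][x]
            edges[y][x] = (gx * gx + gy * gy) ** 0.5 > threshold
    return edges


def refine_mask(mask: Mask, depth: Grid, edges: Mask,
                depth_tol: float, max_steps: int) -> Mask:
    """Grow a semantic mask for at most `max_steps` 4-connected rings.

    A neighbor joins and keeps growing if its depth is within `depth_tol` of
    the masked pixel it touches; a depth-edge neighbor joins but does not
    propagate, so growth cannot spread past the discontinuity.
    """
    h, w = len(mask), len(mask[0])
    refined = [row[:] for row in mask]  # do not modify the input mask
    frontier = [(y, x) for y in range(h) for x in range(w) if mask[y][x]]
    for _ in range(max_steps):
        next_frontier = []
        for y, x in frontier:
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or refined[ny][nx]:
                    continue
                if abs(depth[ny][nx] - depth[y][x]) < depth_tol:
                    refined[ny][nx] = True
                    next_frontier.append((ny, nx))  # same surface: keep growing
                elif edges[ny][nx]:
                    refined[ny][nx] = True  # unreliable boundary pixel: absorb only
        frontier = next_frontier
    return refined


if __name__ == "__main__":
    # Toy scene: object at depth 1.0 in columns 0-2, background at 3.0.
    depth = [[1.0, 1.0, 1.0, 3.0, 3.0, 3.0] for _ in range(4)]
    # Segmentation mask that misses the object's boundary column (column 2).
    mask = [[True, True, False, False, False, False] for _ in range(4)]
    edges = depth_gradient_edges(depth, threshold=0.5)
    print(refine_mask(mask, depth, edges, depth_tol=0.1, max_steps=3)[0])
    # [True, True, True, True, False, False]
```

The refined mask recovers the missed object column and also covers the background column adjacent to the depth jump, where sensor depth is typically least reliable; whether NID-SLAM treats that column the same way is an assumption.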

Paper Structure (12 sections, 6 equations, 4 figures, 5 tables, 1 algorithm)

Figures (4)

  • Figure 1: The 3D reconstruction results of NID-SLAM on our self-captured large dynamic scene.
  • Figure 2: System overview. 1) Dynamic object removal: semantic segmentation and mask revision are used to remove dynamic objects precisely from the RGB-D images, after which the occluded background is restored. 2) Tracking: camera poses $\{\mathbf{R}, \mathbf{t}\}$ are optimized by minimizing losses between rendered and observed color and depth. 3) Mapping: a mask-guided strategy selects keyframes for optimizing the feature-grid scene representation. 4) Scene representation: predicted color and depth values are rendered efficiently through surface-focused point sampling.
  • Figure 3: Reconstruction results on TUM RGB-D. The red box highlights areas containing dynamic objects.
  • Figure 4: Reconstruction results on Replica dataset. The red box highlights the improved areas.
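Item 4 of Figure 2 refers to surface-focused point sampling for rendering. A minimal sketch, assuming a single ray whose observed depth is known: it places a coarse set of evenly spaced samples over `[near, far]` and a denser set within `band` of the observed depth, then composites depth with occupancy-style weights w_i = o_i * prod_{j<i} (1 - o_j). The toy sigmoid occupancy stands in for the feature-grid decoder; the function names, sample counts, and the exact weighting are assumptions rather than NID-SLAM's published settings.

```python
import math
from typing import Callable, List


def sample_depths(near: float, far: float, observed_depth: float,
                  n_uniform: int, n_surface: int, band: float) -> List[float]:
    """Sample depths along one ray: a coarse uniform set plus a dense set near the surface."""
    # Coarse samples: midpoints of n_uniform equal bins over [near, far].
    uniform = [near + (far - near) * (i + 0.5) / n_uniform for i in range(n_uniform)]
    # Surface-focused samples: midpoints of n_surface bins within +/- band of the observed depth.
    lo = max(near, observed_depth - band)
    hi = min(far, observed_depth + band)
    surface = [lo + (hi - lo) * (i + 0.5) / n_surface for i in range(n_surface)]
    return sorted(uniform + surface)


def render_depth(depths: List[float], occupancy: Callable[[float], float]) -> float:
    """Composite depth with weights w_i = o_i * prod_{j<i} (1 - o_j)."""
    transmittance = 1.0  # probability that the ray has not yet hit a surface
    rendered = 0.0
    for d in depths:
        o = occupancy(d)
        rendered += o * transmittance * d
        transmittance *= 1.0 - o
    return rendered


def toy_occupancy(d: float, surface: float, sharpness: float) -> float:
    """Hypothetical stand-in for the decoder: a sigmoid step at the true surface."""
    return 1.0 / (1.0 + math.exp(-(d - surface) / sharpness))


if __name__ == "__main__":
    depths = sample_depths(0.1, 4.0, observed_depth=2.0, n_uniform=16, n_surface=16, band=0.1)
    estimate = render_depth(depths, lambda z: toy_occupancy(z, surface=2.0, sharpness=0.02))
    print(len(depths), round(estimate, 3))  # 32 samples; estimate lies close to 2.0
```

With these settings the uniform samples are about 0.24 apart, so on their own they would place almost all rendering weight on one or two samples around the surface; the surface band adds samples at 0.0125 spacing, which is what lets the composited depth resolve the surface location.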