
EC-SLAM: Effectively Constrained Neural RGB-D SLAM with Sparse TSDF Encoding and Global Bundle Adjustment

Guanghao Li, Qi Chen, YuXiang Yan, Jian Pu

TL;DR

EC-SLAM is a real-time dense RGB-D simultaneous localization and mapping system built on Neural Radiance Fields. It reduces the impact of random sampling in NeRF by combining feature-based and uniform sampling, which minimizes ineffective constraint points for pose optimization.
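The idea of mixing feature-based and uniform pixel sampling can be illustrated with a small sketch. This is not the authors' implementation. As a hypothetical stand-in for a feature detector, it ranks pixels by image-gradient strength. It then adds uniformly drawn pixels so that low-texture regions still contribute constraints. The names `sample_pixels` and `gradient_magnitude` are illustrative only.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int]


def gradient_magnitude(img: List[List[float]], r: int, c: int) -> float:
    """Squared central-difference gradient magnitude at an interior pixel."""
    gx = img[r][c + 1] - img[r][c - 1]
    gy = img[r + 1][c] - img[r - 1][c]
    return gx * gx + gy * gy


def sample_pixels(img: List[List[float]], n_feature: int, n_uniform: int,
                  seed: int = 0) -> List[Pixel]:
    """Return n_feature high-gradient pixels plus up to n_uniform uniform pixels.

    High-gradient pixels are a hypothetical proxy for feature points; the
    uniform part keeps coverage of textureless areas. Output is sorted, no duplicates.
    """
    h, w = len(img), len(img[0])
    interior = [(r, c) for r in range(1, h - 1) for c in range(1, w - 1)]
    # Rank interior pixels by gradient strength and keep the strongest ones.
    ranked = sorted(interior, key=lambda p: gradient_magnitude(img, *p), reverse=True)
    chosen = set(ranked[:n_feature])
    # Draw the uniform part from the remaining pixels so the two sets do not overlap.
    rest = [(r, c) for r in range(h) for c in range(w) if (r, c) not in chosen]
    rng = random.Random(seed)
    chosen.update(rng.sample(rest, min(n_uniform, len(rest))))
    return sorted(chosen)
```

As an example, consider a 5x5 image with a vertical intensity edge, called as `sample_pixels(img, 6, 4)`. The call returns the six pixels on either side of the edge plus four random pixels. In a NeRF-based SLAM loop, rays would then be cast through these pixels for the color, depth, and TSDF losses.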

Abstract

We introduce EC-SLAM, a real-time dense RGB-D simultaneous localization and mapping (SLAM) system leveraging Neural Radiance Fields (NeRF). While recent NeRF-based SLAM systems have shown promising results, they have yet to fully exploit NeRF's potential to constrain pose optimization. EC-SLAM addresses this by using sparse parametric encodings and Truncated Signed Distance Fields (TSDF) to represent the map, enabling efficient fusion, reducing model parameters, and accelerating convergence. Our system also employs a globally constrained Bundle Adjustment (BA) strategy that capitalizes on NeRF's implicit loop closure correction capability, improving tracking accuracy by reinforcing constraints on the keyframes most relevant to the frame currently being optimized. Furthermore, by integrating a combined feature-based and uniform sampling strategy that minimizes ineffective constraint points for pose optimization, we reduce the impact of random sampling in NeRF. Extensive evaluations on the Replica, ScanNet, and TUM datasets demonstrate state-of-the-art performance, with precise tracking and reconstruction accuracy achieved alongside real-time operation at up to 21 Hz.
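The abstract does not say how "most relevant" keyframes are chosen for global BA. The sketch below is therefore a hypothetical covisibility proxy. Each frame is summarized by the set of voxels its back-projected depth points occupy. Keyframes are then ranked by the fraction of the current frame's voxels they share. The voxel-overlap score and the helpers `voxelize` and `select_relevant_keyframes` are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]


def voxelize(points: List[Tuple[float, float, float]], voxel_size: float) -> Set[Voxel]:
    """Map world-space points to the set of integer voxel indices they fall into."""
    return {(int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
            for x, y, z in points}


def select_relevant_keyframes(current: Set[Voxel], keyframes: Dict[int, Set[Voxel]],
                              k: int) -> List[int]:
    """Return ids of the k keyframes whose voxels overlap most with the current frame.

    Relevance = |current & keyframe| / |current| (a hypothetical covisibility
    proxy). Ties are broken by the smaller keyframe id.
    """
    if not current:
        return []
    scores = {kf_id: len(current & vox) / len(current) for kf_id, vox in keyframes.items()}
    ranked = sorted(scores, key=lambda kf_id: (-scores[kf_id], kf_id))
    return ranked[:k]
```

A global BA step built on this would jointly optimize the current frame's pose and the returned keyframes' poses against the shared map. Using a voxel set keeps the overlap test cheap compared with reprojecting every sampled point into every keyframe.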

Paper Structure (13 sections, 7 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: 3D reconstruction and camera trajectory tracking for scene0000 from the ScanNet dataset dai2017scannet. The blue line shows the ground-truth camera trajectory, and the red line shows the estimated trajectory. Our method achieves better tracking and scene reconstruction quality than other RGB-D methods such as NICE-SLAM zhu2022nice, Co-SLAM wang2023co, and ESLAM johari2023eslam.
  • Figure 2: Overview of EC-SLAM. (1) Mapping process: we jointly optimize the map and the poses of selected keyframes in the sliding window. (2) Tracking process: we use a constant velocity motion model to initialize the tracking frame's pose, followed by iterative optimization. (3) Map: we employ a multi-resolution hash grid to store feature values for each point in space and use two MLPs to decode these features into color and TSDF values. Here $\boldsymbol{f}_i$, $\psi(\boldsymbol{p}_i)$, and $F_*(\cdot)$ denote the multi-resolution feature, the position encoding, and the decoder function for a given point, respectively. We compute color, depth, and TSDF losses to optimize the map and poses. $\mathcal{L}_{c}$, $\mathcal{L}_{d}$, and $\mathcal{L}_{s}$ are the color loss, depth loss, and TSDF loss, respectively.
  • Figure 3: Reconstruction results of our system and other SOTA NeRF-based RGB-D dense visual SLAM systems on the Replica dataset straub2019replica. Our system reconstructs scenes more accurately than the other systems, and the details circled by the green dashed line further show its better trade-off between sharpness and smoothness.
  • Figure 4: Comparison of our system with other state-of-the-art NeRF-based RGB-D dense visual SLAM systems on the ScanNet dataset dai2017scannet, focusing on reconstructed scene depth maps and tracking metrics. Blue lines represent the ground-truth (GT) camera trajectory, and red lines indicate the estimated trajectory. From the same camera viewpoints, our system reconstructs overall scene structure that is more consistent with the GT and estimates more accurate camera trajectories.
  • Figure 5: Comparison of our system with other state-of-the-art NeRF-based RGB-D dense visual SLAM systems on the ScanNet dataset dai2017scannet (scene0000, scene0106, scene0169), focusing on reconstructed scene depth maps. From the same camera viewpoints, our system reconstructs scene details that are more consistent with the GT.
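Figure 2 states that tracking initializes each new frame's pose with a constant velocity motion model before iterative optimization. A minimal sketch of that initialization follows. It assumes 4x4 camera-to-world rigid transforms and assumes that the relative motion from frame t-2 to t-1 repeats from t-1 to t, giving $T_t = T_{t-1}\,(T_{t-2}^{-1} T_{t-1})$. The helper names are hypothetical, and plain lists are used instead of a matrix library to keep the example dependency-free.

```python
from typing import List

Mat4 = List[List[float]]


def matmul(a: Mat4, b: Mat4) -> Mat4:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(t: Mat4) -> Mat4:
    """Invert a rigid transform [R | p; 0 0 0 1] as [R^T | -R^T p; 0 0 0 1]."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]  # R transposed
    p = [t[i][3] for i in range(3)]
    q = [-sum(rt[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [rt[0] + [q[0]], rt[1] + [q[1]], rt[2] + [q[2]], [0.0, 0.0, 0.0, 1.0]]


def predict_pose(prev2: Mat4, prev1: Mat4) -> Mat4:
    """Constant-velocity initial guess for frame t from camera-to-world poses at t-2, t-1."""
    delta = matmul(invert_rigid(prev2), prev1)  # relative motion t-2 -> t-1 (camera frame)
    return matmul(prev1, delta)  # apply the same motion once more
```

For example, if the camera moved one unit along x between the last two frames, the predicted pose is one more unit along x. The tracker would then refine this guess by minimizing the color, depth, and TSDF losses $\mathcal{L}_{c}$, $\mathcal{L}_{d}$, and $\mathcal{L}_{s}$.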