MoD-SLAM: Monocular Dense Mapping for Unbounded 3D Scene Reconstruction

Heng Zhou, Zhetao Guo, Shuhong Liu, Lechen Zhang, Qihao Wang, Yuxiang Ren, Mingrui Li

TL;DR

The authors propose MoD-SLAM, the first monocular NeRF-based dense mapping method that supports real-time 3D reconstruction in unbounded scenes. A depth estimation module in the front end extracts accurate prior depth values, which supervise the mapping and tracking processes.

Abstract

Monocular SLAM has received a lot of attention because it needs only RGB input and removes complex sensor constraints. However, existing monocular SLAM systems are designed for bounded scenes, which restricts their applicability. To address this limitation, we propose MoD-SLAM, the first monocular NeRF-based dense mapping method that allows real-time 3D reconstruction in unbounded scenes. Specifically, we introduce a Gaussian-based unbounded scene representation to solve the challenge of mapping scenes without boundaries. This strategy is essential for extending the range of SLAM applications. Moreover, a depth estimation module in the front end is designed to extract accurate prior depth values to supervise the mapping and tracking processes. By introducing a robust depth loss term into the tracking process, our SLAM system achieves more precise pose estimation in large-scale scenes. Our experiments on two standard datasets show that MoD-SLAM achieves competitive performance, improving the accuracy of 3D reconstruction and localization by up to 30% and 15%, respectively, compared with existing state-of-the-art monocular SLAM systems.

Paper Structure (15 sections, 20 equations, 9 figures, 7 tables)

This paper contains 15 sections, 20 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Reconstruction results on ScanNet 0207 [Dai2017ScanNetR3]. We present MoD-SLAM, a neural-based monocular dense mapping method. Our method reconstructs the scene more capably than the existing state-of-the-art method GO-SLAM [Zhang2023GOSLAMGO].
  • Figure 2: Overview of MoD-SLAM. We demonstrate the monocular mode of our system, which uses RGB streams as its input. 1) Within the tracking process, the system performs depth-supervised camera pose prediction and executes loop closure and global optimization based on a co-visibility check to refine camera poses and correct drift. Simultaneously, it selects keyframes to feed into the mapping module. 2) The mapping process uses the RGB map derived from keyframes to predict depth values via the depth estimation and depth distillation modules. Concurrently, Gaussian encoding and a contraction function are applied to the keyframes, and the resulting spatial mean and covariance are fed into MLPs to reconstruct unbounded scenes.
  • Figure 3: (a) Scene Reparameterization. To manage unbounded scenes, we use a contraction function to map the mean and covariance obtained from spatial sampling (black dashed line) directly to a new range (red solid line). Data within the sphere of radius 1 (yellow region) is kept unchanged, while data from regions beyond radius 1 is mapped into a sphere of radius 2 (orange region). (b) NDC vs. Our Reparameterization. NDC leaves voids at the boundaries of geometric objects and does not reach the feature extraction and reconstruction quality achieved by our reparameterization approach.
  • Figure 4: Reconstruction results on Replica [Straub2019TheRD]. Compared with baselines, the 3D scenes reconstructed by our method show enhanced geometric structure and texture features.
  • Figure 5: Reconstruction results on ScanNet [Dai2017ScanNetR3]. We show the full view of the reconstructed 3D scenes. Our system recovers the scale of the scenes more precisely.
  • ...and 4 more figures
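
The scene reparameterization described in the Figure 3(a) caption can be illustrated with a short sketch. The caption does not give the formula, so the code below assumes the contraction commonly used for unbounded NeRF scenes (the mip-NeRF 360 form): points with norm at most 1 are left unchanged, and points farther away are rescaled to norm 2 - 1/||x||, which is always below 2. The function name `contract` is hypothetical, and only the mean is handled here; the caption says the covariance is mapped as well, which this sketch omits.

```python
import math
from typing import List, Sequence


def contract(x: Sequence[float]) -> List[float]:
    """Contract a 3D point into the ball of radius 2.

    Assumed form (not confirmed by the source):
        contract(x) = x                            if ||x|| <= 1
        contract(x) = (2 - 1/||x||) * x / ||x||    otherwise
    """
    norm = math.sqrt(sum(c * c for c in x))
    if norm <= 1.0:
        # Inside the unit sphere: keep the point as-is.
        return [float(c) for c in x]
    # Outside the unit sphere: keep the direction, squash the distance
    # into the shell between radius 1 and radius 2.
    scale = (2.0 - 1.0 / norm) / norm
    return [scale * c for c in x]


def length(x: Sequence[float]) -> float:
    """Euclidean norm of a point."""
    return math.sqrt(sum(c * c for c in x))


if __name__ == "__main__":
    print(contract([0.5, 0.0, 0.0]))          # unchanged: inside the unit sphere
    print(contract([2.0, 0.0, 0.0]))          # pulled in to radius 1.5
    print(length(contract([1e6, 0.0, 0.0])))  # very far point, still below 2
```

Under this assumed form the mapping is continuous at radius 1, and arbitrarily distant points approach but never reach radius 2, which is what lets a bounded representation cover an unbounded scene.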