RGBDS-SLAM: A RGB-D Semantic Dense SLAM Based on 3D Multi Level Pyramid Gaussian Splatting

Zhenzhong Cao, Chenyang Zhao, Qianyi Zhang, Jinzheng Guang, Yinuo Song, Jingtai Liu

TL;DR

RGBDS-SLAM targets high-fidelity RGB-D semantic reconstruction in dense SLAM by coupling 3D Multi-Level Pyramid Gaussian Splatting (ML-P-GS) with a tightly coupled multi-feature optimization. The method extends ORB-SLAM3 with a four-thread pipeline and represents the scene with 3D Gaussian primitives trained against multi-level image pyramids, so that the RGB, depth, and semantic maps are refined jointly. Experiments on Replica and ScanNet show state-of-the-art RGB-D reconstruction quality and semantic accuracy while maintaining real-time performance, and ablations confirm the contribution of both the ML-P-GS module and the cross-feature optimization. The approach advances dense SLAM by enabling detailed, semantically aware reconstructions in real time, with open-source code to facilitate adoption; dynamic scenes remain an area for future work.

Abstract

High-quality reconstruction is crucial for dense SLAM. Recent popular approaches use 3D Gaussian Splatting (3D GS) for RGB, depth, and semantic reconstruction of scenes. However, these methods often overlook issues of detail and consistency in different parts of the scene. To address this, we propose RGBDS-SLAM, an RGB-D semantic dense SLAM system based on 3D multi-level pyramid Gaussian splatting, which enables high-quality dense reconstruction of scene RGB, depth, and semantics. In this system, we introduce a 3D multi-level pyramid Gaussian splatting method that restores scene details by extracting multi-level image pyramids for Gaussian splatting training, ensuring consistency in RGB, depth, and semantic reconstructions. Additionally, we design a tightly coupled multi-feature reconstruction optimization mechanism, allowing the reconstruction accuracy of the RGB, depth, and semantic maps to mutually enhance each other during rendering optimization. Extensive quantitative, qualitative, and ablation experiments on the Replica and ScanNet public datasets demonstrate that our proposed method outperforms current state-of-the-art methods. The open-source code will be available at: https://github.com/zhenzhongcao/RGBDS-SLAM.
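To make the idea of multi-level image pyramids concrete, the following is a minimal sketch of how RGB, depth, and semantic frames could be downsampled into matching pyramids and visited from coarse to fine. It is not the authors' implementation: the function names, the factor-of-two step, the number of levels, and the choice of averaging for RGB versus nearest-neighbor sampling for depth and labels are all assumptions made for illustration.

```python
import numpy as np


def downsample_mean(img: np.ndarray) -> np.ndarray:
    """Halve resolution by averaging each 2x2 block (assumed choice for RGB)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]  # crop to even size so blocks tile exactly
    return img.reshape(h // 2, 2, w // 2, 2, *img.shape[2:]).mean(axis=(1, 3))


def downsample_nearest(img: np.ndarray) -> np.ndarray:
    """Halve resolution by keeping every second pixel.

    Assumed choice for semantic labels (averaging class ids is meaningless)
    and for depth (averaging would blend in missing-depth pixels).
    """
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h:2, :w:2]


def build_pyramid(img: np.ndarray, levels: int, down) -> list:
    """Return `levels` images ordered coarsest first, full resolution last."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(down(pyramid[-1]))
    return pyramid[::-1]


# Hypothetical frame sizes and level count, chosen only for the demo.
rng = np.random.default_rng(0)
rgb = rng.random((480, 640, 3))
depth = rng.random((480, 640))
semantic = rng.integers(0, 10, size=(480, 640))

rgb_pyr = build_pyramid(rgb, 3, downsample_mean)
depth_pyr = build_pyramid(depth, 3, downsample_nearest)
sem_pyr = build_pyramid(semantic, 3, downsample_nearest)

# Coarse-to-fine schedule: a training loop would supervise the rendered
# RGB, depth, and semantic images against each level in this order.
for level, (c, d, s) in enumerate(zip(rgb_pyr, depth_pyr, sem_pyr)):
    print(level, c.shape, d.shape, s.shape)
```

Because all three modalities are cropped and halved the same way, each pyramid level has identical spatial dimensions across RGB, depth, and semantics, which is what lets a single rendered view be compared against all three targets at that level.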

Paper Structure

This paper contains 15 sections, 13 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Overview of the proposed RGBDS-SLAM. Our method is an enhancement of ORB-SLAM3 (Campos et al., 2021), taking RGB, depth, and semantic frames as input and outputting a map database with the point map, Gaussian origin map, and Gaussian semantic map. It consists of four threads: Tracking, LocalMapping, GaussianMapping, and LoopClosing.
  • Figure 2: Multi-level image pyramid construction. Training proceeds from the top of the pyramid to the bottom, with image resolution gradually increasing: low resolution is used first for quick initialization, and details are then progressively refined.
  • Figure 3: Qualitative results of our method on RGB rendering details for 8 sequences of the Replica dataset. The first and third rows show randomly selected rendered RGB images from the 8 sequences, and the second and fourth rows show the corresponding zoomed-in details. Orange boxes and arrows mark the regions that are magnified.
  • Figure 4: Qualitative comparison of rendered and ground-truth depth images for our method on the office0 sequence of the Replica dataset. The first row shows randomly selected rendered depth images, and the second row shows the corresponding ground-truth depth images. Red boxes mark where the two differ; on the ground-truth images, the boxed areas are regions with missing depth.
  • Figure 5: Qualitative comparison of semantic image rendering for our method on four sequences of the Replica dataset. The first row shows RGB images rendered from random viewpoints, and the second and third rows show the corresponding rendered semantic images before and after optimization, respectively. Yellow boxes mark regions where the optimized result shows clearer semantic segmentation boundaries.
  • ...and 2 more figures