
RP-SLAM: Real-time Photorealistic SLAM with Efficient 3D Gaussian Splatting

Lizhi Bai, Chunqi Tian, Jun Yang, Siyu Zhang, Masanori Suganuma, Takayuki Okatani

TL;DR

RP-SLAM addresses the challenge of real-time photorealistic SLAM by decoupling camera pose estimation from Gaussian-primitives optimization and leveraging 3D Gaussian splatting for dense, photorealistic scene representation. It introduces three core contributions: efficient incremental mapping with quad-tree adaptive sampling and Gaussian pruning, dynamic keyframe window optimization to maintain map consistency and mitigate forgetting, and monocular keyframe initialization from sparse point clouds to seed accurate Gaussian primitives. The method achieves state-of-the-art rendering quality with compact models and real-time performance on RGB-D and monocular benchmarks, outperforming several coupled and decoupled Gaussian-SLAM baselines. This work advances practical photorealistic SLAM for real-time applications, with a clear pathway to extending to dynamic scenes in future work.

Abstract

3D Gaussian Splatting (3DGS) has emerged as a promising technique for high-quality 3D rendering, leading to increasing interest in integrating 3DGS into photorealistic SLAM systems. However, existing methods face challenges such as redundant Gaussian primitives, forgetting during continuous optimization, and difficulty initializing primitives in the monocular case due to the lack of depth information. To achieve efficient and photorealistic mapping, we propose RP-SLAM, a 3D Gaussian splatting-based visual SLAM method for monocular and RGB-D cameras. RP-SLAM decouples camera pose estimation from Gaussian primitive optimization and consists of three key components. First, we propose an efficient incremental mapping approach that achieves a compact and accurate scene representation through adaptive sampling and Gaussian primitive filtering. Second, a dynamic window optimization method mitigates the forgetting problem and improves map consistency. Finally, for the monocular case, a keyframe initialization method based on the sparse point cloud improves the initialization accuracy of the Gaussian primitives and provides a geometric basis for subsequent optimization. Extensive experiments demonstrate that RP-SLAM achieves state-of-the-art map rendering accuracy while ensuring real-time performance and model compactness.

Paper Structure

This paper contains 24 sections, 8 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Overview of our RP-SLAM. Keyframes and a sparse point cloud are provided by the feature-based SLAM, where the point cloud is used for monocular keyframe initialization. Afterwards, the dense point cloud is obtained by efficient incremental mapping, which in turn initializes the Gaussian primitives. Finally, the scene representation is optimized over a dynamic keyframe window, where the geometric loss is used only for the RGB-D case. The dashed arrows apply to the monocular case only.
  • Figure 2: Quadtree-based adaptive image sampling guided by local image gradients at different minimum cell sizes: 4, 8, 16. The method adaptively concentrates samples in texture-rich regions. A smaller minimum cell size gives more detailed sampling, at the cost of more data to process.
  • Figure 3: Rendered depths in the monocular case. (a) Initial depth obtained by our RP-SLAM from a sparse point cloud, which describes the initial geometry at the viewpoint. (b) Depth after the initial iterations, starting from the depth in (a) and the dense point cloud obtained in Sec. \ref{mapping}. (c) Depth obtained with the monocular initialization method of MonoGS [matsuki2024gaussian]. After the preliminary iterations, the Gaussian primitives obtained by our method in the monocular case describe the scene structure with reasonable accuracy, whereas the result of MonoGS [matsuki2024gaussian] is less satisfactory in this regard.
  • Figure 4: Effect of different minimum cell sizes on rendering high-resolution images from the ScanNet++ [yeshwanth2023scannet++] dataset. When $c=4$, the handwriting on the whiteboard is more discernible. Zoom in for a clearer view.
  • Figure 5: Qualitative comparisons on the Replica [straub2019replica] dataset in the monocular case. The green dashed boxes in our method mark areas where RP-SLAM outperforms other methods, such as sharper textures and fewer artefacts. Zoom in for a clearer view.
  • ...and 2 more figures
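
The quadtree-based adaptive sampling described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`mean_gradient`, `quadtree_sample`), the forward-difference gradient, the mean-gradient split criterion, and the parameters `root_size` and `grad_thresh` are all assumptions made for illustration. Only the overall idea comes from the caption: cells are split where local image gradients are high, down to a minimum cell size.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of intensities


def mean_gradient(img: Image, x0: int, y0: int, size: int) -> float:
    """Mean absolute forward-difference gradient inside a square cell (assumed criterion)."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for y in range(y0, min(y0 + size, h)):
        for x in range(x0, min(x0 + size, w)):
            gx = img[y][min(x + 1, w - 1)] - img[y][x]  # horizontal difference, clamped at border
            gy = img[min(y + 1, h - 1)][x] - img[y][x]  # vertical difference, clamped at border
            total += abs(gx) + abs(gy)
            count += 1
    return total / count if count else 0.0


def quadtree_sample(img: Image, root_size: int, min_cell: int,
                    grad_thresh: float) -> List[Tuple[int, int]]:
    """Return one sample pixel (cell centre) per quadtree leaf.

    A cell is split into four children while it is larger than `min_cell`
    and its mean gradient exceeds `grad_thresh` (hypothetical threshold).
    """
    h, w = len(img), len(img[0])
    samples: List[Tuple[int, int]] = []
    # Start from a regular grid of root cells covering the image.
    stack = [(x, y, root_size)
             for y in range(0, h, root_size)
             for x in range(0, w, root_size)]
    while stack:
        x0, y0, size = stack.pop()
        if size > min_cell and mean_gradient(img, x0, y0, size) > grad_thresh:
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    if x0 + dx < w and y0 + dy < h:  # skip children outside the image
                        stack.append((x0 + dx, y0 + dy, half))
        else:
            # Leaf cell: sample its centre, clamped to the image bounds.
            samples.append((min(x0 + size // 2, w - 1), min(y0 + size // 2, h - 1)))
    return samples


if __name__ == "__main__":
    # Synthetic 32x32 image: flat left half, checkerboard (textured) right half.
    img = [[0.0 if x < 16 else float((x + y) % 2) for x in range(32)] for y in range(32)]
    for c in (4, 8, 16):
        print(c, len(quadtree_sample(img, root_size=16, min_cell=c, grad_thresh=0.1)))
```

On the synthetic image, the flat half keeps its coarse root cells while the textured half is subdivided down to the minimum cell size, so a smaller minimum cell size yields more samples, in line with the trade-off noted in the caption.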