Gaussian Splatting SLAM

Hidenobu Matsuki, Riku Murai, Paul H. J. Kelly, Andrew J. Davison

TL;DR

This work introduces the first online monocular SLAM system built entirely on 3D Gaussian Splatting (3DGS), enabling live high-fidelity reconstruction and novel-view rendering from a single RGB stream. It derives an analytic SE(3) camera pose Jacobian for direct optimisation against a Gaussian map, adds isotropic regularisation to stabilise geometry, and implements Gaussian insertion and pruning within a unified map-centric SLAM framework. The approach achieves state-of-the-art trajectory estimation and rendering quality on monocular and RGB-D benchmarks, and handles challenging scenes that include transparent and thin structures. Current results are limited to small-scale environments and the system has no loop closure; the framework is a step towards real-time, high-fidelity Spatial AI, with large-scale maps and loop closure left as future extensions.

Abstract

We present the first application of 3D Gaussian Splatting in monocular SLAM, the most fundamental but the hardest setup for Visual SLAM. Our method, which runs live at 3 fps, utilises Gaussians as the only 3D representation, unifying the required representation for accurate, efficient tracking, mapping, and high-quality rendering. Designed for challenging monocular settings, our approach is seamlessly extendable to RGB-D SLAM when an external depth sensor is available. Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera. First, to move beyond the original 3DGS algorithm, which requires accurate poses from an offline Structure from Motion (SfM) system, we formulate camera tracking for 3DGS using direct optimisation against the 3D Gaussians, and show that this enables fast and robust tracking with a wide basin of convergence. Second, by utilising the explicit nature of the Gaussians, we introduce geometric verification and regularisation to handle the ambiguities occurring in incremental 3D dense reconstruction. Finally, we introduce a full SLAM system which not only achieves state-of-the-art results in novel view synthesis and trajectory estimation but also reconstructs tiny and even transparent objects.
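
To make the idea of an analytic pose Jacobian on SE(3) more concrete, the sketch below checks one standard Lie-group identity: for a point p_C = R p + t in the camera frame, the derivative with respect to a left-multiplied perturbation tau = (rho, theta) is the 3x6 matrix [I | -[p_C]x]. This is not the paper's full derivation, which must also pass through the Gaussian projection and rendering. The left-perturbation convention, the translation-first ordering of tau, and all function names are assumptions made for illustration.

```python
import math

def skew(v):
    """3x3 cross-product matrix [v]x."""
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def se3_exp(tau):
    """Exponential map of se(3); tau = (rho, theta), translation part first
    (assumed ordering). Returns rotation R (3x3) and translation t (3,)."""
    rho, theta = tau[:3], tau[3:]
    angle = math.sqrt(sum(x * x for x in theta))
    K = skew(theta)
    K2 = matmul(K, K)
    if angle < 1e-4:  # Taylor expansions avoid cancellation near zero
        a = 1.0 - angle ** 2 / 6.0
        b = 0.5 - angle ** 2 / 24.0
        c = 1.0 / 6.0 - angle ** 2 / 120.0
    else:
        a = math.sin(angle) / angle
        b = (1.0 - math.cos(angle)) / angle ** 2
        c = (angle - math.sin(angle)) / angle ** 3
    eye = lambda i, j: 1.0 if i == j else 0.0
    R = [[eye(i, j) + a * K[i][j] + b * K2[i][j] for j in range(3)] for i in range(3)]
    V = [[eye(i, j) + b * K[i][j] + c * K2[i][j] for j in range(3)] for i in range(3)]
    return R, matvec(V, rho)

def transform(R, t, p):
    Rp = matvec(R, p)
    return [Rp[i] + t[i] for i in range(3)]

def analytic_jacobian(p_cam):
    """d(exp(tau) * p_cam)/d(tau) at tau = 0, i.e. [I | -[p_cam]x]."""
    S = skew(p_cam)
    return [[1.0 if i == j else 0.0 for j in range(3)] + [-S[i][j] for j in range(3)]
            for i in range(3)]

def numeric_jacobian(R, t, p, eps=1e-6):
    """Central finite differences over the six components of tau."""
    p_cam = transform(R, t, p)
    J = [[0.0] * 6 for _ in range(3)]
    for k in range(6):
        plus = [0.0] * 6
        minus = [0.0] * 6
        plus[k], minus[k] = eps, -eps
        a = transform(*se3_exp(plus), p_cam)
        b = transform(*se3_exp(minus), p_cam)
        for i in range(3):
            J[i][k] = (a[i] - b[i]) / (2.0 * eps)
    return J

# Arbitrary example pose and world point (illustrative values only).
R, t = se3_exp([0.1, -0.2, 0.3, 0.2, -0.1, 0.4])
p_world = [1.0, 2.0, 3.0]
J_analytic = analytic_jacobian(transform(R, t, p_world))
J_numeric = numeric_jacobian(R, t, p_world)
max_err = max(abs(J_analytic[i][k] - J_numeric[i][k])
              for i in range(3) for k in range(6))
print("max |analytic - numeric| =", max_err)
```

In a tracking loop, a Jacobian of this kind would be chained with the derivatives of the rendering loss so that the pose can be updated as T <- exp(tau) T without relying on automatic differentiation through the pose parameters.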

Paper Structure

This paper contains 63 sections, 24 equations, 12 figures, 16 tables.

Figures (12)

  • Figure 1: From a single monocular camera, we reconstruct a high-fidelity 3D scene live at 3 fps. For every incoming RGB frame, 3D Gaussians are incrementally formed and optimised together with the camera poses. We show both the rasterised Gaussians (left) and Gaussians shaded to highlight the geometry (right). Notice the details and the complex material properties (e.g. transparency) captured. Thin structures such as wires are accurately represented by numerous small, elongated Gaussians, and transparent objects are effectively represented by placing the Gaussians along the rim. Our system significantly advances the fidelity a live monocular SLAM system can capture.
  • Figure 2: SLAM System Overview: Our SLAM system uses 3D Gaussians as the only representation, unifying all components of SLAM, including tracking, mapping, keyframe management, and novel view synthesis.
  • Figure 3: Effect of isotropic regularisation: Top: Rendering close to a training view (looking at the keyboard). Bottom: Rendering 3D Gaussians far from the training views (view from a side of the keyboard) without (left) and with (right) the isotropic loss. When the photometric constraints are insufficient, the Gaussians tend to elongate along the viewing direction, creating artefacts in the novel views, and affecting the camera tracking.
  • Figure 4: Rendering examples on Replica. Point-SLAM struggles to render fine details due to its stochastic ray sampling.
  • Figure 5: Convergence basin analysis: Left: 3D Gaussian map from training views (Yellow) and visualisation of the test poses (Red) and target pose (Blue). Right: Convergence basin of our method. The green marks success, and the red marks failure.
  • ...and 7 more figures
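
The isotropic regularisation illustrated in Figure 3 can be pictured with a small sketch. It assumes one plausible form of the penalty: for each Gaussian, the L1 distance between its three scale parameters and their own mean, summed over all Gaussians. The exact loss and its weighting are not given in the text above, so the formula, the function name `isotropic_loss`, and the example values are all illustrative assumptions.

```python
def isotropic_loss(scales):
    """Assumed penalty: sum over Gaussians of ||s - mean(s) * 1||_1.

    `scales` is a list of per-Gaussian scale triples (sx, sy, sz).
    The value is zero for a perfectly spherical Gaussian and grows as a
    Gaussian stretches along one axis.
    """
    total = 0.0
    for s in scales:
        mean = sum(s) / len(s)
        total += sum(abs(x - mean) for x in s)
    return total

# Hypothetical scale values chosen so the arithmetic is exact.
spherical = [(2.0, 2.0, 2.0)]   # all axes equal
elongated = [(1.0, 1.0, 4.0)]   # stretched along one axis

loss_spherical = isotropic_loss(spherical)
loss_elongated = isotropic_loss(elongated)
print(loss_spherical, loss_elongated)  # 0.0 4.0
```

Under this assumed form, a Gaussian that elongates along the viewing direction when photometric constraints are weak incurs a larger penalty, which pushes the optimiser back towards more compact shapes.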