How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey

Fabio Tosi, Youmin Zhang, Ziren Gong, Erik Sandström, Stefano Mattoccia, Martin R. Oswald, Matteo Poggi

TL;DR

The paper surveys the rapid development of radiance-field-inspired SLAM, tracing the shift from traditional hand-crafted and deep-learning methods toward NeRF and 3D Gaussian Splatting (3DGS) representations. It analyzes NeRF-style and 3DGS-style approaches across RGB-D, RGB, and LiDAR modalities, with emphasis on submaps, semantics, dynamic scenes, and uncertainty estimation. By compiling experimental results and benchmarks across datasets, it reports that NeRF-based methods excel in novel view synthesis and dense shading, while 3DGS-based approaches offer faster rendering and explicit geometry, with trade-offs in memory and scale. The survey identifies key challenges, including real-time performance, global optimization costs, and evaluation inconsistencies, and outlines future directions: compact representations, robust handling of dynamic scenes, and standardized benchmarks for fair comparison.

Abstract

Over the past two decades, research in the field of Simultaneous Localization and Mapping (SLAM) has undergone a significant evolution, highlighting its critical role in enabling autonomous exploration of unknown environments. This evolution ranges from hand-crafted methods, through the era of deep learning, to more recent developments focused on Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) representations. Recognizing the growing body of research and the absence of a comprehensive survey on the topic, this paper aims to provide the first comprehensive overview of SLAM progress through the lens of the latest advancements in radiance fields. It sheds light on the background, the evolutionary path, and the inherent strengths and limitations of these methods, and serves as a fundamental reference that highlights the field's dynamic progress and specific challenges.

Paper Structure (39 sections, 11 equations, 11 figures, 11 tables)

Figures (11)

  • Figure 2: Comparison of Scene Representations: Implicit, Explicit, and Hybrid. From left to right: implicit representations use a neural network to approximate a radiance field; explicit representations perform volume rendering directly on learned spatial features (voxels, hash grids, etc.) without neural components; and hybrid representations combine learned spatial features $\psi$ with neural networks. Both hybrid and explicit approaches accelerate training and rendering but require additional memory.
  • Figure 3: NeRF and 3DGS differ conceptually. (left) NeRF queries an MLP along the ray, while (right) 3DGS blends Gaussians for the given ray.
  • Figure 4: Qualitative Comparison of Key SLAM Datasets. RGB-D images from: (a) ETH3D-SLAM schops2019bad, (b) ScanNet dai2017scannet, (c) TUM RGB-D tum, and (d) Replica replica19arxiv.
  • Figure 5: Overview of iMAP sucar2021imap, the Pioneering Approach in Neural Implicit-based SLAM. (Left) The illustration depicts two concurrent processes: tracking, which optimizes the current frame's pose with the network held fixed; and mapping, which jointly refines the network and the camera poses of selected keyframes. (Right) Joint optimization of the scene network parameters and keyframe camera poses using differentiable rendering functions. Figure from sucar2021imap.
  • Figure 6: 3D Gaussian Visualization. (Left) Rasterized Gaussians, (Right) Gaussians shaded to highlight the underlying geometry. Images adapted from matsuki2023gaussian.
  • ...and 6 more figures
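
The contrast described in the Figure 3 caption can be illustrated with a minimal sketch. It is not taken from the paper: the function names (`composite`, `nerf_alphas`, `gaussian_alphas`) and the toy inputs are hypothetical, and the densities that a NeRF would obtain by querying its MLP are simply passed in as numbers. Under those assumptions, both pipelines end in the same front-to-back compositing step and differ mainly in where the per-sample opacity comes from.

```python
import math


def composite(alphas, colors):
    """Front-to-back compositing: C = sum_i T_i * alpha_i * c_i,
    where T_i = prod_{j<i} (1 - alpha_j) is the accumulated transmittance."""
    transmittance = 1.0
    pixel = [0.0, 0.0, 0.0]
    for alpha, color in zip(alphas, colors):
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


def nerf_alphas(densities, deltas):
    """NeRF-style opacity: densities sampled along the ray (assumed here to be
    the MLP outputs) are converted with alpha = 1 - exp(-sigma * delta)."""
    return [1.0 - math.exp(-s * d) for s, d in zip(densities, deltas)]


def gaussian_alphas(opacities, sq_dists):
    """3DGS-style opacity: each depth-sorted Gaussian contributes its learned
    opacity scaled by exp(-0.5 * d^2), with d^2 the squared Mahalanobis
    distance from the pixel to the projected 2D Gaussian (assumed given)."""
    return [o * math.exp(-0.5 * d2) for o, d2 in zip(opacities, sq_dists)]


# Toy example: two samples (or two Gaussians) along one ray, red then green.
colors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
nerf_pixel, _ = composite(nerf_alphas([2.0, 5.0], [0.1, 0.1]), colors)
gs_pixel, _ = composite(gaussian_alphas([0.8, 0.9], [0.5, 0.2]), colors)
```

In this simplified view, the cost difference follows from how the opacities are produced: the NeRF-style path needs one network query per sample along every ray, while the 3DGS-style path only evaluates the Gaussians that overlap the pixel.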