DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting

Christian Homeyer, Leon Begiristain, Christoph Schnörr

TL;DR

DROID-Splat integrates a dense end-to-end tracking system with a differentiable 3D Gaussian Splat Renderer to achieve photo-realistic, dense scene reconstructions from monocular video. The framework combines a fast, parallelizable frontend/backend loop with a loop-closure detector and a differentiable renderer that optimizes Gaussian primitives per pixel using a rendering loss, enabling both robust odometry and high-fidelity rendering. Notable contributions include a thorough ablation of components, the integration of monocular depth priors, and a two-stage camera calibration approach that enables in-the-wild reconstruction with unknown intrinsics. The work demonstrates SotA tracking and rendering on standard SLAM benchmarks and discusses the trade-offs between geometry and appearance, highlighting practical guidance for real-time, neural-SLAM systems on consumer GPUs.

Abstract

Recent progress in scene synthesis makes standalone SLAM systems based purely on optimizing hyperprimitives with a Rendering objective possible. However, their tracking performance still lags behind traditional and end-to-end SLAM systems. An optimal trade-off between robustness, speed and accuracy has not yet been reached, especially for monocular video. In this paper, we introduce a SLAM system based on an end-to-end Tracker and extend it with a Renderer based on recent 3D Gaussian Splatting techniques. Our framework DroidSplat achieves SotA results in both tracking and rendering on common SLAM benchmarks. We implemented multiple building blocks of modern SLAM systems to run in parallel, allowing for fast inference on common consumer GPUs. Recent progress in monocular depth prediction and camera calibration allows our system to achieve strong results even on in-the-wild data without known camera intrinsics. Code will be available at https://github.com/ChenHoy/DROID-Splat.

Paper Structure

This paper contains 24 sections, 9 equations, 11 figures, 11 tables.

Figures (11)

  • Figure 1: DROID-Splat can reconstruct a scene with known or unknown intrinsics. By combining an optical flow tracking objective with a fast, dense Renderer, we achieve photo-realistic reconstructions while optimizing accurate odometry.
  • Figure 2: DROID-Splat. We use an end-to-end SLAM system with an optical-flow-based objective to perform tracking and to reconstruct odometry and a dense initial map. The tracking objective is flexible, which also allows us to optimize the intrinsics or the scale and shift of the depth prior if desired. We use SotA Gaussian Splatting techniques to learn a photo-realistic reconstruction based on a Rendering objective. Since all components are differentiable and run in parallel, we can let parts interact flexibly.
  • Figure 3: Rendering results on TUM-RGBD. We show views that were not in the training set, i.e., our keyframe buffer. The top two rows show monocular methods, the bottom row shows RGBD (for our method, we show the results with prior). We achieve a higher rendering and depth quality than Photo-SLAM, GlORIE-SLAM and MonoGS because we initialize with a dense tracking system and use dense hyperprimitives. Using a monocular prior can even improve upon a sparse laser sensor.
  • Figure 4: Compute-Performance Trade-off. We take the average across TUM-RGBD and Replica in RGBD mode. The baseline Tracker is added at the bottom for perspective; it does not have a meaningful metric attached to it.
  • Figure 5: Results on hand-captured cellphone videos. In-the-wild outdoor scenes pose different challenges than benchmarks. Left: 3D Gaussian Splatting. Right: 2D Gaussian Splatting. While 2DGS is more resistant to floaters due to its surface optimization, it struggles with rendering quality. Neither method copes well with strong lighting changes and reflections without extensions.
  • ...and 6 more figures
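
The caption of Figure 2 mentions optimizing the scale and shift of a monocular depth prior. As a rough illustration of what such a correction does, the sketch below fits a single scale and shift in closed form so that a prior matches a set of reference depths. This is an assumption-laden toy example, not the paper's method: DROID-Splat optimizes these parameters inside its tracking objective, whereas here they are fitted by ordinary least squares on a flat list of values. The function names `fit_scale_shift` and `apply_scale_shift` and the sample numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_scale_shift(prior: Sequence[float], reference: Sequence[float]) -> Tuple[float, float]:
    """Return (scale, shift) minimizing sum((scale * prior + shift - reference)^2)."""
    n = len(prior)
    if n != len(reference) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 samples")
    mean_p = sum(prior) / n
    mean_r = sum(reference) / n
    # Variance of the prior and its covariance with the reference (unnormalized).
    var_p = sum((p - mean_p) ** 2 for p in prior)
    if var_p == 0.0:
        raise ValueError("prior is constant; the scale is not identifiable")
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(prior, reference))
    scale = cov / var_p
    shift = mean_r - scale * mean_p
    return scale, shift


def apply_scale_shift(prior: Sequence[float], scale: float, shift: float) -> List[float]:
    """Map the prior into the reference frame with the fitted parameters."""
    return [scale * p + shift for p in prior]


# Hypothetical data: the reference is an exact affine transform of the prior.
prior = [0.5, 1.0, 1.5, 2.0]
reference = [2.0 * p + 0.3 for p in prior]

scale, shift = fit_scale_shift(prior, reference)
aligned = apply_scale_shift(prior, scale, shift)
```

In practice, monocular depth networks often predict relative depth or disparity, so which quantity is aligned (depth or inverse depth) and which pixels are trusted as reference are design choices that this sketch leaves open.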