DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting
Christian Homeyer, Leon Begiristain, Christoph Schnörr
TL;DR
DROID-Splat integrates a dense end-to-end tracking system with a differentiable 3D Gaussian Splatting renderer to achieve photo-realistic, dense scene reconstructions from monocular video. The framework combines a fast, parallelizable frontend/backend loop with a loop-closure detector and a differentiable renderer that optimizes Gaussian primitives per pixel using a rendering loss, enabling both robust odometry and high-fidelity rendering. Notable contributions include a thorough ablation of components, the integration of monocular depth priors, and a two-stage camera calibration approach that enables in-the-wild reconstruction with unknown intrinsics. The work demonstrates state-of-the-art (SotA) tracking and rendering on standard SLAM benchmarks and discusses the trade-offs between geometry and appearance, offering practical guidance for real-time neural SLAM systems on consumer GPUs.
Abstract
Recent progress in scene synthesis makes it possible to build standalone SLAM systems based purely on optimizing hyperprimitives with a rendering objective. However, their tracking performance still lags behind that of traditional and end-to-end SLAM systems. An optimal trade-off between robustness, speed, and accuracy has not yet been reached, especially for monocular video. In this paper, we introduce a SLAM system based on an end-to-end tracker and extend it with a renderer based on recent 3D Gaussian Splatting techniques. Our framework DroidSplat achieves both SotA tracking and rendering results on common SLAM benchmarks. We implemented multiple building blocks of modern SLAM systems to run in parallel, allowing for fast inference on common consumer GPUs. Recent progress in monocular depth prediction and camera calibration allows our system to achieve strong results even on in-the-wild data without known camera intrinsics. Code will be available at https://github.com/ChenHoy/DROID-Splat.
