GSO-SLAM: Bidirectionally Coupled Gaussian Splatting and Direct Visual Odometry
Jiung Yeon, Seongbo Ha, Hyeonwoo Yu
TL;DR
GSO-SLAM addresses real-time monocular dense SLAM by bidirectionally coupling Visual Odometry (VO) and Gaussian Splatting (GS) within an Expectation-Maximization (EM) framework. The coupling jointly refines camera poses, semi-dense depth, and the Gaussian scene without additional computational overhead. A Gaussian Splat Initialization step uses VO outputs, image gradients, and multi-keyframe covariances to place Gaussians close to their final configuration, which accelerates convergence and improves fidelity. Evaluated on synthetic and real datasets, the method shows superior geometric and photometric reconstruction quality and robust tracking, while running in real time and scaling favorably. The result is a practical route to high-fidelity dense mapping on monocular systems, with lower computational overhead and better tracking robustness and map quality across diverse environments.
Abstract
We propose GSO-SLAM, a real-time monocular dense SLAM system built on a Gaussian scene representation. Existing methods either couple tracking and mapping through a unified scene, which incurs computational cost, or loosely integrate them with a well-structured tracking framework, which introduces redundancy. Our method instead bidirectionally couples Visual Odometry (VO) and Gaussian Splatting (GS). Specifically, we formulate joint optimization within an Expectation-Maximization (EM) framework, which enables simultaneous refinement of the VO-derived semi-dense depth estimates and the GS representation without additional computational overhead. We also present Gaussian Splat Initialization, which uses image information, keyframe poses, and pixel associations from VO to produce a close approximation of the final Gaussian scene, thereby eliminating the need for heuristic methods. Extensive experiments show that the method operates in real time and achieves state-of-the-art tracking accuracy together with state-of-the-art geometric and photometric fidelity of the reconstructed scene.
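To make the idea of seeding Gaussians from VO outputs more concrete, the sketch below back-projects one semi-dense depth pixel into a world-frame Gaussian centre using a standard pinhole model. It is only an illustration under stated assumptions, not the paper's Gaussian Splat Initialization: the names `Intrinsics`, `GaussianSeed`, and `seed_gaussian` are hypothetical, the isotropic one-pixel-footprint scale is a common heuristic chosen here for simplicity, and the image-gradient and multi-keyframe covariance terms mentioned above are deliberately left out.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics (hypothetical container)."""
    fx: float
    fy: float
    cx: float
    cy: float


@dataclass
class GaussianSeed:
    """Initial guess for one Gaussian (hypothetical container)."""
    mean: Vec3    # centre in the world frame
    scale: float  # isotropic radius in world units (assumed heuristic)


def back_project(u: float, v: float, depth: float, K: Intrinsics) -> Vec3:
    """Lift pixel (u, v) with the given depth to a 3D point in the camera frame."""
    x = (u - K.cx) / K.fx * depth
    y = (v - K.cy) / K.fy * depth
    return (x, y, depth)


def to_world(p_cam: Vec3, R: Sequence[Sequence[float]], t: Vec3) -> Vec3:
    """Apply the camera-to-world pose: p_world = R @ p_cam + t."""
    return tuple(
        sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3)
    )


def seed_gaussian(u: float, v: float, depth: float, K: Intrinsics,
                  R: Sequence[Sequence[float]], t: Vec3) -> GaussianSeed:
    """Seed a Gaussian at the back-projected VO depth point.

    Assumption: the scale is the size of one pixel projected to that depth,
    depth / f. The paper's anisotropic, gradient-aware covariance is not modelled.
    """
    mean = to_world(back_project(u, v, depth, K), R, t)
    scale = depth / (0.5 * (K.fx + K.fy))
    return GaussianSeed(mean=mean, scale=scale)


if __name__ == "__main__":
    K = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    seed = seed_gaussian(420.0, 240.0, 2.0, K, identity, (1.0, 0.0, 0.0))
    print(seed)
```

Placing the seed at the VO depth point means the mapping optimizer starts near the observed surface rather than at a random or heuristic location, which is the intuition behind initializing Gaussians close to their final configuration.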
