MonoGSDF: Exploring Monocular Geometric Cues for Gaussian Splatting-Guided Implicit Surface Reconstruction
Kunyi Li, Michael Niemeyer, Zeyu Chen, Nassir Navab, Federico Tombari
TL;DR
MonoGSDF addresses the challenge of recovering watertight, topologically consistent surfaces from monocular images by unifying 3D Gaussian splatting with a neural SDF. The key idea is to use the SDF to guide the spatial distribution of the Gaussians during training and then reuse those Gaussians as priors for efficient mesh extraction, avoiding memory-intensive meshing steps such as Marching Cubes. The method couples Gaussians to the SDF through a Gaussian-shaped opacity function and a sigmoid-like, scene-bounded normalization, and adds a multi-resolution training regime augmented with geometric cues from monocular estimators. Empirical results on DTU, Tanks and Temples, and Mip-NeRF 360 show state-of-the-art geometry accuracy, competitive novel-view synthesis, and faster mesh extraction, especially on flat and transparent surfaces. The framework advances practical monocular 3D reconstruction by merging explicit Gaussian representations with implicit surface modeling, enabling scalable, efficient, and high-fidelity reconstruction of real-world scenes.
Abstract
Accurate meshing from monocular images remains a key challenge in 3D vision. While state-of-the-art 3D Gaussian Splatting (3DGS) methods excel at synthesizing photorealistic novel views through rasterization-based rendering, their reliance on sparse, explicit primitives severely limits their ability to recover watertight and topologically consistent 3D surfaces. We introduce MonoGSDF, a novel method that couples Gaussian-based primitives with a neural Signed Distance Field (SDF) for high-quality reconstruction. During training, the SDF guides the Gaussians' spatial distribution, while at inference, the Gaussians serve as priors to reconstruct surfaces, eliminating the need for memory-intensive Marching Cubes. To handle arbitrary-scale scenes, we propose a scaling strategy for robust generalization. A multi-resolution training scheme further refines details, and monocular geometric cues from off-the-shelf estimators enhance reconstruction quality. Experiments on real-world datasets show MonoGSDF outperforms prior methods while maintaining efficiency.
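
To make the Gaussian-to-SDF coupling and the scaling strategy more concrete, the following is a minimal sketch and not the authors' implementation. It assumes one plausible form for each piece: an opacity that is a Gaussian bump in the signed distance, and a tanh-based (sigmoid-like) squashing of coordinates into a bounded cube. The function names `sdf_to_opacity` and `normalize_point`, and the parameters `beta` and `scene_radius`, are hypothetical and do not come from the paper.

```python
import math


def sdf_to_opacity(sdf, beta=0.05):
    """Assumed Gaussian-shaped opacity: 1 on the surface (sdf == 0),
    decaying symmetrically with distance. `beta` is a hypothetical
    bandwidth controlling how tightly opacity hugs the surface."""
    return math.exp(-(sdf * sdf) / (2.0 * beta * beta))


def normalize_point(point, scene_radius):
    """Assumed sigmoid-like normalization: squash each coordinate into
    [-1, 1] with tanh so scenes of arbitrary scale map to a bounded cube.
    `scene_radius` is a hypothetical scale estimate for the scene."""
    return tuple(math.tanh(c / scene_radius) for c in point)


if __name__ == "__main__":
    # Opacity is highest at the zero level set and falls off on both sides.
    for d in (-0.10, -0.05, 0.0, 0.05, 0.10):
        print(f"sdf={d:+.2f} -> opacity={sdf_to_opacity(d):.4f}")

    # A far-away point stays inside the bounded cube after normalization.
    print(normalize_point((0.0, 3.0, -50.0), scene_radius=10.0))
```

Under these assumptions, a Gaussian centred far from the zero level set receives low opacity and contributes little to rendering, which is one way an SDF could steer Gaussians toward the surface during training. The bounded normalization keeps the SDF network's input range fixed regardless of scene scale.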
