MonoGSDF: Exploring Monocular Geometric Cues for Gaussian Splatting-Guided Implicit Surface Reconstruction

Kunyi Li, Michael Niemeyer, Zeyu Chen, Nassir Navab, Federico Tombari

TL;DR

MonoGSDF addresses the challenge of recovering watertight, topologically consistent surfaces from monocular images by unifying 3D Gaussian splatting with a neural SDF. The key idea is to use the SDF to guide the spatial distribution of Gaussians during training and then reuse Gaussians as priors for efficient mesh extraction, avoiding heavy meshing steps like Marching Cubes. The method introduces a Gaussian-to-SDF coupling via a Gaussian-shaped opacity function and a sigmoid-like, scene-bounded normalization, plus a multi-resolution training regime augmented with geometric cues from monocular estimates. Empirical results on DTU, Tanks and Temples, and Mip-NeRF 360 show state-of-the-art geometry accuracy, competitive novel-view synthesis, and faster mesh extraction, especially on flat and transparent surfaces. This framework advances practical monocular 3D reconstruction by merging explicit Gaussian representations with implicit surface modeling, enabling scalable, efficient, and high-fidelity reconstructions in real-world scenes.
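
The Gaussian-to-SDF coupling described above can be illustrated with a minimal sketch. This summary does not give the exact functional forms, so the bell-shaped mapping, the `beta` width, the `tanh` squashing, and both function names below are assumptions chosen for illustration, not the authors' formulation.

```python
import math


def sdf_to_opacity(sdf: float, beta: float = 0.1) -> float:
    """Hypothetical Gaussian-shaped mapping from a signed distance to opacity.

    Opacity is 1 on the surface (sdf == 0) and decays symmetrically as the
    Gaussian moves away from it, so primitives far from the zero level set
    contribute little to rendering. `beta` (assumed) controls the falloff width.
    """
    return math.exp(-(sdf * sdf) / (2.0 * beta * beta))


def bound_coordinate(x: float, scale: float = 1.0) -> float:
    """Hypothetical sigmoid-like squashing of an unbounded coordinate into (-1, 1).

    tanh is used as a stand-in for the scene-bounded normalization: points near
    the origin keep roughly linear spacing, while distant points are compressed
    so a bounded SDF network can still be queried for them.
    """
    return math.tanh(x / scale)


if __name__ == "__main__":
    for d in (0.0, 0.05, 0.2):
        print(f"sdf={d:+.2f} -> opacity={sdf_to_opacity(d):.4f}")
    for x in (0.0, 0.5, 3.0):
        print(f"x={x:.1f} -> bounded={bound_coordinate(x):.4f}")
```

Under these assumptions, a Gaussian whose center drifts off the surface loses opacity, which is one way an SDF could steer the spatial distribution of Gaussians during training.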

Abstract

Accurate meshing from monocular images remains a key challenge in 3D vision. While state-of-the-art 3D Gaussian Splatting (3DGS) methods excel at synthesizing photorealistic novel views through rasterization-based rendering, their reliance on sparse, explicit primitives severely limits their ability to recover watertight and topologically consistent 3D surfaces. We introduce MonoGSDF, a novel method that couples Gaussian-based primitives with a neural Signed Distance Field (SDF) for high-quality reconstruction. During training, the SDF guides the Gaussians' spatial distribution, while at inference, the Gaussians serve as priors to reconstruct surfaces, eliminating the need for memory-intensive Marching Cubes. To handle arbitrary-scale scenes, we propose a scaling strategy for robust generalization. A multi-resolution training scheme further refines details, and monocular geometric cues from off-the-shelf estimators enhance reconstruction quality. Experiments on real-world datasets show that MonoGSDF outperforms prior methods while maintaining efficiency.

Paper Structure

This paper contains 45 sections, 9 equations, 14 figures, 6 tables.

Figures (14)

  • Figure 1: MonoGSDF. We show the reconstructed mesh. Compared to 2DGS [huang20242d] and GOF [yu2024gaussian], ours achieves higher F1 scores ($\uparrow$) and reconstructs smooth surfaces with fine details.
  • Figure 2: Overview. MonoGSDF synergizes SDF and Gaussian representations by converting queried SDF values into opacity, encouraging Gaussians to align with surfaces. It uses standard rasterization to render color, depth, and normals, enhanced by geometry cues and a multi-resolution strategy. The SDF is jointly trained with 3D supervision from rendered depth.
  • Figure 3: Surface Reconstruction on DTU [jensen2014large]. We show normal maps from the reconstructed meshes.
  • Figure 4: Surface Reconstruction on Tanks and Temples [knapitsch2017tanks]. We show rendered normal maps from the reconstructed meshes.
  • Figure 5: Ablation Study on Geometry Regularization. We show the reconstructed meshes with and without the geometry regularization.
  • ...and 9 more figures
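
The Figure 2 caption says the SDF receives 3D supervision from rendered depth. One plausible reading, sketched below, is to back-project each rendered depth value to a 3D point and penalize the SDF magnitude there, since a point on the visible surface should lie on the zero level set. The loss form, the function names, and the analytic sphere standing in for the neural SDF are all assumptions for illustration; the source does not specify the actual loss.

```python
import math


def backproject(origin, direction, depth):
    """Return the 3D point at distance `depth` along a unit-length ray."""
    return tuple(o + depth * d for o, d in zip(origin, direction))


def sphere_sdf(p, radius=1.0):
    """Toy analytic SDF (unit sphere at the origin), a stand-in for a neural SDF."""
    return math.sqrt(sum(c * c for c in p)) - radius


def depth_sdf_loss(origin, directions, depths, sdf=sphere_sdf):
    """Hypothetical loss: mean |sdf(p)| over points back-projected from depth.

    The loss is zero when every rendered depth lands exactly on the SDF's
    zero level set and grows as depth and SDF disagree.
    """
    points = [backproject(origin, d, t) for d, t in zip(directions, depths)]
    return sum(abs(sdf(p)) for p in points) / len(points)


if __name__ == "__main__":
    camera = (0.0, 0.0, -3.0)
    rays = [(0.0, 0.0, 1.0)]
    print(depth_sdf_loss(camera, rays, [2.0]))  # depth hits the sphere surface
    print(depth_sdf_loss(camera, rays, [2.5]))  # depth overshoots into the sphere
```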