GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views

Yaniv Wolf, Amit Bracha, Ron Kimmel

TL;DR

This work tackles the difficulty of extracting coherent geometry from 3D Gaussian Splatting (3DGS), where the Gaussian centers do not form a smooth surface because they are optimized for a photometric loss rather than for geometry. It proposes using a pre-trained stereo matching model to estimate depth from stereo-aligned renders of the 3DGS scene, then fusing the depths with TSDF and Marching Cubes to produce a high-quality mesh. The approach achieves state-of-the-art or competitive results on DTU and Tanks & Temples among Gaussian-based methods, matches or surpasses neural methods in some regimes, and requires significantly less compute time. It also works on in-the-wild scenes captured with a smartphone. By leveraging real-world geometric priors, the method offers a practical, fast, and accurate path for surface reconstruction from Gaussian splatting, while retaining compatibility with the original 3DGS representation.
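To make the depth-estimation step concrete, here is a minimal sketch of the standard rectified-stereo relation, depth = focal length × baseline / disparity, which a stereo model's disparity output is typically passed through. The function name, the parameter names, and the choice to mark non-positive disparities as `None` are assumptions made for illustration; they are not taken from the paper's code.

```python
from typing import List, Optional


def disparity_to_depth(
    disparity: List[List[float]],
    focal_px: float,
    baseline: float,
    min_disparity: float = 1e-6,
) -> List[List[Optional[float]]]:
    """Convert a rectified-stereo disparity map (pixels) to depth.

    depth = focal_px * baseline / disparity, in the same units as `baseline`.
    Pixels with disparity <= min_disparity are treated as invalid (None);
    this invalid-pixel convention is an assumption of this sketch.
    """
    depth: List[List[Optional[float]]] = []
    for row in disparity:
        depth.append([
            focal_px * baseline / d if d > min_disparity else None
            for d in row
        ])
    return depth


if __name__ == "__main__":
    # Hypothetical values: 100 px focal length, 0.5 unit baseline.
    print(disparity_to_depth([[10.0, 0.0], [25.0, -1.0]], 100.0, 0.5))
```

Because depth is inversely proportional to disparity, a fixed disparity error translates into a larger depth error for distant surfaces, which is one reason the choice of baseline matters in any stereo setup.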

Abstract

Recently, 3D Gaussian Splatting (3DGS) has emerged as an efficient approach for accurately representing scenes. However, despite its superior novel view synthesis capabilities, extracting the geometry of the scene directly from the Gaussian properties remains a challenge, as those are optimized based on a photometric loss. While some concurrent models have tried adding geometric constraints during the Gaussian optimization process, they still produce noisy, unrealistic surfaces. We propose a novel approach for bridging the gap between the noisy 3DGS representation and the smooth 3D mesh representation, by injecting real-world knowledge into the depth extraction process. Instead of extracting the geometry of the scene directly from the Gaussian properties, we extract it through a pre-trained stereo-matching model. We render stereo-aligned pairs of images corresponding to the original training poses, feed the pairs into a stereo model to get a depth profile, and finally fuse all of the profiles together to get a single mesh. The resulting reconstruction is smoother, more accurate, and shows more intricate details compared to other methods for surface reconstruction from Gaussian Splatting, while requiring only a small overhead on top of the fairly short 3DGS optimization process. We performed extensive testing of the proposed method on in-the-wild scenes, obtained using a smartphone, showcasing its superior reconstruction abilities. Additionally, we tested the method on the Tanks and Temples and DTU benchmarks, achieving state-of-the-art results.
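The idea of rendering a stereo-aligned pair for each training pose can be sketched as follows: keep the orientation of the original (left) camera and shift its center along its own horizontal axis by a baseline. This is only an illustration under stated assumptions: the rotation is taken to be camera-to-world with its first column being the camera's right (x) axis, and `right_camera_center` and `baseline` are hypothetical names, not identifiers from the paper.

```python
from typing import List, Sequence


def right_camera_center(
    rotation_c2w: Sequence[Sequence[float]],
    center: Sequence[float],
    baseline: float,
) -> List[float]:
    """Return the center of a stereo-aligned right camera.

    Assumes `rotation_c2w` is a 3x3 camera-to-world rotation whose first
    column is the camera's x (right) axis expressed in world coordinates.
    The right camera reuses the same rotation, so the image pair is
    rectified by construction.
    """
    # First column of the rotation = camera x-axis in world coordinates.
    x_axis = [rotation_c2w[row][0] for row in range(3)]
    return [c + baseline * a for c, a in zip(center, x_axis)]


if __name__ == "__main__":
    # Camera rotated 90 degrees about the world z-axis, so its x-axis
    # points along world +y.
    R = [[0.0, -1.0, 0.0],
         [1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0]]
    print(right_camera_center(R, [1.0, 2.0, 3.0], 0.5))
```

Since both cameras share one rotation and differ only by a horizontal shift, corresponding pixels lie on the same image row, which is the input format stereo-matching models generally expect.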

Paper Structure

This paper contains 25 sections, 16 figures, 4 tables.

Figures (16)

  • Figure 1: Qualitative results on the garden scene of the Mip-NeRF360 dataset.
  • Figure 2: The proposed pipeline for surface reconstruction. First, we represent the scene by applying a 3DGS model. We then use the 3DGS model to render stereo-aligned pairs of images corresponding to the original views. For each pair, using a shape-from-stereo algorithm, we reconstruct an RGB-D structure, which is then integrated from all views using TSDF into a triangulated mesh of the scene.
  • Figure 3: Example of our method's output on DTU scan105. From left to right: The rendered left and right images, segmentation mask, left-right disparity, occlusion mask, and shading - depth gradient.
  • Figure 4: Qualitative comparison of mesh reconstruction from in-the-wild videos between our method and SuGaR.
  • Figure 5: Qualitative results on Tanks and Temples. Top row: Ignatius scene, compared to SuGaR. Bottom row: Barn scene, compared to SuGaR.
  • ...and 11 more figures
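
The TSDF integration named in the Figure 2 caption can be illustrated with a toy, one-dimensional version of the usual weighted running-average update along a single camera ray. This is a sketch of the generic technique, not the paper's implementation: the function names, the unit per-observation weight, and the rule for skipping voxels far behind the surface are assumptions made here.

```python
from typing import List, Optional, Tuple


def tsdf_update(
    tsdf: List[float],
    weight: List[float],
    voxel_depths: List[float],
    observed_depth: float,
    trunc: float,
) -> Tuple[List[float], List[float]]:
    """Fuse one depth observation into TSDF samples along a single ray."""
    for i, z in enumerate(voxel_depths):
        sdf = observed_depth - z  # positive in front of the surface
        if sdf < -trunc:
            continue  # far behind the surface: occluded, leave untouched
        d = min(1.0, sdf / trunc)  # truncate and normalise to [-1, 1]
        w = weight[i]
        tsdf[i] = (w * tsdf[i] + d) / (w + 1.0)  # running average
        weight[i] = w + 1.0
    return tsdf, weight


def zero_crossing(tsdf: List[float], voxel_depths: List[float]) -> Optional[float]:
    """Linearly interpolate the first positive-to-negative sign change."""
    for i in range(len(tsdf) - 1):
        a, b = tsdf[i], tsdf[i + 1]
        if a > 0.0 and b <= 0.0:
            t = a / (a - b)
            return voxel_depths[i] + t * (voxel_depths[i + 1] - voxel_depths[i])
    return None


if __name__ == "__main__":
    depths = [0.0, 1.0, 2.0]
    tsdf, weight = [0.0] * 3, [0.0] * 3
    for obs in (1.0, 1.5):  # two hypothetical depth observations
        tsdf_update(tsdf, weight, depths, obs, trunc=1.0)
    print(tsdf, zero_crossing(tsdf, depths))
```

Averaging the truncated signed distances over many views is what smooths out per-view depth noise; in a full 3D volume, Marching Cubes then extracts the mesh at the zero level set in place of the 1D interpolation shown here.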