MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting

Hanzhi Chang, Ruijie Zhu, Wenjie Chang, Mulin Yu, Yanzhe Liang, Jiahao Lu, Zhuoyuan Li, Tianzhu Zhang

TL;DR

MeshSplat introduces a generalizable sparse-view surface reconstruction framework that uses pixel-aligned 2D Gaussian splats (2DGS) as a bridge between novel-view synthesis and geometric priors. A feed-forward Gaussian Prediction Network, aided by a plane-swept cost volume, depth refinement, and a normal predictor, yields 2DGS that enable novel-view supervision and subsequent mesh extraction. Two novel losses, the Weighted Chamfer Distance Loss and an uncertainty-guided normal NLL loss with monocular priors, improve 2DGS position and orientation under sparse inputs. Across Re10K, Scannet, and Replica, MeshSplat achieves state-of-the-art performance for generalizable sparse-view mesh reconstruction and demonstrates strong cross-dataset generalization and efficient rendering relative to NeRF-based approaches.
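
The TL;DR does not give the exact form of the uncertainty-guided normal NLL loss. The sketch below is a minimal illustration under one assumption: a Gaussian likelihood on the angle between the predicted normal and the monocular prior, with a per-pixel standard deviation σ and constant terms dropped. The function name `normal_nll_loss` and this particular likelihood are illustrative choices, not the paper's definition.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _unit(v: Vec3) -> Vec3:
    """Return v scaled to unit length."""
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / n, v[1] / n, v[2] / n)


def normal_nll_loss(pred: Sequence[Vec3], prior: Sequence[Vec3], sigma: Sequence[float]) -> float:
    """Mean per-pixel NLL of the angle between predicted and prior normals.

    Assumption (hypothetical): Gaussian likelihood on the angular error with
    std `sigma`, constants dropped; not necessarily MeshSplat's exact loss.
    """
    if not (len(pred) == len(prior) == len(sigma)) or not pred:
        raise ValueError("pred, prior and sigma must be non-empty and equally long")
    total = 0.0
    for p, q, s in zip(pred, prior, sigma):
        p, q = _unit(p), _unit(q)
        # Clamp the dot product so rounding never pushes acos out of its domain.
        cos = max(-1.0, min(1.0, p[0] * q[0] + p[1] * q[1] + p[2] * q[2]))
        theta = math.acos(cos)
        # Large sigma tolerates disagreement; log(sigma) penalizes inflating it.
        total += theta ** 2 / (2.0 * s ** 2) + math.log(s)
    return total / len(pred)
```

In this form, the θ²/(2σ²) term lets pixels with high predicted uncertainty tolerate larger disagreement with the monocular prior. The log σ term stops the network from simply declaring every pixel uncertain.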

Abstract

Surface reconstruction has been widely studied in computer vision and graphics. However, existing surface reconstruction works struggle to recover accurate scene geometry when the input views are extremely sparse. To address this issue, we propose MeshSplat, a generalizable sparse-view surface reconstruction framework via Gaussian Splatting. Our key idea is to leverage 2DGS as a bridge, which connects novel view synthesis to learned geometric priors and then transfers these priors to achieve surface reconstruction. Specifically, we incorporate a feed-forward network to predict per-view pixel-aligned 2DGS, which enables the network to synthesize novel view images and thus eliminates the need for direct 3D ground-truth supervision. To improve the accuracy of 2DGS position and orientation prediction, we propose a Weighted Chamfer Distance Loss to regularize the depth maps, especially in overlapping areas of input views, and also a normal prediction network to align the orientation of 2DGS with normal vectors predicted by a monocular normal estimator. Extensive experiments validate the effectiveness of our proposed improvements, demonstrating that our method achieves state-of-the-art performance in generalizable sparse-view mesh reconstruction tasks. Project Page: https://hanzhichang.github.io/meshsplat_web
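
The following sketch illustrates the idea of a Weighted Chamfer Distance Loss. It computes a symmetric Chamfer distance between the point clouds unprojected from the two input views' depth maps, with a per-point weight on each directed term. The text does not say how MeshSplat derives its weights. The example assumes weights that are large for points in the overlapping region and near zero elsewhere. `weighted_chamfer` is a hypothetical name. Nearest neighbours are found by brute force, which only suits small clouds.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _directed_term(src: Sequence[Point], dst: Sequence[Point], weights: Sequence[float]) -> float:
    """Weighted mean, over src, of the squared distance to the nearest point in dst."""
    if len(src) != len(weights):
        raise ValueError("need one weight per source point")
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must have a positive sum")
    # Brute-force nearest neighbour search: O(len(src) * len(dst)).
    num = sum(w * min(_sq_dist(p, q) for q in dst) for p, w in zip(src, weights))
    return num / total_w


def weighted_chamfer(p1: Sequence[Point], w1: Sequence[float],
                     p2: Sequence[Point], w2: Sequence[float]) -> float:
    """Symmetric weighted Chamfer distance between clouds p1 and p2.

    Assumption (hypothetical): w1/w2 are per-point weights, e.g. high inside
    the region both views observe; MeshSplat's actual weighting may differ.
    """
    return _directed_term(p1, p2, w1) + _directed_term(p2, p1, w2)
```

Down-weighting points outside the overlap matters because those points have no true counterpart in the other view. An unweighted Chamfer term would pull them toward unrelated geometry.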

Paper Structure

This paper contains 23 sections, 12 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: Given sparse-view images as input, MeshSplat directly predicts the scene geometry and efficiently extracts the scene mesh. Compared to MVSplat chen2024mvsplat and other state-of-the-art methods, MeshSplat achieves more consistent and precise mesh extraction in generalizable sparse-view surface reconstruction.
  • Figure 2: Motivation. (a) The ellipsoidal shape of 3DGS leads to different intersection planes from different viewpoints, resulting in inconsistent surfaces. (b) 2DGS has consistent intersection planes across viewpoints, which makes it more suitable for surface reconstruction. (c) When the positions and orientations of the 2DGS are not regularized, they deviate significantly from the surface contours, which hinders the reconstruction of scene surfaces.
  • Figure 3: Overall Architecture. Taking a pair of images as input, MeshSplat begins with a multi-view backbone to extract per-view feature maps. We then construct per-view cost volumes via plane sweeping to generate coarse depth maps, which are projected to 3D point clouds and constrained by our proposed Weighted Chamfer Distance Loss. We further feed the cost volumes into our Gaussian prediction network, together with a depth refinement network and a normal prediction network, to obtain pixel-aligned 2DGS. Finally, we use these 2DGS to render novel views for supervision and to reconstruct the scene mesh.
  • Figure 4: Qualitative Comparisons on Re10K Dataset. While the baseline methods produce meshes with holes and uneven surfaces, MeshSplat reconstructs the scene with smoother and more complete meshes.
  • Figure 6: Qualitative Comparisons in Zero-Shot Transfer Experiments on Scannet and Replica Datasets. Compared to MVSplat, MeshSplat still extracts smoother surfaces, demonstrating its generalization across datasets.
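
Figure 3 describes coarse depth maps that are projected to 3D point clouds and, with a normal prediction network, turned into pixel-aligned 2DGS. The sketch below shows the per-pixel geometry under stated assumptions. It uses a pinhole camera with z-depth and a camera-to-world pose (R, t). It places one 2D Gaussian per pixel, centered at the unprojected point, with tangent axes built from the predicted normal. In 2DGS a Gaussian is a flat disk whose normal is the cross product of its two tangent axes. Orienting those axes to a predicted normal therefore controls the surface orientation discussed in Figure 2(c). The names `unproject_pixel` and `surfel_axes` are hypothetical. Scales, opacities, colors, and the in-plane rotation around the normal (arbitrary here) are omitted.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec3) -> Vec3:
    """Return v scaled to unit length."""
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / n, v[1] / n, v[2] / n)


def unproject_pixel(u: float, v: float, depth: float,
                    fx: float, fy: float, cx: float, cy: float,
                    R: Sequence[Vec3], t: Vec3) -> Vec3:
    """Lift pixel (u, v) with z-depth to a world-space Gaussian center.

    Assumptions (hypothetical): pinhole intrinsics, and a camera-to-world
    rotation R (row-major 3x3) plus translation t.
    """
    pc = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
    # World point = R @ pc + t
    return tuple(sum(R[i][k] * pc[k] for k in range(3)) + t[i] for i in range(3))


def surfel_axes(normal: Vec3) -> Tuple[Vec3, Vec3]:
    """Orthonormal tangent axes (t_u, t_v) with t_u x t_v equal to the unit normal."""
    n = normalize(normal)
    # Any helper not parallel to n works; switch helpers when n is near the x-axis.
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    t_u = normalize(cross(helper, n))
    t_v = cross(n, t_u)  # already unit length since n and t_u are orthonormal
    return t_u, t_v
```

Building t_v as n × t_u guarantees t_u × t_v = n, so the disk's normal matches the predicted normal exactly, whatever helper vector is used.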