
Direct Learning of Mesh and Appearance via 3D Gaussian Splatting

Ancheng Lin, Yusheng Xiang, Paul Kennedy, Jun Li

TL;DR

This work introduces a direct learning framework that couples an explicit mesh with 3D Gaussian Splatting, binding Gaussians to mesh faces and using a neural appearance predictor to drive differentiable rendering. The approach enables end-to-end supervision of both geometry and appearance from photometric data, improving rendering quality and enabling mesh-based manipulation while supporting scene updates without re-learning from scratch. It combines a learnable SDF grid for geometry with a constrained Gaussians-on-faces representation and a background 3DGS component, achieving efficient training and high-quality surfaces on both synthetic and real datasets. Overall, the method delivers a practical, scalable hybrid representation that blends the strengths of explicit geometry with fast, view-dependent rendering for novel view synthesis and surface reconstruction.

Abstract

Accurately reconstructing a 3D scene including explicit geometry information is both attractive and challenging. Geometry reconstruction can benefit from incorporating differentiable appearance models, such as Neural Radiance Fields and 3D Gaussian Splatting (3DGS). However, existing methods encounter efficiency issues due to indirect geometry learning and the paradigm of separately modeling geometry and surface appearance. In this work, we propose a learnable scene model that incorporates 3DGS with an explicit geometry representation, namely a mesh. Our model learns the mesh and appearance in an end-to-end manner, where we bind 3D Gaussians to the mesh faces and perform differentiable rendering of 3DGS to obtain photometric supervision. The model creates an effective information pathway to supervise the learning of both 3DGS and mesh. Experimental results demonstrate that the learned scene model not only improves efficiency and rendering quality but also enables manipulation via the explicit mesh. In addition, our model has a unique advantage in adapting to scene updates, thanks to the end-to-end learning of both mesh and appearance.

Paper Structure (30 sections, 15 equations, 9 figures, 7 tables)


Figures (9)

  • Figure 1: In the original 3DGS [Kerbl23_3DGS], the Gaussians do not align well with the underlying surface, whereas our hybrid representation explicitly restricts the Gaussians to the mesh faces. Our method benefits from the high-quality rendering of 3DGS and the explicit surface structure provided by a mesh.
  • Figure 2: Method Overview. The mesh is derived from a learnable SDF grid using a differentiable marching algorithm. Gaussians are created from the mesh faces, ensuring their alignment with the surface. A neural appearance model determines colors for Gaussians, which are then used to render an image.
  • Figure 3: Sub-figure (a) shows using $K=1,3,6$ Gaussians to represent the appearance of a triangle face, (b) defines a local coordinate frame, and (c) illustrates applying a linear transformation $\boldsymbol{M}$ to let the Gaussians adapt to an irregular triangle. See Sec. \ref{sec:bind_gs} for more details.
  • Figure 4: Comparison between two approaches to learn appearance.
  • Figure 5: Qualitative comparisons with baseline methods (SuGaR [Guedon23_SuGaR], NeRF2Mesh [Tang23_NeRF2Mesh]) on the NeRF-Synthetic [Mildenhall20_NeRF] and Mip-NeRF360 [Barron22_Mip360] datasets.
  • ...and 4 more figures
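
The idea of restricting Gaussians to mesh faces (Figures 1 and 3) can be illustrated with a minimal NumPy sketch for the $K=1$ case. This is not the paper's construction: the function name `face_gaussian`, the `thickness` parameter, and the heuristic that sets the in-plane scales from the spread of the vertices are all assumptions made for illustration, and the linear transformation $\boldsymbol{M}$ from Figure 3 is not reproduced here. The sketch only shows how a local frame built from a triangle yields a flat Gaussian lying in the face plane.

```python
import numpy as np

def face_gaussian(v0, v1, v2, thickness=1e-3):
    """Hypothetical helper: one flat Gaussian bound to a triangle face.

    Returns (mean, R, cov), where the columns of R are the local frame
    (two in-plane axes and the face normal) and cov is the 3x3 covariance.
    """
    v0, v1, v2 = (np.asarray(v, dtype=float) for v in (v0, v1, v2))
    e1, e2 = v1 - v0, v2 - v0

    # Face normal from the cross product of two edges.
    normal = np.cross(e1, e2)
    norm = np.linalg.norm(normal)
    if norm == 0.0:
        raise ValueError("degenerate triangle")
    n = normal / norm

    # Local orthonormal frame: x along the first edge, y completes the basis.
    x = e1 / np.linalg.norm(e1)
    y = np.cross(n, x)
    R = np.stack([x, y, n], axis=1)

    # Place the Gaussian at the centroid of the face.
    mean = (v0 + v1 + v2) / 3.0

    # Assumed heuristic: in-plane standard deviations follow the spread of
    # the vertices in the local frame; the normal direction is kept thin.
    local = (np.stack([v0, v1, v2]) - mean) @ R
    scales = np.array([local[:, 0].std(), local[:, 1].std(), thickness])

    # Covariance = R * diag(scales^2) * R^T.
    cov = R @ np.diag(scales ** 2) @ R.T
    return mean, R, cov

mean, R, cov = face_gaussian([0, 0, 0], [1, 0, 0], [0, 1, 0])
```

Because the mean and the frame are functions of the vertex positions, any deformation of the mesh moves and reorients the Gaussian with its face, which is the property that makes mesh-based manipulation possible in a representation of this kind.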