Direct Learning of Mesh and Appearance via 3D Gaussian Splatting
Ancheng Lin, Yusheng Xiang, Paul Kennedy, Jun Li
TL;DR
This work introduces a direct learning framework that couples an explicit mesh with 3D Gaussian Splatting (3DGS): Gaussians are bound to mesh faces, and a neural appearance predictor drives differentiable rendering. Because the pipeline is differentiable end to end, photometric data supervise both geometry and appearance, which improves rendering quality, enables mesh-based manipulation, and supports scene updates without re-learning from scratch. The method combines a learnable SDF grid for geometry with a constrained Gaussians-on-faces representation and a background 3DGS component, achieving efficient training and high-quality surfaces on both synthetic and real datasets. Overall, it offers a practical, scalable hybrid representation that joins the strengths of explicit geometry with fast, view-dependent rendering for novel view synthesis and surface reconstruction.
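To make "Gaussians bound to mesh faces" concrete, the sketch below places a fixed set of Gaussians on each triangle at barycentric coordinates, orients them by the face normal, and flattens them along that normal. This is a minimal illustration under stated assumptions, not the authors' implementation: the sampling pattern BARY_PATTERN, the in-plane scale heuristic, the thickness value, and all function names are hypothetical. The property it demonstrates is the one the method relies on: editing the mesh moves every bound Gaussian with it.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical per-face sampling pattern: each triple is a barycentric coordinate summing to 1.
BARY_PATTERN: Tuple[Vec3, ...] = (
    (1 / 3, 1 / 3, 1 / 3),
    (2 / 3, 1 / 6, 1 / 6),
    (1 / 6, 2 / 3, 1 / 6),
    (1 / 6, 1 / 6, 2 / 3),
)


@dataclass
class FaceGaussian:
    face: int      # index of the triangle this Gaussian is bound to
    center: Vec3   # barycentric combination of the face's three vertices
    normal: Vec3   # unit face normal; the Gaussian is thin along this axis
    scales: Vec3   # (in-plane, in-plane, along-normal) extents


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _norm(a: Vec3) -> float:
    return math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)


def bind_gaussians(
    vertices: Sequence[Vec3],
    faces: Sequence[Tuple[int, int, int]],
    pattern: Sequence[Vec3] = BARY_PATTERN,
    thickness: float = 1e-3,  # hypothetical extent along the normal (flat, surface-like Gaussians)
) -> List[FaceGaussian]:
    """Derive Gaussian parameters from the current mesh; re-run after any mesh edit."""
    out: List[FaceGaussian] = []
    for f_idx, (i, j, k) in enumerate(faces):
        v0, v1, v2 = vertices[i], vertices[j], vertices[k]
        n = _cross(_sub(v1, v0), _sub(v2, v0))
        length = _norm(n)
        if length == 0.0:
            continue  # skip degenerate (zero-area) triangles
        normal = (n[0] / length, n[1] / length, n[2] / length)
        area = 0.5 * length
        s = math.sqrt(area / len(pattern))  # heuristic: split the face area among its Gaussians
        for w0, w1, w2 in pattern:
            center = tuple(w0 * v0[d] + w1 * v1[d] + w2 * v2[d] for d in range(3))
            out.append(FaceGaussian(f_idx, center, normal, (s, s, thickness)))
    return out


if __name__ == "__main__":
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    faces = [(0, 1, 2)]
    for g in bind_gaussians(verts, faces):
        print(g.center, g.normal, g.scales)
    # Lifting the mesh by 1 in z lifts every bound Gaussian with it.
    lifted = [(x, y, z + 1.0) for x, y, z in verts]
    print([g.center[2] for g in bind_gaussians(lifted, faces)])
```

Recomputing Gaussian parameters from the mesh, rather than storing free positions, is what lets a mesh edit propagate to appearance without re-optimizing the Gaussians.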
Abstract
Accurately reconstructing a 3D scene, including its explicit geometry, is both attractive and challenging. Geometry reconstruction can benefit from incorporating differentiable appearance models, such as Neural Radiance Fields and 3D Gaussian Splatting (3DGS). However, existing methods suffer from inefficiency because they learn geometry indirectly and model geometry and surface appearance separately. In this work, we propose a learnable scene model that incorporates 3DGS with an explicit geometry representation, namely a mesh. Our model learns the mesh and appearance in an end-to-end manner: we bind 3D Gaussians to the mesh faces and perform differentiable rendering of 3DGS to obtain photometric supervision. This design creates an effective information pathway through which photometric supervision reaches both the 3DGS and the mesh. Experimental results demonstrate that the learned scene model not only improves efficiency and rendering quality but also enables manipulation via the explicit mesh. In addition, our model has a unique advantage in adapting to scene updates, thanks to the end-to-end learning of both mesh and appearance.
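The "information pathway" from photometric supervision to the mesh exists because each bound Gaussian's position is a differentiable function of the mesh vertices, so a loss on the Gaussians reaches the vertices through the chain rule. The toy sketch below assumes a Gaussian center defined as a barycentric combination c = w0*v0 + w1*v1 + w2*v2 (an assumption; the exact binding is not specified here) and replaces the photometric rendering loss with a hypothetical stand-in: the squared distance from c to a target point. Because c is linear in the vertices, dL/dv_i = w_i * dL/dc. Function names are illustrative only.

```python
from typing import List, Sequence, Tuple

Vec3 = Sequence[float]


def gaussian_center(verts: Sequence[Vec3], bary: Sequence[float]) -> List[float]:
    """Center of a face-bound Gaussian as a barycentric combination of the face vertices."""
    return [sum(w * v[d] for w, v in zip(bary, verts)) for d in range(3)]


def toy_loss_and_grad(
    verts: Sequence[Vec3], bary: Sequence[float], target: Vec3
) -> Tuple[float, List[List[float]]]:
    """Stand-in loss L = |c - target|^2 and its analytic gradient w.r.t. each vertex."""
    c = gaussian_center(verts, bary)
    diff = [c[d] - target[d] for d in range(3)]
    loss = sum(x * x for x in diff)
    dL_dc = [2.0 * x for x in diff]
    # Chain rule: c = sum_i w_i * v_i, hence dL/dv_i = w_i * dL/dc.
    grads = [[w * g for g in dL_dc] for w in bary]
    return loss, grads


def optimize(
    verts: Sequence[Vec3], bary: Sequence[float], target: Vec3, lr: float = 0.5, steps: int = 100
) -> Tuple[List[List[float]], float]:
    """Plain gradient descent on the mesh vertices driven only by the Gaussian-level loss."""
    v = [list(p) for p in verts]
    for _ in range(steps):
        _, grads = toy_loss_and_grad(v, bary, target)
        for p, g in zip(v, grads):
            for d in range(3):
                p[d] -= lr * g[d]
    final_loss, _ = toy_loss_and_grad(v, bary, target)
    return v, final_loss


if __name__ == "__main__":
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    new_verts, loss = optimize(verts, (0.5, 0.25, 0.25), (0.2, 0.3, 0.7))
    print(new_verts, loss)
```

Each vertex receives gradient in proportion to its barycentric weight, so vertices that influence a Gaussian more are moved more. In the actual method the loss would come from differentiable 3DGS rendering rather than this point-target stand-in.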
