MILo: Mesh-In-the-Loop Gaussian Splatting for Detailed and Efficient Surface Reconstruction
Antoine Guédon, Diego Gomez, Nissim Maruani, Bingchen Gong, George Drettakis, Maks Ovsjanikov
TL;DR
MILo introduces a differentiable mesh-in-the-loop formulation for Gaussian Splatting: a mesh is extracted from Gaussian pivots during training, and gradients are backpropagated through it to refine the geometry. By coupling a Delaunay-based mesh with the volumetric Gaussian field through bidirectional rendering losses and targeted regularization, MILo achieves state-of-the-art surface quality with an order of magnitude fewer mesh vertices, and it reconstructs full scenes including backgrounds. The approach relies on differentiable marching tetrahedra and learnable per-vertex SDF values to maintain high geometric fidelity while mitigating erosion and interior artifacts, and it outperforms prior post-hoc mesh extraction methods across multiple datasets. The resulting meshes are compact, suited to animation and physics, and preserve fine details, which makes them practical for downstream applications.
Abstract
While recent advances in Gaussian Splatting have enabled fast reconstruction of high-quality 3D scenes from images, extracting accurate surface meshes remains a challenge. Current approaches extract the surface through costly post-processing steps, resulting in the loss of fine geometric details or requiring significant time and leading to very dense meshes with millions of vertices. More fundamentally, the a posteriori conversion from a volumetric to a surface representation limits the ability of the final mesh to preserve all geometric structures captured during training. We present MILo, a novel Gaussian Splatting framework that bridges the gap between volumetric and surface representations by differentiably extracting a mesh from the 3D Gaussians. We design a fully differentiable procedure that constructs the mesh (including both vertex locations and connectivity) at every iteration directly from the parameters of the Gaussians, which are the only quantities optimized during training. Our method introduces three key technical contributions: (1) a bidirectional consistency framework ensuring that both representations, the Gaussians and the extracted mesh, capture the same underlying geometry during training; (2) an adaptive mesh extraction process performed at each training iteration, which uses Gaussians as differentiable pivots for Delaunay triangulation; and (3) a novel method for computing signed distance values from the 3D Gaussians that enables precise surface extraction while avoiding geometric erosion. Our approach can reconstruct complete scenes, including backgrounds, with state-of-the-art quality while requiring an order of magnitude fewer mesh vertices than previous methods. Due to their light weight and empty interior, our meshes are well suited for downstream applications such as physics simulations or animation.
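
To make the surface extraction step concrete, the following is a minimal sketch of how a surface point can be placed on the edge of a single tetrahedron from signed distance values at its vertices. It assumes the linear edge interpolation used in standard marching tetrahedra and is not MILo's implementation; the function names `zero_crossing` and `tet_surface_points` and the toy tetrahedron are hypothetical. Because each surface point is a smooth function of the vertex positions and their signed distances, gradients can in principle flow back to whatever parameters produce those quantities.

```python
from itertools import combinations


def zero_crossing(p_a, p_b, s_a, s_b):
    """Point on edge (p_a, p_b) where a linearly interpolated SDF equals zero.

    Assumes s_a and s_b have opposite signs, so s_a - s_b is nonzero.
    """
    t = s_a / (s_a - s_b)  # fraction of the way from p_a to p_b
    return tuple(a + t * (b - a) for a, b in zip(p_a, p_b))


def tet_surface_points(points, sdf):
    """Zero-crossing points on the six edges of one tetrahedron.

    points: four 3D vertex positions (hypothetical pivots).
    sdf:    four signed distance values, negative inside the surface.
    """
    crossings = []
    for i, j in combinations(range(4), 2):
        if sdf[i] * sdf[j] < 0:  # the surface cuts this edge
            crossings.append(zero_crossing(points[i], points[j], sdf[i], sdf[j]))
    return crossings


# Toy example: vertex 0 lies inside the surface, the other three outside.
tet = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
sdf = [-1.0, 1.0, 1.0, 1.0]
surface = tet_surface_points(tet, sdf)
print(surface)
```

In this toy case the three edges leaving vertex 0 are cut at their midpoints, yielding one triangle. A full pipeline would apply the same rule to every tetrahedron of a Delaunay triangulation and connect the crossings into faces.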
