Tessellation GS: Neural Mesh Gaussians for Robust Monocular Reconstruction of Dynamic Objects
Shuohan Tao, Boyao Zhou, Hanzhang Tu, Yuwang Wang, Yebin Liu
TL;DR
This work tackles robust monocular reconstruction of dynamic objects by addressing view overfitting in Gaussian-based scene representations. It introduces Tessellation GS, a two-stage approach: Stage One derives a coarse, temporally coherent geometry from the priors of a large reconstruction model, and Stage Two attaches a structured mesh-Gaussian quad-tree to the canonical mesh to jointly optimize motion and appearance under strong locality and density constraints. Key innovations include learnable edge ratios for adaptive subdivision, competitive Gaussian opacities, scale and offset constraints, and adaptive Gaussian population control, all aimed at mitigating view-extrapolation artifacts while preserving high-frequency details. The method achieves state-of-the-art results on multiple monocular dynamic benchmarks, with substantial improvements in appearance and mesh geometry metrics and practical training times, and it remains robust under challenging camera motions.
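To make the quad-tree idea concrete, the following is a minimal sketch of one subdivision step, not the authors' implementation. It assumes that each triangular face splits into four children and that each edge carries a ratio in (0, 1) locating its split point; the names `Face`, `subdivide`, and `ratios` are hypothetical, and in the method described above the ratios would be learned rather than fixed.

```python
from dataclasses import dataclass, field

Vec3 = tuple[float, float, float]


def lerp(a: Vec3, b: Vec3, t: float) -> Vec3:
    """Point at fraction t along the segment from a to b."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))


@dataclass
class Face:
    """A triangle node in a (hypothetical) face quad-tree."""
    v0: Vec3
    v1: Vec3
    v2: Vec3
    children: list = field(default_factory=list)


def triangle_area(f: Face) -> float:
    """Area from the cross product of two edge vectors."""
    e1 = [b - a for a, b in zip(f.v0, f.v1)]
    e2 = [b - a for a, b in zip(f.v0, f.v2)]
    cx = e1[1] * e2[2] - e1[2] * e2[1]
    cy = e1[2] * e2[0] - e1[0] * e2[2]
    cz = e1[0] * e2[1] - e1[1] * e2[0]
    return 0.5 * (cx * cx + cy * cy + cz * cz) ** 0.5


def subdivide(face: Face, ratios=(0.5, 0.5, 0.5)) -> list:
    """Split a face into four children using one split ratio per edge.

    ratios = (r01, r12, r20) place the split points on edges
    v0->v1, v1->v2 and v2->v0. With all ratios at 0.5 this is
    ordinary midpoint subdivision; other values shift the split
    points so that child faces can adapt to local detail.
    """
    r01, r12, r20 = ratios
    m01 = lerp(face.v0, face.v1, r01)
    m12 = lerp(face.v1, face.v2, r12)
    m20 = lerp(face.v2, face.v0, r20)
    face.children = [
        Face(face.v0, m01, m20),  # corner child at v0
        Face(m01, face.v1, m12),  # corner child at v1
        Face(m20, m12, face.v2),  # corner child at v2
        Face(m01, m12, m20),      # central child
    ]
    return face.children


root = Face((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
kids = subdivide(root, ratios=(0.3, 0.6, 0.5))
```

For any ratios strictly between 0 and 1 the four children tile the parent exactly, so their areas sum to the parent's area. That property is what lets Gaussians attached to child faces stay confined to the region of the parent face.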
Abstract
3D Gaussian Splatting (GS) enables highly photorealistic scene reconstruction from posed image sequences but struggles with viewpoint extrapolation due to its anisotropic nature, leading to overfitting and poor generalization, particularly in sparse-view and dynamic scene reconstruction. We propose Tessellation GS, a structured 2D GS approach anchored on mesh faces, to reconstruct dynamic scenes from a single continuously moving or static camera. Our method constrains 2D Gaussians to localized regions and infers their attributes via hierarchical neural features on mesh faces. Gaussian subdivision is guided by an adaptive face subdivision strategy driven by a detail-aware loss function. Additionally, we leverage priors from a reconstruction foundation model to initialize Gaussian deformations, enabling robust reconstruction of general dynamic objects from a single static camera, a setting that was previously extremely challenging for optimization-based methods. Our method outperforms the previous state-of-the-art method, reducing LPIPS by 29.1% on appearance reconstruction and Chamfer distance by 49.2% on mesh reconstruction.
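For readers unfamiliar with the mesh metric quoted above, here is a minimal sketch of a symmetric Chamfer distance between two point sets, together with the arithmetic behind a percentage reduction. Chamfer distance has several common variants (squared or unsquared distances, sum or mean); this sketch uses the mean of unsquared nearest-neighbor distances in both directions, which is an assumption and may differ from the paper's evaluation protocol. It also assumes the reported percentages are relative reductions against the baseline. The input values are illustrative only and are not results from the paper.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets.

    Assumed variant: mean nearest-neighbor Euclidean distance
    from a to b, plus the same from b to a. Brute force, O(|a||b|).
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# Illustrative point sets (not data from the paper).
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
cd = chamfer_distance(pred, gt)

# Illustrative metric values: halving a baseline error is a 50% reduction.
drop = relative_reduction(2.0, 1.0)
```

In practice the point sets are sampled from the reconstructed and ground-truth meshes, and a spatial index such as a k-d tree replaces the brute-force nearest-neighbor search.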
