
Tessellation GS: Neural Mesh Gaussians for Robust Monocular Reconstruction of Dynamic Objects

Shuohan Tao, Boyao Zhou, Hanzhang Tu, Yuwang Wang, Yebin Liu

TL;DR

This work tackles robust monocular reconstruction of dynamic objects by addressing view overfitting in Gaussian-based scene representations. It introduces Tessellation GS, a two-stage approach. Stage One derives a coarse, temporally coherent geometry from the priors of a large reconstruction model (LRM). Stage Two attaches a structured mesh-Gaussian quad-tree to the canonical mesh and jointly optimizes motion and appearance under strong locality and density constraints. Key innovations include learnable edge ratios for adaptive subdivision, competitive Gaussian opacities, scale and offset constraints, and adaptive control of the Gaussian population. Together these aim to prevent overfitting under view extrapolation while preserving high-frequency detail. The method achieves state-of-the-art results on multiple monocular dynamic benchmarks, with substantial improvements in appearance and mesh-geometry metrics at practical training times, and it remains robust under challenging camera motion.
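The TL;DR names learnable edge ratios for adaptive subdivision but does not give their exact form. The following is a minimal NumPy sketch under two assumptions. First, each quad-tree node splits its triangle 1-to-4. Second, each new vertex sits on its edge at a ratio given by the sigmoid of a learnable logit, rather than at the midpoint. The function names and the sigmoid parameterization are hypothetical illustrations, not the authors' implementation. In training, the logits would be optimized through an autodiff framework.

```python
import numpy as np


def sigmoid(x):
    # Maps an unconstrained logit to a ratio in (0, 1).
    return 1.0 / (1.0 + np.exp(-x))


def subdivide_with_ratios(tri, edge_logits):
    """Split one triangle into four children (one quad-tree level).

    tri: (3, 3) array holding vertex positions v0, v1, v2.
    edge_logits: (3,) hypothetical learnable parameters for the edges
    (v0, v1), (v1, v2), (v2, v0). A logit of 0 gives the ordinary midpoint split.
    """
    v0, v1, v2 = tri
    r = sigmoid(np.asarray(edge_logits, dtype=float))
    # Each new vertex slides along its edge according to its ratio.
    m01 = v0 + r[0] * (v1 - v0)
    m12 = v1 + r[1] * (v2 - v1)
    m20 = v2 + r[2] * (v0 - v2)
    return np.stack([
        [v0, m01, m20],   # corner child at v0
        [m01, v1, m12],   # corner child at v1
        [m20, m12, v2],   # corner child at v2
        [m01, m12, m20],  # inner child
    ])


def triangle_area(t):
    # Unsigned area of a 3D triangle from the cross product of two edges.
    return 0.5 * np.linalg.norm(np.cross(t[1] - t[0], t[2] - t[0]))


if __name__ == "__main__":
    tri = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    children = subdivide_with_ratios(tri, [1.5, -0.7, 0.3])
    print(sum(triangle_area(c) for c in children), triangle_area(tri))
```

The three new vertices always lie on the parent's edges. As a result, the four children tile the parent triangle for any choice of ratios. Shifting a ratio therefore moves a child boundary, for example onto a color edge as in Figure 3, without opening gaps.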

Abstract

3D Gaussian Splatting (GS) enables highly photorealistic scene reconstruction from posed image sequences. However, it struggles with viewpoint extrapolation because of its anisotropic nature, which leads to overfitting and poor generalization, particularly in sparse-view and dynamic scene reconstruction. We propose Tessellation GS, a structured 2D GS approach anchored on mesh faces, to reconstruct dynamic scenes from a single camera that is either continuously moving or static. Our method constrains 2D Gaussians to localized regions and infers their attributes from hierarchical neural features on the mesh faces. Gaussian subdivision is guided by an adaptive face-subdivision strategy driven by a detail-aware loss function. Additionally, we leverage priors from a reconstruction foundation model to initialize Gaussian deformations. This enables robust reconstruction of general dynamic objects from a single static camera, a setting that was previously extremely challenging for optimization-based methods. Our method outperforms the previous state-of-the-art method, reducing LPIPS by 29.1% on the appearance reconstruction task and Chamfer distance by 49.2% on the mesh reconstruction task.
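The abstract reports relative reductions in LPIPS and Chamfer distance but does not say which Chamfer convention is used. The following is a minimal brute-force sketch of one common convention: the mean squared nearest-neighbour distance, summed over both directions. It also shows the relative-reduction arithmetic behind figures such as "29.1%". The function names are illustrative, and this is not the paper's evaluation code.

```python
import numpy as np


def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3).

    Convention assumed here: mean squared nearest-neighbour distance from p to q
    plus the same from q to p. Papers differ (squared or not, sum or mean), so
    absolute values are only comparable under a single convention.
    """
    # (N, M) matrix of pairwise squared distances via broadcasting.
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()


def relative_reduction(baseline, ours):
    """Fractional reduction of a lower-is-better metric, e.g. 0.291 for 29.1%."""
    return (baseline - ours) / baseline


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.random((500, 3))
    pred = gt + 0.01 * rng.standard_normal(gt.shape)
    print(chamfer_distance(pred, gt))
```

The pairwise matrix needs O(NM) memory. For large point clouds sampled from meshes, a KD-tree nearest-neighbour query is the usual replacement.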

Paper Structure

This paper contains 18 sections, 15 equations, 14 figures, 3 tables.

Figures (14)

  • Figure 1: Illustration of the pipeline. In stage one, we obtain a per-frame mesh sequence from the LRM by querying it on each frame. We clean up the mesh with Taubin smoothing [taubin], then subdivide faces or collapse edges until the number of faces matches the desired initial number of Gaussians. In stage two, we initialize 2D Gaussians defined by neural features on the canonical mesh and train these neural Gaussians jointly with the deformation model. The resulting Gaussians are highly robust to view overfitting.
  • Figure 2: Adaptive densification via the mesh-Gaussian quad-tree on a single mesh face. Red triangles are leaf nodes whose associated Gaussians are not subdivided further. Blue triangles are non-leaf nodes with no associated Gaussians. (a) and (b): the tree allows an adaptive density of Gaussians. (c): a learnable subdivision ratio further improves expressiveness.
  • Figure 3: A learnable subdivision ratio fits boundaries better. The yellow Gaussian is a parent Gaussian, and the blue Gaussians are its children. The child Gaussians model the color boundary between the red and blue regions better than the parent does. The child Gaussians' opacities naturally converge to one during optimization.
  • Figure 4: Test results on Smooth D-NeRF. (a) and (b): input images at two different timesteps. (c), (e), (g), and (i): rendering results at the first timestep. (d), (f), (h), and (j): rendering results at the second timestep. Our results are visually better than those of all other methods. The second-best results are produced by DG-Mesh [liu2024dynamic]. Its 3D Gaussians have unconstrained scales, so large Gaussian floaters give its renderings a foggy appearance.
  • Figure 5: Results on Unbiased4D [johnson2023unbiased]. (a): input image. (b): rendered novel view. (c): extracted mesh.
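Figure 1 names Taubin smoothing as the clean-up step applied to the LRM meshes. The following is a minimal NumPy sketch of the classic λ|μ scheme with a uniform (umbrella) Laplacian. It assumes a triangle mesh given as a vertex array and a face-index array. The default parameters (λ = 0.5, μ = -0.53, 10 iterations) are common illustrative values, not the paper's settings. The function names are hypothetical.

```python
import numpy as np


def vertex_neighbors(num_vertices, faces):
    """Per-vertex neighbour index arrays built from triangle faces (F, 3)."""
    nbrs = [set() for _ in range(num_vertices)]
    for a, b, c in faces:
        nbrs[a].update((b, c))
        nbrs[b].update((a, c))
        nbrs[c].update((a, b))
    return [np.fromiter(n, dtype=int) for n in nbrs]


def umbrella_laplacian(v, nbrs):
    """Uniform Laplacian: mean of neighbour positions minus the vertex itself."""
    out = np.zeros_like(v)
    for i, n in enumerate(nbrs):
        if len(n):  # isolated vertices keep a zero Laplacian
            out[i] = v[n].mean(axis=0) - v[i]
    return out


def taubin_smooth(vertices, faces, lam=0.5, mu=-0.53, iterations=10):
    """Taubin lambda|mu smoothing.

    Each iteration applies a shrinking Laplacian step (lam > 0) followed by an
    inflating step (mu < -lam). This damps high-frequency noise with far less
    overall shrinkage than repeated plain Laplacian smoothing.
    """
    v = np.asarray(vertices, dtype=float).copy()  # leave the input untouched
    nbrs = vertex_neighbors(len(v), faces)
    for _ in range(iterations):
        v = v + lam * umbrella_laplacian(v, nbrs)
        v = v + mu * umbrella_laplacian(v, nbrs)
    return v
```

The uniform Laplacian ignores edge lengths, which keeps the sketch short. Cotangent weights are a common alternative when face shapes vary strongly.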