MaGS: Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting

Shaojie Ma, Yawei Luo, Wei Yang, Yi Yang

TL;DR

MaGS is a unified framework that jointly reconstructs and simulates dynamic 3D objects from monocular video. It binds 3D Gaussian splats to a mesh surface to form a hybrid representation. MPE-Net extracts pose information from the mesh, and two deformation networks, RMD-Net and RGD-Net, learn relative mesh and relative Gaussian deformations; the Gaussians and all three networks are optimized end to end by backpropagating a rendering loss through differentiable splatting. The approach is compatible with mesh-based priors such as ARAP and SMPL and supports mesh-guided soft-body physics simulation, and the authors report state-of-the-art results on D-NeRF, DG-Mesh, and PeopleSnapshot. Because Gaussians stay attached to the mesh across frames while remaining free to roam near its surface, MaGS improves rendering accuracy and produces realistic dynamic deformation while preserving mesh continuity across frames.

Abstract

3D reconstruction and simulation, although interrelated, have distinct objectives: reconstruction requires a flexible 3D representation that can adapt to diverse scenes, while simulation needs a structured representation to model motion principles effectively. This paper introduces the Mesh-adsorbed Gaussian Splatting (MaGS) method to address this challenge. MaGS constrains 3D Gaussians to roam near the mesh, creating a mutually adsorbed mesh-Gaussian 3D representation. Such representation harnesses both the rendering flexibility of 3D Gaussians and the structured property of meshes. To achieve this, we introduce RMD-Net, a network that learns motion priors from video data to refine mesh deformations, alongside RGD-Net, which models the relative displacement between the mesh and Gaussians to enhance rendering fidelity under mesh constraints. To generalize to novel, user-defined deformations beyond input video without reliance on temporal data, we propose MPE-Net, which leverages inherent mesh information to bootstrap RMD-Net and RGD-Net. Due to the universality of meshes, MaGS is compatible with various deformation priors such as ARAP, SMPL, and soft physics simulation. Extensive experiments on the D-NeRF, DG-Mesh, and PeopleSnapshot datasets demonstrate that MaGS achieves state-of-the-art performance in both reconstruction and simulation.
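To make the idea of a Gaussian "adsorbed" to a mesh concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes one common way to attach a point to a triangle: barycentric weights on a face plus a signed offset along the face normal. The names `AdsorbedGaussian`, `bary`, `normal_offset`, and `gaussian_center` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def normalize(v: Vec3) -> Vec3:
    # Assumes a non-degenerate triangle, so the norm is non-zero.
    n = (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) ** 0.5
    return (v[0] / n, v[1] / n, v[2] / n)


@dataclass
class AdsorbedGaussian:
    """Hypothetical record tying one Gaussian center to one mesh face."""
    face: int             # index into the face list
    bary: Vec3            # barycentric weights on that face (sum to 1)
    normal_offset: float  # signed distance from the face along its normal


def gaussian_center(
    g: AdsorbedGaussian,
    vertices: Sequence[Vec3],
    faces: Sequence[Tuple[int, int, int]],
) -> Vec3:
    """Recompute the Gaussian center from the current vertex positions."""
    i, j, k = faces[g.face]
    a, b, c = vertices[i], vertices[j], vertices[k]
    w0, w1, w2 = g.bary
    # Point on the face given by the barycentric weights.
    on_face = tuple(w0 * a[d] + w1 * b[d] + w2 * c[d] for d in range(3))
    # Unit normal of the face in its current (possibly deformed) pose.
    n = normalize(cross(sub(b, a), sub(c, a)))
    return tuple(on_face[d] + g.normal_offset * n[d] for d in range(3))


# One triangle in the z = 0 plane and one Gaussian hovering 0.1 above it.
rest_vertices: List[Vec3] = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2)]
g = AdsorbedGaussian(face=0, bary=(0.5, 0.25, 0.25), normal_offset=0.1)

rest_center = gaussian_center(g, rest_vertices, faces)

# Deform the mesh (here a simple lift along z); the Gaussian follows its face.
moved_vertices = [(x, y, z + 2.0) for (x, y, z) in rest_vertices]
moved_center = gaussian_center(g, moved_vertices, faces)
```

Because the center is stored relative to a face rather than in world coordinates, any mesh-level deformation (ARAP, SMPL, or a soft-body solver) moves the Gaussian without further bookkeeping. In this sketch, changing `bary` or `normal_offset` is what lets the Gaussian roam near the surface.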

Paper Structure

This paper contains 31 sections, 17 equations, 15 figures, 8 tables, and 1 algorithm.

Figures (15)

  • Figure 1: Structure of MPE-Net.
  • Figure 2: Pipeline of MaGS. MaGS begins by extracting a temporally consistent coarse mesh for each frame of the video. These meshes, referred to as Guide Meshes, provide the foundation for dynamic reconstruction. During reconstruction, MPE-Net extracts pose information from the guide meshes and forwards it to RMD-Net and RGD-Net. RMD-Net and RGD-Net apply relative deformations to the guide mesh and to the Mesh-adsorbed Gaussians, respectively, yielding the refined mesh and the relatively deformed Gaussians. Together, these two components produce the Final Deformed Gaussians. Splatting-based rendering is then applied, and the rendering loss is used to optimize the Gaussians, MPE-Net, RMD-Net, and RGD-Net via backpropagation. The reconstruction phase not only yields a high-precision mesh and Gaussians but also trains the networks to learn deformation principles from the video, preparing them for simulation. In the simulation phase, mesh-based techniques (such as soft-body simulation, ARAP, and SMPL) are used to deform the reconstructed meshes, producing new guide meshes. The Mesh-adsorbed Gaussians are inherited and remain adsorbed to their corresponding facets. The remaining steps mirror reconstruction: MPE-Net, RMD-Net, and RGD-Net are applied again to yield the Final Deformed Gaussians, which are then rendered to generate the final image.
  • Figure 2: Structure of RMD-Net.
  • Figure 3: Simulation comparison on the D-NeRF dataset (Pumarola et al., 2020).
  • Figure 3: Structure of RGD-Net.
  • ...and 10 more figures
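
The data flow in the Figure 2 caption can be sketched as a single per-frame function. This is only an illustration of the ordering of the stages, not the paper's architecture: the three networks are passed in as plain callables, the stubs below are hypothetical placeholders, and anchoring each Gaussian at a face centroid plus a stored local offset is a simplifying assumption made here.

```python
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def centroid(a: Vec3, b: Vec3, c: Vec3) -> Vec3:
    return (
        (a[0] + b[0] + c[0]) / 3.0,
        (a[1] + b[1] + c[1]) / 3.0,
        (a[2] + b[2] + c[2]) / 3.0,
    )


def deform_frame(
    guide_vertices: Sequence[Vec3],
    faces: Sequence[Face],
    gaussian_faces: Sequence[int],   # face index each Gaussian is adsorbed to
    gaussian_local: Sequence[Vec3],  # stored offset of each Gaussian from its face
    mpe_net: Callable,               # guide mesh -> pose features (placeholder)
    rmd_net: Callable,               # pose -> per-vertex relative deformation
    rgd_net: Callable,               # pose -> per-Gaussian relative deformation
) -> Tuple[List[Vec3], List[Vec3]]:
    """Return (refined mesh vertices, final deformed Gaussian centers)."""
    pose = mpe_net(guide_vertices)

    # Stage 1: refine the guide mesh with relative per-vertex offsets.
    refined = [add(v, d) for v, d in zip(guide_vertices, rmd_net(pose))]

    # Stage 2: place each Gaussian relative to its (refined) face, then apply
    # its relative Gaussian deformation.
    centers: List[Vec3] = []
    for f, local, delta in zip(gaussian_faces, gaussian_local, rgd_net(pose)):
        i, j, k = faces[f]
        base = centroid(refined[i], refined[j], refined[k])
        centers.append(add(add(base, local), delta))
    return refined, centers


# Hypothetical stand-ins for the learned networks.
def stub_mpe(vertices: Sequence[Vec3]) -> List[Vec3]:
    return list(vertices)  # "pose" is just the vertex list here


def stub_rmd_zero(pose: Sequence[Vec3]) -> List[Vec3]:
    return [(0.0, 0.0, 0.0)] * len(pose)


def stub_rmd_lift(pose: Sequence[Vec3]) -> List[Vec3]:
    return [(0.0, 0.0, 1.0)] * len(pose)  # lift every vertex by 1 along z


def make_stub_rgd(num_gaussians: int) -> Callable:
    return lambda pose: [(0.0, 0.0, 0.0)] * num_gaussians


guide = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
tri = [(0, 1, 2)]
_, centers_rest = deform_frame(
    guide, tri, [0], [(0.0, 0.0, 0.5)], stub_mpe, stub_rmd_zero, make_stub_rgd(1)
)
_, centers_lift = deform_frame(
    guide, tri, [0], [(0.0, 0.0, 0.5)], stub_mpe, stub_rmd_lift, make_stub_rgd(1)
)
```

In the simulation phase described in the caption, the same function would be called with a guide mesh produced by a mesh-based deformer instead of one extracted from video; the Gaussian-to-face assignment (`gaussian_faces`) stays unchanged.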