
Resonance4D: Frequency-Domain Motion Supervision for Preset-Free Physical Parameter Learning in 4D Dynamic Physical Scene Simulation

Changshe Zhang, Jie Feng, Siyu Chen, Guanbin Li, Ronghua Shang, Junpeng Zhang

Abstract

Physics-driven 4D dynamic simulation from static 3D scenes remains constrained by an overlooked contradiction: reliable motion supervision often relies on online video diffusion or optical-flow pipelines whose computational cost exceeds that of the simulator itself. Existing methods further simplify inverse physical modeling by optimizing only partial material parameters, limiting realism in scenes with complex materials and dynamics. We present Resonance4D, a physics-driven 4D dynamic simulation framework that couples 3D Gaussian Splatting with the Material Point Method through lightweight yet physically expressive supervision. Our key insight is that dynamic consistency can be enforced without dense temporal generation by jointly constraining motion in complementary domains. To this end, we introduce Dual-domain Motion Supervision (DMS), which combines spatial structural consistency for local deformation with frequency-domain spectral consistency for oscillatory and global dynamic patterns, substantially reducing training cost and memory overhead while preserving physically meaningful motion cues. To enable stable full-parameter physical recovery, we further combine zero-shot text-prompted segmentation with simulation-guided initialization to automatically decompose Gaussians into object-part-level regions and support joint optimization of full material parameters. Experiments on both synthetic and real scenes show that Resonance4D achieves strong physical fidelity and motion consistency while reducing peak GPU memory from over 35 GB to around 20 GB, enabling high-fidelity physics-driven 4D simulation on a single consumer-grade GPU.
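The dual-domain idea described above can be sketched as a loss that penalizes both per-frame spatial discrepancy and differences in the temporal amplitude spectrum. The sketch below is illustrative only, not the paper's implementation: the L1 spatial proxy, the per-pixel temporal FFT comparison, and the weights `w_spatial` / `w_spectral` are assumptions made for exposition.

```python
import numpy as np

def dual_domain_loss(pred, ref, w_spatial=1.0, w_spectral=1.0):
    """Illustrative dual-domain motion loss (NOT the paper's exact DMS).

    pred, ref: (T, H, W) grayscale video tensors of the simulated and
    reference motion, respectively.
    """
    # Spatial term: per-frame structural consistency (a simple L1 proxy here).
    spatial = np.mean(np.abs(pred - ref))
    # Spectral term: compare temporal amplitude spectra at each pixel,
    # which captures oscillatory / global dynamic patterns.
    spec_p = np.abs(np.fft.rfft(pred, axis=0))
    spec_r = np.abs(np.fft.rfft(ref, axis=0))
    spectral = np.mean(np.abs(spec_p - spec_r))
    return w_spatial * spatial + w_spectral * spectral
```

Because the spectral term operates on amplitude spectra of short clips rather than dense per-frame generations, a constraint of this shape is cheap to evaluate, which is consistent with the memory savings the abstract reports.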

Paper Structure

This paper contains 15 sections, 17 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Motivation for dual-domain motion analysis and part-level physical modeling. (a) By extracting horizontal scanlines at a fixed height and stacking them over time, an input video is converted into a $w$-$t$ spatiotemporal slice that explicitly unfolds motion trajectories into a 2D structural pattern. (b) In this representation, oscillation exhibits a clearer periodic trajectory and a more pronounced dominant frequency peak than deformation. (c) Different density settings lead to markedly different motion behaviors: compared with uniform density, a more plausible part-level density distribution produces more natural swaying and restoring motion.
  • Figure 2: Resonance4D is a unified framework for physics-driven dynamic simulation from static reconstruction. Its key insight is that effective motion supervision does not require heavy online video priors; lightweight dual-domain spatial-spectral constraints are sufficient to guide physically meaningful dynamics. Guided by this insight, the framework first constructs a reference motion video, then identifies movable regions through automatic part-level assignment, next obtains a reliable starting point via simulation-driven initialization over the feasible physical parameter space, and finally performs part-level physical parameter optimization under Dual-domain Motion Supervision.
  • Figure 3: Visual comparison on four scenes from the PhysDreamer dataset. For each method, we show representative video frames, the motion-region $w$-$t$ visualization, and the corresponding FFT spectrum. The $w$-$t$ maps reveal temporal motion patterns along a fixed spatial coordinate, while the FFT plots further characterize their frequency responses. In the carnations scene, our method better preserves the oscillatory motion property, producing clearer periodic patterns and frequency peaks that are closer to the real-world reference. In the telephone scene, our method achieves the closest alignment with the real-world motion in both the spatiotemporal trajectory and the spectral distribution.
  • Figure 4: Qualitative comparison on the bird and toothpaste scenes at consecutive time steps. The bird scene corresponds to relatively stiff material behavior, while toothpaste exhibits softer dynamics. Our method better matches the material-dependent motion in both cases. PAC-NeRF tends to produce overly soft deformation in the stiff bird scene, whereas PhysFlow is overly rigid. In the toothpaste scene, PhysFlow shows an unnatural rebound in shape evolution, which deviates from the ground truth.
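The $w$-$t$ slice construction described in Figure 1(a) and the dominant-frequency analysis of Figure 1(b) can be sketched as follows. This is a minimal illustration under assumed conventions: `video` is a `(T, H, W)` grayscale array, the scanline `row` is fixed, and the dominant frequency is read off the peak of the FFT of the width-averaged, DC-removed signal; function names are hypothetical.

```python
import numpy as np

def wt_slice(video, row):
    """Stack the horizontal scanline at a fixed height over time.

    video: (T, H, W) grayscale frames -> returns a (T, W) w-t slice,
    unfolding motion trajectories into a 2D structural pattern.
    """
    return video[:, row, :]

def dominant_frequency(wt, fps):
    """Estimate the dominant temporal frequency of a (T, W) w-t slice."""
    signal = wt.mean(axis=1)          # collapse width -> 1D temporal signal
    signal = signal - signal.mean()   # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.shape[0], d=1.0 / fps)
    return freqs[np.argmax(spectrum)]
```

On a synthetic 2 Hz oscillation sampled at 30 fps, `dominant_frequency` recovers 2.0 Hz, matching the intuition of Figure 1(b) that oscillatory motion shows a pronounced dominant frequency peak while aperiodic deformation does not.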