PhysGM: Large Physical Gaussian Model for Feed-Forward 4D Synthesis
Chunji Lv, Zequn Chen, Donglin Di, Weinan Zhang, Hao Li, Wei Chen, Yinjie Lei, Changsheng Li
TL;DR
PhysGM introduces a fully feed-forward, optimization-free pipeline that jointly predicts a 3D Gaussian scene representation and physical properties from a single image, enabling immediate physics-driven 4D synthesis via an MPM simulator. It uses a two-stage training regime: supervised pre-training to learn a physical prior and Gaussian parameters, followed by Direct Preference Optimization (DPO) to align generated dynamics with reference videos without requiring differentiable physics. To support scalable learning, the authors release PhysAssets, a dataset of 50K+ annotated 3D assets with ground-truth simulations. Experiments show that PhysGM produces high-fidelity 4D renderings from a single image in about one minute, offering substantial speedups over SDS-driven baselines while improving physical plausibility and temporal coherence.
Abstract
Despite advances in physics-based 3D motion synthesis, current methods face three key limitations: reliance on pre-reconstructed 3D Gaussian Splatting (3DGS) built from dense multi-view images with time-consuming per-scene optimization; physics integration via either inflexible, hand-specified attributes or unstable, optimization-heavy guidance from video models using Score Distillation Sampling (SDS); and naive concatenation of prebuilt 3DGS with physics modules, which ignores the physical information embedded in appearance and yields suboptimal performance. To address these issues, we propose PhysGM, a feed-forward framework that jointly predicts a 3D Gaussian representation and physical properties from a single image, enabling immediate simulation and high-fidelity 4D rendering. Unlike slow, appearance-agnostic optimization methods, we first pre-train a physics-aware reconstruction model that directly infers both Gaussian and physical parameters. We then refine the model with Direct Preference Optimization (DPO), aligning its simulations with physically plausible reference videos and avoiding costly SDS optimization. To address the absence of a supporting dataset for this task, we propose PhysAssets, a dataset of 50K+ 3D assets annotated with physical properties and corresponding reference videos. Experiments show that PhysGM produces high-fidelity 4D simulations from a single image in one minute, achieving a significant speedup over prior work while delivering realistic renderings. Our project page is at: https://hihixiaolv.github.io/PhysGM.github.io/
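
For readers unfamiliar with DPO, the following is a minimal sketch of the standard pairwise DPO objective, not the PhysGM implementation. The abstract does not say how preference pairs are built or how log-probabilities of physical parameters are computed, so the function names, the argument names, and the default `beta` are illustrative assumptions. The sketch assumes each pair has a "preferred" sample (its simulation matches the reference video better) and a "rejected" sample, each scored by the trainable model and by a frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_logp_preferred: float,
    policy_logp_rejected: float,
    ref_logp_preferred: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; the abstract does not give one
) -> float:
    """Standard DPO loss for one preference pair.

    All four inputs are log-probabilities. How PhysGM obtains them for
    predicted physical parameters is not stated in the abstract.
    """
    # How much more the policy favors each sample than the reference does.
    preferred_gain = policy_logp_preferred - ref_logp_preferred
    rejected_gain = policy_logp_rejected - ref_logp_rejected
    # Loss shrinks as the policy favors the preferred sample by a wider margin.
    return -log_sigmoid(beta * (preferred_gain - rejected_gain))
```

When the policy equals the reference model the margin is zero and the loss is log 2; it decreases as the policy shifts probability toward the preferred sample. Because the objective only needs log-probabilities and a ranking of simulated outcomes, no gradient has to flow through the simulator, which is consistent with the claim that DPO avoids differentiable physics.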
