PhysGM: Large Physical Gaussian Model for Feed-Forward 4D Synthesis

Chunji Lv, Zequn Chen, Donglin Di, Weinan Zhang, Hao Li, Wei Chen, Yinjie Lei, Changsheng Li

TL;DR

PhysGM introduces a fully feed-forward, optimization-free pipeline that jointly predicts 3D Gaussian scene representations and physical properties from a single image, enabling immediate physics-driven 4D synthesis via an MPM simulator. It uses a two-stage training regime: supervised pre-training to learn a physical prior and Gaussian parameters, followed by Direct Preference Optimization (DPO) to align generated dynamics with reference videos without requiring differentiable physics. To support scalable learning, the authors release the PhysAssets dataset, with 50K+ annotated 3D assets and ground-truth (GT) simulations. Experiments show PhysGM achieves high-fidelity 4D renderings from a single image in under a minute, offering substantial speedups over SDS-driven baselines while improving physical plausibility and temporal coherence.

Abstract

Despite advances in physics-based 3D motion synthesis, current methods face key limitations: reliance on pre-reconstructed 3D Gaussian Splatting (3DGS) built from dense multi-view images with time-consuming per-scene optimization; physics integration via either inflexible, hand-specified attributes or unstable, optimization-heavy guidance from video models using Score Distillation Sampling (SDS); and naive concatenation of prebuilt 3DGS with physics modules, which ignores physical information embedded in appearance and yields suboptimal performance. To address these issues, we propose PhysGM, a feed-forward framework that jointly predicts a 3D Gaussian representation and physical properties from a single image, enabling immediate simulation and high-fidelity 4D rendering. Unlike slow, appearance-agnostic optimization methods, we first pre-train a physics-aware reconstruction model that directly infers both Gaussian and physical parameters. We further refine the model with Direct Preference Optimization (DPO), aligning simulations with physically plausible reference videos and avoiding high-cost SDS optimization. To address the absence of a supporting dataset for this task, we propose PhysAssets, a dataset of 50K+ 3D assets annotated with physical properties and corresponding reference videos. Experiments show that PhysGM produces high-fidelity 4D simulations from a single image in one minute, achieving a significant speedup over prior work while delivering realistic renderings. Our project page is at: https://hihixiaolv.github.io/PhysGM.github.io/
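
To make the DPO refinement step more concrete, here is a minimal sketch of the standard DPO objective applied to a pair of sampled physical-parameter vectors. It is an illustration under stated assumptions, not the authors' implementation: the diagonal-Gaussian form of the physics head, the function names `gaussian_log_prob` and `dpo_loss`, and the value `beta=0.1` are all hypothetical. The "preferred" and "rejected" samples stand for the candidates whose simulations match the reference video best and worst.

```python
import numpy as np

def gaussian_log_prob(theta, mean, log_std):
    """Log-density of theta under a diagonal Gaussian.

    Assumption: the physics head outputs a mean and log-std per property.
    """
    var = np.exp(2.0 * log_std)
    terms = -0.5 * (theta - mean) ** 2 / var - log_std - 0.5 * np.log(2.0 * np.pi)
    return float(np.sum(terms))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_*     : log-probabilities under the model being fine-tuned
    ref_logp_* : log-probabilities under the frozen pre-trained reference
    beta       : temperature (hypothetical value)
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), computed stably as log(1 + exp(-margin))
    return float(np.logaddexp(0.0, -margin))
```

Because the loss only needs log-probabilities of sampled parameters and a ranking of the resulting simulations, no gradient has to flow through the simulator. When the fine-tuned model and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2; it falls below that value once the model favors the preferred sample more strongly than the reference does.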

Paper Structure

This paper contains 82 sections, 15 equations, 16 figures, 7 tables.

Figures (16)

  • Figure 1: Overview of PhysGM. Given a single image, PhysGM performs a single feed-forward pass to directly predict a 3D Gaussian Splatting (3DGS) representation and its associated physical properties (e.g., stiffness, mass). This prediction is optimization-free and completes in under one second. The predicted parameters then initialize a Material Point Method (MPM) simulator, producing the final, physically plausible 4D animation.
  • Figure 2: Pipeline of PhysGM. The model conditions on one or four input views and their corresponding camera parameters, which are processed by a transformer-based model to produce output tokens. These tokens are then decoded by two parallel heads: (1) a DPT Head that predicts the initial 3D Gaussian scene parameters $\boldsymbol{\psi}$, and (2) a Physics Head that predicts a distribution over the object's physical properties $\boldsymbol{\theta}$. The sampled parameters ($\boldsymbol{\psi}, \boldsymbol{\theta}$) initialize a Material Point Method (MPM) simulator to generate the final dynamic sequence. The entire architecture is trained in a two-stage paradigm: first, supervised pre-training on ground-truth data establishes a strong generative prior. Subsequently, a DPO-based fine-tuning stage ranks simulated candidates against a ground-truth video and uses these rankings to align the model with physically plausible results.
  • Figure 3: Preference calculation. We use SAM-2 (Ravi et al., 2024) for segmentation and CoTracker-3 for trajectory extraction across the GT and simulated videos. The extracted point tracks quantify the fidelity of each candidate to the GT, yielding a ranked preference tuple.
  • Figure 4: Qualitative results by PhysGM. For different objects, we show the single input image (left), keyframes from the resulting physically plausible simulation (middle), and the physical properties predicted by our model (right). Our method generates these high-fidelity 4D sequences in under one minute from a single view, without any per-scene optimization.
  • Figure 5: Other results by PhysGM. PhysGM generalizes robustly to diverse physical interactions. It accurately simulates complex deformations such as stretching and twisting, handles multi-object dynamics with varied materials, and processes real-world data, highlighting its extensibility to novel scenarios.
  • ...and 11 more figures
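
The preference calculation described in the Figure 3 caption can be sketched as follows. This is a hedged illustration only: it assumes point tracks have already been extracted (for example by a tracker such as CoTracker-3) into arrays of shape `(frames, points, 2)`, and it uses mean Euclidean distance as the fidelity score, which is an assumption rather than the paper's stated metric. The names `track_error` and `rank_candidates` are hypothetical.

```python
import numpy as np

def track_error(gt_tracks, sim_tracks):
    """Mean Euclidean distance between corresponding tracked points.

    Both arrays are assumed to have shape (frames, points, 2).
    """
    return float(np.mean(np.linalg.norm(gt_tracks - sim_tracks, axis=-1)))

def rank_candidates(gt_tracks, candidates):
    """Rank simulated candidates from closest to farthest from the GT tracks.

    Returns (order, errors): order lists candidate indices, best first.
    """
    errors = [track_error(gt_tracks, c) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: errors[i])
    return order, errors

# Toy usage: three candidates offset from an all-zero GT trajectory.
gt = np.zeros((3, 4, 2))
candidates = [gt + 1.0, gt + 0.1, gt + 0.5]
order, errors = rank_candidates(gt, candidates)
preferred, rejected = order[0], order[-1]
```

The first and last entries of `order` can serve as the preferred and rejected samples of a preference pair; a full ranked tuple is available from `order` itself.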