PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation

Chen Wang, Chuhao Chen, Yiming Huang, Zhiyang Dou, Yuan Liu, Jiatao Gu, Lingjie Liu

TL;DR

PhysCtrl introduces a diffusion-based framework that learns physics-grounded 3D point trajectories for controllable video generation. By conditioning trajectory generation on material properties and external forces, and enforcing physics-consistent losses, the method provides strong priors that guide pretrained video models to produce high-fidelity, physically plausible videos. Evaluations show superior performance in both trajectory quality and video plausibility compared with state-of-the-art baselines, and ablations confirm the importance of spatial-temporal attention and physics supervision. The approach enables physics-aware video synthesis from a single image with explicit physical controls, offering a scalable path toward more realistic and controllable dynamic scenes.

Abstract

Existing video generation models excel at producing photo-realistic videos from text or images, but often lack physical plausibility and 3D controllability. To overcome these limitations, we introduce PhysCtrl, a novel framework for physics-grounded image-to-video generation with physical parameters and force control. At its core is a generative physics network that learns the distribution of physical dynamics across four materials (elastic, sand, plasticine, and rigid) via a diffusion model conditioned on physics parameters and applied forces. We represent physical dynamics as 3D point trajectories and train on a large-scale synthetic dataset of 550K animations generated by physics simulators. We enhance the diffusion model with a novel spatiotemporal attention block that emulates particle interactions and incorporates physics-based constraints during training to enforce physical plausibility. Experiments show that PhysCtrl generates realistic, physics-grounded motion trajectories which, when used to drive image-to-video models, yield high-fidelity, controllable videos that outperform existing methods in both visual quality and physical plausibility. Project Page: https://cwchenwang.github.io/physctrl
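
To make the idea of a spatiotemporal attention block over 3D point trajectories more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes features of shape (T, N, C) for N points over T frames, applies single-head self-attention across points within each frame (spatial) and then across frames for each point (temporal), and injects the physics conditioning through a simple additive embedding. The weight matrices, the conditioning vector layout (log of Young's modulus plus a 3D force), and all function names are hypothetical; diffusion timesteps, normalization layers, and multi-head attention are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention.
    # x has shape (B, L, C): attention runs over L, independently for each B.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def spatiotemporal_block(x, spatial_w, temporal_w):
    # x: (T, N, C) features for N points over T frames.
    # Spatial attention: points in the same frame attend to each other.
    x = x + self_attention(x, *spatial_w)
    # Temporal attention: each point attends along its own trajectory.
    xt = np.swapaxes(x, 0, 1)                  # (N, T, C)
    xt = xt + self_attention(xt, *temporal_w)
    return np.swapaxes(xt, 0, 1)               # back to (T, N, C)

rng = np.random.default_rng(0)
T, N, C = 8, 16, 32                            # frames, points, channels (arbitrary)

# Hypothetical inputs: a noisy trajectory and a conditioning vector
# [log10(Young's modulus), force_x, force_y, force_z].
traj = rng.normal(size=(T, N, 3))
cond = np.array([np.log10(1e5), 0.0, -1.0, 0.0])

# Hypothetical linear embeddings for positions and conditioning.
w_in = rng.normal(size=(3, C)) * 0.1
w_cond = rng.normal(size=(4, C)) * 0.1
feats = traj @ w_in + cond @ w_cond            # conditioning broadcast to every point

spatial_w = [rng.normal(size=(C, C)) * 0.1 for _ in range(3)]
temporal_w = [rng.normal(size=(C, C)) * 0.1 for _ in range(3)]

out = spatiotemporal_block(feats, spatial_w, temporal_w)
print(out.shape)
```

Factorizing attention this way keeps each attention matrix at size N x N or T x T rather than (T·N) x (T·N), which is one plausible reason to split spatial and temporal attention; the paper's actual block design may differ.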

Paper Structure

This paper contains 21 sections, 11 equations, 9 figures, and 6 tables.

Figures (9)

  • Figure 1: We propose PhysCtrl, a novel framework for physics-grounded image-to-video generation with physical material and force control. PhysCtrl supports generating physics-plausible motion trajectories across multiple materials as control signals (second row), and allows controls over physics parameters (e.g., Young's Modulus $E$ of elastic material (third row)) and force (last row). Note that in the bottom three rows, overlaid trajectories and frames use lighter hues for earlier time steps and darker hues for later ones.
  • Figure 2: An overview of PhysCtrl. Given a single image, we first lift the object in that image into 3D points. We then generate physics-grounded motion trajectories conditioned on physics parameters and external force with a diffusion model, which are then used as strong physics-grounded guidance for image-to-video generation.
  • Figure 3: Our trajectory generation architecture, in which each block consists of spatial attention and temporal attention.
  • Figure 4: Qualitative comparison between our method and existing video generation methods.
  • Figure 5: PhysCtrl generates videos of the same object under different physics parameters and forces.
  • ...and 4 more figures