PhysMotion: Physics-Grounded Dynamics From a Single Image

Xiyang Tan; Ying Jiang; Xuan Li; Zeshun Zong; Tianyi Xie; Yin Yang; Chenfanfu Jiang

TL;DR

PhysMotion generates video from a single image by combining a 3D Gaussian Splatting representation, Material Point Method (MPM) dynamics, and a diffusion-based refinement stage that enforces temporal coherence. The method lifts a single image to a geometry-aware 3D representation, simulates elastoplastic and viscoplastic dynamics under applied forces, and refines the result with cross-frame-attention diffusion to preserve detail. The stated contributions are the first single-image workflow with 3D geometry-aware, physics-grounded dynamics and a two-stage video enhancement pipeline; evaluation against baselines reports quantitative and qualitative gains in physical plausibility and visual fidelity. The framework enables realistic, controllable dynamics from minimal input, with potential applications in animation, AR/VR, and interactive media.

Abstract

We introduce PhysMotion, a novel framework that uses principled physics-based simulation to guide an intermediate 3D representation generated from a single image and input conditions (e.g., applied force and torque), producing high-quality, physically plausible videos. By using continuum mechanics-based simulation as prior knowledge, our approach addresses the limitations of traditional data-driven generative models and results in more consistent, physically plausible motion. Our framework begins by reconstructing a feed-forward 3D Gaussian representation from a single image through geometry optimization. This representation is then time-stepped using a differentiable Material Point Method (MPM) with continuum mechanics-based elastoplasticity models, which provides a strong foundation for realistic dynamics, albeit at a coarse level of detail. To enhance geometry and appearance and to ensure spatiotemporal consistency, we refine the initial simulation using a text-to-image (T2I) diffusion model with cross-frame attention, resulting in a physically plausible video that retains intricate details comparable to the input image. We conduct comprehensive qualitative and quantitative evaluations to validate the efficacy of our method. Our project page is available at: https://supertan0204.github.io/physmotion_website/.
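To make the MPM time-stepping mentioned in the abstract more concrete, the following is a minimal sketch of one explicit MPM step: particle-to-grid transfer, grid update, and grid-to-particle transfer. It is an illustration under strong assumptions, not the authors' implementation: it is 1D, uses linear weights and a plain PIC velocity transfer, applies only gravity, and omits the elastoplastic stress computation and differentiability that the paper relies on. The function name `mpm_step` and all parameters are hypothetical.

```python
import math


def mpm_step(xs, vs, ms, n_nodes, dx, dt, gravity=-9.8):
    """One explicit 1D MPM step with linear weights (illustrative only).

    Assumes every particle lies inside the grid [0, (n_nodes - 1) * dx].
    Internal (elastoplastic) forces are deliberately left out.
    """
    grid_m = [0.0] * n_nodes  # nodal mass
    grid_p = [0.0] * n_nodes  # nodal momentum

    def stencil(x):
        # Two nearest nodes and their linear interpolation weights.
        i = int(math.floor(x / dx))
        i = min(max(i, 0), n_nodes - 2)
        frac = x / dx - i
        return ((i, 1.0 - frac), (i + 1, frac))

    # 1) Particle-to-grid: scatter mass and momentum to the nodes.
    for x, v, m in zip(xs, vs, ms):
        for i, w in stencil(x):
            grid_m[i] += w * m
            grid_p[i] += w * m * v

    # 2) Grid update: convert momentum to velocity and apply gravity.
    #    A full solver would also add stress-derived nodal forces here.
    grid_v = [0.0] * n_nodes
    for i in range(n_nodes):
        if grid_m[i] > 0.0:
            grid_v[i] = grid_p[i] / grid_m[i] + dt * gravity

    # 3) Grid-to-particle: gather velocities, then advect the particles.
    new_xs, new_vs = [], []
    for x in xs:
        v_new = sum(w * grid_v[i] for i, w in stencil(x))
        new_vs.append(v_new)
        new_xs.append(x + dt * v_new)
    return new_xs, new_vs
```

Calling `mpm_step` repeatedly advances the particle set in time. In a pipeline like the one described above, the particles would presumably correspond to the reconstructed Gaussians, and a constitutive model would supply the missing internal forces in step 2.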
