WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions

Zizhang Li; Hong-Xing Yu; Wei Liu; Yin Yang; Charles Herrmann; Gordon Wetzstein; Jiajun Wu

TL;DR

WonderPlay tackles action-conditioned dynamic 3D scene generation from a single image by marrying physics-based solvers with diffusion-based video generation. It introduces a hybrid generative simulator that produces coarse dynamics with physics, refines motion and appearance via a bimodal video generator conditioned on flow and the input image, and updates the scene through differentiable rendering. The approach supports diverse materials (rigid, elastic, cloth, liquids, gases, granular) and outperforms purely physics-based and purely video-based baselines on both quantitative metrics and human judgments. This framework enables intuitive user control while achieving high physical plausibility and visual realism, with potential impact on AR/VR, embodied AI, and interactive content creation.

Abstract

WonderPlay is a novel framework that integrates physics simulation with video generation to produce action-conditioned dynamic 3D scenes from a single image. While prior works are restricted to rigid-body or simple elastic dynamics, WonderPlay features a hybrid generative simulator to synthesize a wide range of 3D dynamics. The hybrid generative simulator first uses a physics solver to simulate coarse 3D dynamics, which subsequently conditions a video generator to produce a video with finer, more realistic motion. The generated video is then used to update the simulated dynamic 3D scene, closing the loop between the physics solver and the video generator. This approach combines intuitive user control with the accurate dynamics of physics-based simulators and the expressivity of diffusion-based video generators. Experimental results demonstrate that WonderPlay enables users to interact with various scenes of diverse content, including cloth, sand, snow, liquid, smoke, elastic, and rigid bodies -- all using a single image input. Code will be made public. Project website: https://kyleleey.github.io/WonderPlay/
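The three-stage loop described above (coarse physics, video refinement, scene update) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and none of it is the authors' code: `coarse_physics`, `placeholder_video_generator`, `update_scene`, and `hybrid_step` are hypothetical names, the "scene" is a list of particle heights, the video model is replaced by a trivial function, and a plain gradient step on a squared error stands in for the actual scene update.

```python
from typing import Callable, List

Trajectory = List[List[float]]  # trajectory[frame][particle] -> height


def coarse_physics(heights: List[float], push: float, steps: int,
                   dt: float = 0.1, g: float = -9.8) -> Trajectory:
    """Stage 1 (toy): semi-implicit Euler for particles under gravity.

    `push` is a hypothetical user action: an initial vertical velocity.
    """
    pos = list(heights)
    vel = [push] * len(pos)
    frames: Trajectory = []
    for _ in range(steps):
        vel = [v + g * dt for v in vel]
        # Clamp at the ground plane (height 0).
        pos = [max(0.0, p + v * dt) for p, v in zip(pos, vel)]
        frames.append(list(pos))
    return frames


def placeholder_video_generator(coarse: Trajectory) -> Trajectory:
    """Stage 2 (placeholder): stands in for a conditioned video model.

    A real generator would return frames; here it returns a scaled copy
    of the coarse motion so the example stays self-contained.
    """
    return [[0.9 * p for p in frame] for frame in coarse]


def update_scene(coarse: Trajectory, observed: Trajectory,
                 lr: float = 0.5, iters: int = 50) -> Trajectory:
    """Stage 3 (toy): gradient descent on 0.5 * (x - observed)^2 per entry."""
    scene = [list(frame) for frame in coarse]
    for _ in range(iters):
        for frame, obs in zip(scene, observed):
            for i in range(len(frame)):
                frame[i] -= lr * (frame[i] - obs[i])
    return scene


def hybrid_step(heights: List[float], push: float, steps: int,
                generator: Callable[[Trajectory], Trajectory]
                = placeholder_video_generator) -> Trajectory:
    """Run the simulate -> generate -> update loop once."""
    coarse = coarse_physics(heights, push, steps)
    observed = generator(coarse)
    return update_scene(coarse, observed)
```

Calling `hybrid_step([1.0, 2.0], push=0.5, steps=5)` returns a five-frame trajectory that has been pulled from the coarse simulation toward the placeholder generator's output. The point of the sketch is only the data flow: the physics result conditions the generator, and the generator's output is what the scene is fitted to.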
