Synthetic Video Enhances Physical Fidelity in Video Synthesis
Qi Zhao, Xingyu Ni, Ziyu Wang, Feng Cheng, Ziyan Yang, Lu Jiang, Bohan Wang
TL;DR
Current video generation models produce visually convincing but physically inconsistent videos; this paper aims to close that gap. It proposes a data-centric approach that trains a diffusion-transformer model on CGI videos rendered with Blender and Unreal Engine, and uses the SimDrop technique to suppress synthetic artifacts while preserving physical realism. On three representative tasks (large human motion, wide camera rotations, and layer decomposition), the method improves 3D consistency, pose integrity, and foreground-background separation, outperforming several baselines and commercial models. The work shows that synthetic video data can enhance physical fidelity in video synthesis and points toward richer supervisory signals and physics-aware training in future work.
Abstract
We investigate how to enhance the physical fidelity of video generation models by leveraging synthetic videos derived from computer graphics pipelines. These rendered videos respect real-world physics, for example by maintaining 3D consistency, and serve as a valuable resource that can potentially improve video generation models. To harness this potential, we propose a solution that curates and integrates synthetic data while introducing a method to transfer its physical realism to the model, significantly reducing unwanted artifacts. Through experiments on three representative tasks emphasizing physical consistency, we demonstrate the efficacy of this approach in enhancing physical fidelity. While our model still lacks a deep understanding of physics, our work offers one of the first empirical demonstrations that synthetic video enhances physical fidelity in video synthesis. Website: https://kevinz8866.github.io/simulation/
