DreaMoving: A Human Video Generation Framework based on Diffusion Models
Mengyang Feng, Jinlin Liu, Kai Yu, Yuan Yao, Zheng Hui, Xiefan Guo, Xianhui Lin, Haolan Xue, Chen Shi, Xiaowen Li, Aojie Li, Xiaoyang Kang, Biwen Lei, Miaomiao Cui, Peiran Ren, Xuansong Xie
TL;DR
DreaMoving addresses the challenge of controllable, identity-preserving human video generation by introducing a diffusion-based framework with a Video ControlNet for motion conditioning and a Content Guider for appearance grounding. It combines motion blocks with a multi-stage training pipeline (Content Guider training, long-frame pretraining, Video ControlNet training, and expression fine-tuning) to enable pose/depth conditioning and image-based identity guidance. The approach supports text-only, image-only, and mixed prompts, producing high-quality, temporally consistent videos and generalizing to unseen styles. This work offers a practical pathway for customizable, identity-consistent human video synthesis at scale.
Abstract
In this paper, we present DreaMoving, a diffusion-based controllable video generation framework that produces high-quality customized human videos. Specifically, given a target identity and a posture sequence, DreaMoving can generate a video of the target identity moving or dancing anywhere, driven by that posture sequence. To this end, we propose a Video ControlNet for motion control and a Content Guider for identity preservation. The proposed model is easy to use and can be adapted to most stylized diffusion models to generate diverse results. The project page is available at https://dreamoving.github.io/dreamoving
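
The training stages and prompt modes named in the TL;DR can be written down as a small sketch. This is only an illustration of the ordering and of the three prompt combinations; the identifiers `TRAINING_STAGES`, `PromptMode`, and `prompt_mode` are hypothetical and do not come from the DreaMoving code or paper.

```python
from enum import Enum
from typing import Optional

# Stage order as listed in the TL;DR. The identifier strings are
# illustrative assumptions, not names used by the authors.
TRAINING_STAGES = (
    "content_guider_training",
    "long_frame_pretraining",
    "video_controlnet_training",
    "expression_fine_tuning",
)


class PromptMode(Enum):
    """The three prompt combinations the TL;DR says are supported."""
    TEXT_ONLY = "text-only"
    IMAGE_ONLY = "image-only"
    MIXED = "mixed"


def prompt_mode(text: Optional[str], image_path: Optional[str]) -> PromptMode:
    """Classify a request by which prompts were supplied (hypothetical helper)."""
    has_text = bool(text and text.strip())
    has_image = bool(image_path)
    if has_text and has_image:
        return PromptMode.MIXED
    if has_text:
        return PromptMode.TEXT_ONLY
    if has_image:
        return PromptMode.IMAGE_ONLY
    raise ValueError("at least one of a text prompt or an image prompt is required")
```

In this sketch a posture sequence would be supplied separately from the prompts, since the abstract treats motion control and identity guidance as distinct inputs.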
