AnimateAnything: Consistent and Controllable Animation for Video Generation
Guojun Lei, Chi Wang, Hong Li, Rong Zhang, Yikai Wang, Weiwei Xu
TL;DR
This work tackles controllable video generation under multiple, diverse signals by unifying all controls into frame-by-frame optical flow. It introduces a two-stage diffusion-based pipeline: Stage 1 converts camera motion, drag annotations, and references into a single dense optical flow; Stage 2 uses this flow as conditioning for high-quality video synthesis, reinforced by a frequency-domain stabilization module. The approach outperforms state-of-the-art methods on image-to-video (I2V) generation, and ablations support the need for both the unified flow representation and the frequency-domain stabilization. The method offers precise and temporally consistent video generation, with potential applications in film and virtual reality.
Abstract
We present AnimateAnything, a unified controllable video generation approach that enables precise and consistent video manipulation under various conditions, including camera trajectories, text prompts, and user motion annotations. Specifically, we design a multi-scale control feature fusion network that constructs a common motion representation for the different conditions by explicitly converting all control information into frame-by-frame optical flows. We then use these optical flows as motion priors to guide the final video generation. In addition, to reduce the flickering caused by large-scale motion, we propose a frequency-based stabilization module that enhances temporal coherence by enforcing consistency in the video's frequency domain. Experiments demonstrate that our method outperforms state-of-the-art approaches. For more details and videos, please refer to the webpage: https://yu-shaonian.github.io/Animate_Anything/.
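To make the intuition behind frequency-domain stabilization concrete, the following is a minimal sketch and not the authors' module. It assumes flicker shows up as high temporal-frequency energy in each pixel's intensity over time, and it simply suppresses those frequencies with an FFT along the time axis. The function name `temporal_lowpass` and the parameter `keep_ratio` are hypothetical, chosen for illustration only; the abstract does not specify how the actual module operates.

```python
import numpy as np


def temporal_lowpass(video: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Suppress high temporal frequencies in a video of shape (T, H, W, C).

    Illustrative only: a fixed low-pass filter, not the stabilization
    module described in the paper.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    num_frames = video.shape[0]
    # Real FFT along the time axis: one temporal spectrum per pixel and channel.
    spectrum = np.fft.rfft(video, axis=0)
    # Keep only the lowest temporal-frequency bins (always including the DC term).
    keep = max(1, int(np.ceil(keep_ratio * spectrum.shape[0])))
    spectrum[keep:] = 0
    # Back to the time domain with the original number of frames.
    return np.fft.irfft(spectrum, n=num_frames, axis=0)


# Usage: a static gray clip with frame-to-frame brightness flicker.
base = np.full((8, 2, 2, 3), 0.5)
flicker = np.where(np.arange(8) % 2 == 0, 0.1, -0.1).reshape(8, 1, 1, 1)
stabilized = temporal_lowpass(base + flicker, keep_ratio=0.5)
```

In this toy case the flicker alternates every frame, so it sits entirely in the highest temporal-frequency bin and the filter removes it. A hard cutoff like this would also blur genuine fast motion, which is presumably why a learned module is used in practice rather than a fixed filter.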
