Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
Yixuan Ren, Yang Zhou, Jimei Yang, Jing Shi, Difan Liu, Feng Liu, Mingi Kwon, Abhinav Shrivastava
TL;DR
This work tackles one-shot video motion customization for text-to-video diffusion models by learning a motion signature from a single reference video. It introduces Temporal LoRA to model temporal dynamics and Appearance Absorbers to disentangle spatial appearance from motion, employing a staged training/inference pipeline for robust, diverse motion transfer to new subjects and scenes. Experiments show faithful motion reproduction and rich variation, outperforming baselines and concurrent methods on both quantitative metrics and human judgments. The framework supports downstream tasks such as video appearance customization, multiple motion combination, and reuse of third-party absorbers, offering a plug-and-play approach to motion-aware video generation and editing.
Abstract
Image customization has been extensively studied in text-to-image (T2I) diffusion models, leading to impressive outcomes and applications. With the emergence of text-to-video (T2V) diffusion models, however, its temporal counterpart, motion customization, has not yet been well investigated. To address the challenge of one-shot video motion customization, we propose Customize-A-Video, which models the motion from a single reference video and adapts it to new subjects and scenes with both spatial and temporal variety. It applies low-rank adaptation (LoRA) to the temporal attention layers to tailor the pre-trained T2V diffusion model for specific motion modeling. To disentangle the spatial and temporal information during training, we introduce the novel concept of appearance absorbers, which detach the original appearance from the reference video prior to motion learning. The proposed modules are trained in a staged pipeline and applied at inference in a plug-and-play fashion, enabling easy extension to various downstream tasks such as custom video generation and editing, video appearance customization, and multiple motion combination. Our project page can be found at https://customize-a-video.github.io.
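
As a rough illustration of the low-rank adaptation idea mentioned above, the following minimal NumPy sketch wraps a single frozen linear projection with a trainable low-rank update. It is not the authors' implementation: the class name `LoRALinear`, the `enabled` switch, the rank, the scaling, and the toy tensor sizes are all assumptions chosen for illustration, and the matrix `W` merely stands in for one projection inside a temporal attention layer.

```python
import numpy as np


class LoRALinear:
    """Frozen linear map W plus a low-rank update (alpha / rank) * B @ A.

    Hypothetical helper for illustration; not taken from the paper's code.
    """

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                            # frozen pre-trained weight
        self.A = rng.normal(0.0, 0.01, (rank, in_dim))  # trainable down-projection
        self.B = np.zeros((out_dim, rank))              # trainable up-projection, zero init
        self.scale = alpha / rank
        self.enabled = True                             # switch to attach/detach the adapter

    def __call__(self, x):
        y = x @ self.weight.T
        if self.enabled:
            # Low-rank residual added on top of the frozen projection.
            y = y + self.scale * (x @ self.A.T) @ self.B.T
        return y


# Toy stand-in for one temporal-attention projection (assumed sizes).
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 8))
layer = LoRALinear(W, rank=2, alpha=2.0)

# Features of one spatial location across 5 frames: shape (frames, channels).
x = rng.normal(size=(5, 8))
base_out = x @ W.T

# With B initialised to zero, the adapted layer starts identical to the base layer.
same_at_init = np.allclose(layer(x), base_out)

# Pretend training has moved B: the output changes while W itself is untouched.
layer.B = rng.normal(size=layer.B.shape)
changed = not np.allclose(layer(x), base_out)

# Switching the adapter off recovers the base layer's output exactly.
layer.enabled = False
restored = np.allclose(layer(x), base_out)
```

Because the pre-trained weight is never modified, an adapter of this kind can be attached or detached without retraining the base model, which is one plausible reading of the plug-and-play behaviour the abstract describes.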
