CamCloneMaster: Enabling Reference-based Camera Control for Video Generation
Yawen Luo, Jianhong Bai, Xiaoyu Shi, Menghan Xia, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, Tianfan Xue
TL;DR
CamCloneMaster clones camera motion from a reference video without requiring explicit camera parameters or test-time fine-tuning, and handles both image-to-video (I2V) and video-to-video (V2V) generation within a single diffusion-based model. It injects camera-motion and content conditions into the latent diffusion process through simple token concatenation, and fine-tunes only the 3D spatio-temporal attention layers to preserve the base model's generative capabilities. A large Unreal Engine 5–based Camera Clone Dataset supports learning diverse camera trajectories and dynamic scenes. Quantitative metrics and user studies show that the method outperforms existing approaches on both I2V and V2V tasks. The work offers a practical, intuitive tool for cinematographers and content creators, and the dataset and method are intended to facilitate future research in camera-controlled video synthesis.
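The token-concatenation idea and the selective fine-tuning mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `build_sequence` and `trainable_names`, the choice to concatenate along the frame axis, the ordering of the conditions, and the `"attn"` naming convention are all assumptions made for illustration, and plain Python lists stand in for latent tensors.

```python
from typing import List, Optional, Sequence, Tuple

Token = List[float]   # one latent token (feature vector)
Frame = List[Token]   # all tokens of one latent frame
Video = List[Frame]   # all frames of one latent video


def build_sequence(
    noisy: Video,
    camera_ref: Video,
    content_ref: Optional[Video] = None,
) -> Tuple[Video, slice]:
    """Concatenate condition frames and noisy target frames (hypothetical helper).

    Assumption: conditions are joined along the frame axis, camera reference
    first, then the optional content reference (V2V), then the noisy latents.
    Returns the joint sequence and the slice that holds the target frames.
    """
    conditions = list(camera_ref)
    if content_ref is not None:
        conditions += list(content_ref)
    sequence = conditions + list(noisy)
    target = slice(len(conditions), len(sequence))
    return sequence, target


def trainable_names(param_names: Sequence[str], keyword: str = "attn") -> List[str]:
    """Select parameters to fine-tune by name (hypothetical naming convention).

    Assumption: attention-layer parameters contain `keyword` in their name;
    every other parameter would stay frozen.
    """
    return [name for name in param_names if keyword in name]


# Toy example: 2 noisy frames, 3 camera-reference frames, 3 content-reference
# frames, each frame holding a single 2-dimensional token.
noisy = [[[0.0, 0.0]] for _ in range(2)]
camera_ref = [[[1.0, 1.0]] for _ in range(3)]
content_ref = [[[2.0, 2.0]] for _ in range(3)]

sequence, target = build_sequence(noisy, camera_ref, content_ref)
denoised_positions = sequence[target]  # only these frames would be denoised
```

In a sketch like this, the joint sequence would be passed through the attention layers so that target tokens can attend to the condition tokens, and only the frames in `target` would be kept as the generated video. Omitting `content_ref` corresponds to the case where only a camera-motion reference is supplied.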
Abstract
Camera control is crucial for generating expressive and cinematic videos. Existing methods rely on explicit sequences of camera parameters as control conditions, which can be cumbersome for users to construct, particularly for intricate camera movements. To provide a more intuitive camera control method, we propose CamCloneMaster, a framework that enables users to replicate camera movements from reference videos without requiring camera parameters or test-time fine-tuning. CamCloneMaster seamlessly supports reference-based camera control for both Image-to-Video and Video-to-Video tasks within a unified framework. Furthermore, we present the Camera Clone Dataset, a large-scale synthetic dataset designed for camera clone learning, encompassing diverse scenes, subjects, and camera movements. Extensive experiments and user studies demonstrate that CamCloneMaster outperforms existing methods in terms of both camera controllability and visual quality.
