LightMotion: A Light and Tuning-free Method for Simulating Camera Motion in Video Generation
Quanjian Song, Zhihang Lin, Zhanpeng Zeng, Ziyue Zhang, Liujuan Cao, Rongrong Ji
TL;DR
The paper tackles the heavy computational burden of camera-motion-conditioned video generation by introducing LightMotion, a tuning-free method that operates entirely in latent space. It breaks the problem into three components: latent space permutation to simulate panning, zooming, and rotation; background-aware latent space resampling with cross-frame alignment to fill new perspectives coherently; and latent space correction to mitigate the SNR shift that the first two steps introduce. In extensive experiments against multiple baselines, LightMotion achieves better quantitative metrics (FVD, CLIP-F, CLIP-T) and better qualitative coherence, while supporting diverse user-defined motion parameters. The approach enables end-to-end, accessible camera-motion video generation without fine-tuning or depth estimation, with practical implications for film, VR, and content creation pipelines.
Abstract
Existing camera motion-controlled video generation methods face computational bottlenecks in fine-tuning and inference. This paper proposes LightMotion, a light and tuning-free method for simulating camera motion in video generation. Operating in the latent space, it eliminates additional fine-tuning, inpainting, and depth estimation, making it more streamlined than existing methods. The contributions of this paper are: (i) The latent space permutation operation effectively simulates various camera motions such as panning, zooming, and rotation. (ii) The latent space resampling strategy combines background-aware sampling and cross-frame alignment to accurately fill new perspectives while maintaining coherence across frames. (iii) Our in-depth analysis shows that the permutation and resampling cause an SNR shift in latent space, leading to poor-quality generation. To address this, we propose latent space correction, which reintroduces noise during denoising to mitigate the SNR shift and enhance video generation quality. Extensive experiments show that LightMotion outperforms existing methods, both quantitatively and qualitatively.
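To make the latent space permutation idea in (i) more concrete, here is a minimal NumPy sketch of a horizontal pan. It is an illustration under stated assumptions, not the authors' implementation: the function names `pan_latents` and `fill_new_region`, the parameter `shift_per_frame`, and the (frames, channels, height, width) layout are all hypothetical. The fill step draws fresh Gaussian noise as a simple stand-in; the paper's background-aware resampling with cross-frame alignment is not reproduced here.

```python
import numpy as np


def pan_latents(latents, shift_per_frame):
    """Shift each frame's latent left by a growing offset (camera pans right).

    latents: array of shape (F, C, H, W); hypothetical layout.
    shift_per_frame: non-negative int, offset in latent pixels added per frame.
    Returns the shifted latents and a boolean mask (F, H, W) marking the
    columns that the shift newly exposed and that still need content.
    """
    if shift_per_frame < 0:
        raise ValueError("this sketch only handles non-negative shifts")
    num_frames, _, _, width = latents.shape
    shifted = np.zeros_like(latents)
    exposed = np.zeros((num_frames,) + latents.shape[2:], dtype=bool)
    for f in range(num_frames):
        s = min(f * shift_per_frame, width)  # cumulative shift, clamped to W
        # Move existing content left by s columns.
        shifted[f, :, :, : width - s] = latents[f, :, :, s:]
        # The rightmost s columns have no source content.
        exposed[f, :, width - s:] = True
    return shifted, exposed


def fill_new_region(shifted, exposed, rng):
    """Fill exposed positions with fresh Gaussian noise.

    Assumption: a naive placeholder for the paper's resampling strategy.
    """
    noise = rng.standard_normal(shifted.shape)
    # Broadcast the (F, H, W) mask over the channel axis.
    return np.where(exposed[:, None, :, :], noise, shifted)


rng = np.random.default_rng(0)
latents = rng.standard_normal((4, 4, 8, 8))
shifted, exposed = pan_latents(latents, shift_per_frame=2)
filled = fill_new_region(shifted, exposed, rng)
print(filled.shape, exposed.sum(axis=(1, 2)))
```

Filling exposed regions independently per frame, as this sketch does, ignores both scene content and temporal consistency, which is the gap that point (ii) of the abstract addresses.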
