RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
Ozgur Kara, Bariscan Kurtkaya, Hidir Yesiltepe, James M. Rehg, Pinar Yanardag
TL;DR
RAVE tackles zero-shot video editing by leveraging pre-trained text-to-image diffusion models without additional training. It introduces a grid-based editing scheme and a novel noise shuffling strategy that enforce strong spatio-temporal interactions between frames, enabling fast, temporally consistent edits of longer videos. The approach is conditioned with ControlNet and uses DDIM inversion to carry image edits over to video. On a diverse 186-video dataset, the authors report better temporal coherence and textual alignment than existing methods. They support these claims with extensive qualitative, quantitative, and user-study evidence, report favorable runtime, and release code and data to facilitate reproducibility.
Abstract
Recent advancements in diffusion-based models have demonstrated significant success in generating images from text. However, video editing models have not yet reached the same level of visual quality and user control. To address this, we introduce RAVE, a zero-shot video editing method that leverages pre-trained text-to-image diffusion models without additional training. RAVE takes an input video and a text prompt and produces high-quality videos while preserving the original motion and semantic structure. It employs a novel noise shuffling strategy, leveraging spatio-temporal interactions between frames, to produce temporally consistent videos faster than existing methods. It is also memory-efficient, allowing it to handle longer videos. RAVE is capable of a wide range of edits, from local attribute modifications to shape transformations. To demonstrate the versatility of RAVE, we create a comprehensive video evaluation dataset ranging from object-focused scenes to complex human activities like dancing and typing, and dynamic scenes featuring swimming fish and boats. Our qualitative and quantitative experiments highlight the effectiveness of RAVE in diverse video editing scenarios compared to existing methods. Our code, dataset, and videos can be found at https://rave-video.github.io.
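
The grid-based editing and noise shuffling ideas mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`frames_to_grids`, `grids_to_frames`, `shuffled_step`), the row-major grid layout, the choice to reshuffle once per denoising step, and the `denoise_fn` placeholder (standing in for one step of a ControlNet-conditioned image diffusion model) are all assumptions made for illustration. The sketch only shows how per-frame latents could be tiled into grids in a random order, processed jointly, and then returned to their original frame order.

```python
import numpy as np

def frames_to_grids(latents, rows, cols):
    """Tile per-frame latents (N, C, H, W) into grids (N/(rows*cols), C, rows*H, cols*W).

    Frames fill each grid in row-major order (an assumed layout).
    """
    n, c, h, w = latents.shape
    per_grid = rows * cols
    if n % per_grid != 0:
        raise ValueError("frame count must be a multiple of rows * cols")
    g = latents.reshape(n // per_grid, rows, cols, c, h, w)
    # (G, rows, cols, C, H, W) -> (G, C, rows, H, cols, W)
    g = g.transpose(0, 3, 1, 4, 2, 5)
    return g.reshape(n // per_grid, c, rows * h, cols * w)

def grids_to_frames(grids, rows, cols):
    """Inverse of frames_to_grids: split each grid back into its frames."""
    num_grids, c, grid_h, grid_w = grids.shape
    h, w = grid_h // rows, grid_w // cols
    f = grids.reshape(num_grids, c, rows, h, cols, w)
    # (G, C, rows, H, cols, W) -> (G, rows, cols, C, H, W)
    f = f.transpose(0, 2, 4, 1, 3, 5)
    return f.reshape(num_grids * rows * cols, c, h, w)

def shuffled_step(latents, rows, cols, denoise_fn, rng):
    """One hypothetical editing step with a fresh random frame-to-grid assignment.

    denoise_fn is a placeholder for a single denoising step applied to grids;
    it must return an array of the same shape as its input.
    """
    perm = rng.permutation(latents.shape[0])
    grids = frames_to_grids(latents[perm], rows, cols)
    grids = denoise_fn(grids)
    shuffled = grids_to_frames(grids, rows, cols)
    # Undo the shuffle so frame i ends up back at index i.
    restored = np.empty_like(shuffled)
    restored[perm] = shuffled
    return restored

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latents = rng.standard_normal((18, 4, 8, 8))  # 18 frames, toy latent size
    for _ in range(5):  # a new permutation at every step
        latents = shuffled_step(latents, 3, 3, lambda g: 0.9 * g, rng)
    print(latents.shape)
```

Because the permutation changes at every step in this sketch, each frame shares a grid with different neighbours over the course of denoising, which is one plausible way to let an image model mix information across distant frames while only ever processing a fixed number of grid images.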
