How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models
Pascal Chang, Jingwei Tang, Markus Gross, Vinicius C. Azevedo
TL;DR
This work addresses temporal incoherence in diffusion-based video editing by introducing an $\int$-noise representation, which treats each pixel as the integral of an underlying high-resolution white noise field, and a distribution-preserving noise transport that warps noise across frames while maintaining its Gaussian properties. The authors derive a noise transport equation and provide a discrete, practical implementation that yields temporally correlated yet distribution-preserving noise samples, enabling more coherent video editing, restoration, and generation. Across multiple diffusion-based tasks, the $\int$-noise prior improves temporal coherence and reduces artifacts compared with fixed or random noise, while maintaining competitive image quality; however, it incurs higher computational cost and its benefits depend on the diffusion pipeline and data. The approach offers a principled framework for moving noise through time in diffusion models and suggests further exploration in latent-diffusion settings and video-input training.
Abstract
Video editing and generation methods often rely on pre-trained image-based diffusion models. During the diffusion process, however, these methods rely on rudimentary noise sampling techniques that do not preserve the correlations present in subsequent frames of a video, which is detrimental to the quality of the results. This produces either high-frequency flickering or texture-sticking artifacts that are not amenable to post-processing. With this in mind, we propose a novel method for preserving temporal correlations in a sequence of noise samples. This approach is materialized by a novel noise representation, dubbed $\int$-noise (integral noise), that reinterprets individual noise samples as a continuously integrated noise field: pixel values do not represent discrete values, but are rather the integral of an underlying infinite-resolution noise over the pixel area. Additionally, we propose a carefully tailored transport method that uses $\int$-noise to accurately advect noise samples over a sequence of frames, maximizing the correlation between different frames while also preserving the noise properties. Our results demonstrate that the proposed $\int$-noise can be used for a variety of tasks, such as video restoration, surrogate rendering, and conditional video generation. See https://warpyournoise.github.io/ for video results.
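
The integral interpretation above can be illustrated with a small sketch. It is not the implementation from this work: it assumes the infinite-resolution field is approximated by a finite k x k grid of sub-pixels per pixel, and the function names (upsample_pixel, integrate, shifted_pixel) are hypothetical. A unit-variance pixel value is split into sub-pixels that sum back to it exactly, and a pixel displaced by half a pixel is then obtained by summing the sub-pixels it covers and dividing by the square root of the covered area, which keeps the result unit-variance Gaussian.

```python
import math
import random


def upsample_pixel(value, k, rng):
    """Split one unit-variance noise pixel into a k x k grid of sub-pixels.

    The sub-pixels sum to `value`; if `value` is N(0, 1), each sub-pixel
    has marginal variance 1 / k**2 and the sub-pixels are uncorrelated.
    """
    n = k * k
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z_mean = sum(z) / n
    # Condition on the known integral: remove the sample mean of the fresh
    # noise, scale it to variance 1 / n, and add back an equal share of `value`.
    flat = [value / n + (zi - z_mean) / math.sqrt(n) for zi in z]
    return [flat[r * k:(r + 1) * k] for r in range(k)]


def integrate(subpixels, k):
    """Sum sub-pixels and divide by the square root of the covered area."""
    area = len(subpixels) / (k * k)  # each sub-pixel covers 1 / k**2 of a pixel
    return sum(subpixels) / math.sqrt(area)


def shifted_pixel(left, right, k, rng):
    """Value of a pixel covering the right half of `left` and the left half of `right`.

    Assumes k is even so that the halves align with sub-pixel columns.
    """
    a = upsample_pixel(left, k, rng)
    b = upsample_pixel(right, k, rng)
    half = k // 2
    covered = [v for row in a for v in row[half:]]
    covered += [v for row in b for v in row[:half]]
    return integrate(covered, k)


rng = random.Random(0)
grid = upsample_pixel(0.7, 4, rng)
total = sum(v for row in grid for v in row)  # recovers 0.7 up to rounding error
warped = shifted_pixel(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), 4, rng)
```

Naively averaging the two neighboring pixels would instead give a value with variance 1/2, which is the kind of distribution drift that the area-based rescaling in this sketch avoids.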
