Infinite-Resolution Integral Noise Warping for Diffusion Models

Yitong Deng, Winnie Lin, Lingxiao Li, Dmitriy Smirnov, Ryan Burgert, Ning Yu, Vincent Dedun, Mohammad H. Taghavi

TL;DR

This work develops an alternative to the upsampling-based noise-warping algorithm of Chang et al. (2024) that, by gathering increments of multiple Brownian bridges, achieves that algorithm's infinite-resolution accuracy while reducing the computational cost by orders of magnitude.

Abstract

Adapting pretrained image-based diffusion models to generate temporally consistent videos has become an impactful generative modeling research direction. Training-free noise-space manipulation has proven to be an effective technique, where the challenge is to preserve the Gaussian white noise distribution while adding in temporal consistency. Recently, Chang et al. (2024) formulated this problem using an integral noise representation with distribution-preserving guarantees, and proposed an upsampling-based algorithm to compute it. However, while their mathematical formulation is advantageous, the algorithm incurs a high computational cost. Through analyzing the limiting-case behavior of their algorithm as the upsampling resolution goes to infinity, we develop an alternative algorithm that, by gathering increments of multiple Brownian bridges, achieves their infinite-resolution accuracy while simultaneously reducing the computational cost by orders of magnitude. We prove and experimentally validate our theoretical claims, and demonstrate our method's effectiveness in real-world applications. We further show that our method readily extends to three-dimensional space.

Paper Structure

This paper contains 10 sections, 1 theorem, 10 equations, 14 figures, 1 table, 3 algorithms.

Key Result

Theorem 1

Let $\{Z_n\}$ be a sequence of i.i.d. random variables with finite variance that are normalized such that $\mathbb{E}[Z_n] = 0$ and $\mathrm{Var}(Z_n) = 1$. For $c \in \mathbb{R}$, define [...]. Consider the sequence of random continuous functions $\{H_n(t)\} \subset C[0, 1]$ defined as [...]. Then the sequence $\{H_n\}$ converges in distribution under the sup-norm metric on $C[0,1]$ to $B_c(t) \coloneq W(t) \dots$

Figures (14)

  • Figure 1: When the image grid deforms, the Lagrangian view tracks a deformed pixel region, while the Eulerian view tracks the undeformed pixel square as it gets partitioned into multiple regions. On the right, we leverage the exchangeability of upsampled subpixels to convert the Lagrangian gathering procedure into scattering noise subpixels to overlapped deformed pixel regions.
  • Figure 2: Connection between Eulerian noise-warping and increments of a Brownian bridge for a fixed prior noise pixel $[I_W]_{i,j}$. The overlapping area of each colored warped region becomes the time increment for the Brownian bridge. Hence, sampling the Brownian bridge at these times and taking consecutive differences yields integral noise that is scattered to form each warped noise pixel.
  • Figure 3: The grid-based variant (left) computes the overlapping areas by explicitly constructing the polygon for the deformed pixel region. The particle-based variant (middle) approximates these areas with a weighting kernel. With degenerate maps (right), the fixed topology of the grid-based variant can lead to problems, while the connectivity-free, particle-based variant remains stable.
  • Figure 4: Preservation of Gaussian white noise achieved by different warping methods. We report scores and p-values for both Moran's $I$ (spatial correlation) and K-S test (normality). We show that results from our method (both variants) and HIWYN are indistinguishable from white Gaussian noise, while generic warping methods lead to corrupted noise.
  • Figure 5: Convergence of HIWYN to our method as $N$ increases. Top left: experimental setup with prior noise and deformation map. Top middle: 2-Wasserstein distance $W^N$ between the output of HIWYN and ours. Top right: statistics table. Bottom: $W^N$ difference image between the output of HIWYN and ours as $N$ increases. Notice $W^N$ becomes statistically insignificant for $N \ge 64$.
  • ...and 9 more figures
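
The Brownian-bridge connection described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `bridge_increments`, the overlap values, and the two-pixel setup are hypothetical, and the final division by the square root of the region's area assumes the integral-noise normalization of Chang et al. (2024). For one prior noise value `x`, the sketch samples a Brownian bridge pinned to 0 at time 0 and to `x` at time 1, evaluates it at the cumulative overlap areas, and takes consecutive differences. Each warped region then gathers its pieces from the prior pixels it overlaps.

```python
import math
import random


def bridge_increments(x, areas, rng):
    """Split one prior noise value `x` across regions with given overlap areas.

    Samples a Brownian bridge on [0, 1] that starts at 0 and ends at `x`,
    evaluates it at the cumulative areas, and returns consecutive differences.
    `areas` must be positive and sum to 1 (a partition of the unit pixel).
    """
    if any(a <= 0 for a in areas) or not math.isclose(sum(areas), 1.0):
        raise ValueError("areas must be positive and sum to 1")

    increments = []
    t, b = 0.0, 0.0  # current time and current bridge value
    for a in areas[:-1]:
        remaining = 1.0 - t
        # Conditional law of B(t + a) given B(t) = b and B(1) = x.
        mean = b + (a / remaining) * (x - b)
        var = max(a * (remaining - a) / remaining, 0.0)
        nxt = rng.gauss(mean, math.sqrt(var))
        increments.append(nxt - b)
        t, b = t + a, nxt
    increments.append(x - b)  # the bridge is pinned to x at time 1
    return increments


# Hypothetical setup: two prior pixels, two warped regions.
rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(2)]
# overlaps[i][j] = area of prior pixel i covered by warped region j (assumed).
overlaps = [[0.3, 0.7], [0.6, 0.4]]
pieces = [bridge_increments(x, row, rng) for x, row in zip(prior, overlaps)]

warped = []
for j in range(2):
    gathered = sum(p[j] for p in pieces)      # gather increments per region
    area = sum(row[j] for row in overlaps)    # total area of warped region j
    warped.append(gathered / math.sqrt(area))  # assumed unit-variance scaling
```

Because the increments of each bridge sum to the prior value, no noise is created or lost when a pixel is partitioned. When `x` is itself standard normal, an increment over an area `a` has mean 0 and variance `a`, and distinct increments are uncorrelated, which is why the rescaled sum in `warped` stays standard normal under these assumptions.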

Theorems & Definitions (2)

  • Theorem 1: Scaling limit to Brownian bridge
  • proof