On the Relation Between Linear Diffusion and Power Iteration

Dana Weitzner, Mauricio Delbracio, Peyman Milanfar, Raja Giryes

TL;DR

This work examines the generation process as a "correlation machine", where random noise is repeatedly enhanced in correlation with the implicit given distribution, and explores the linear case, where the optimal denoiser in the MSE sense is known to be the PCA projection.

Abstract

Recently, diffusion models have gained popularity due to their impressive generative abilities. These models learn the implicit distribution given by the training dataset, and sample new data by transforming random noise through the reverse process, which can be thought of as gradual denoising. In this work, we examine the generation process as a "correlation machine", where random noise is repeatedly enhanced in correlation with the implicit given distribution. To this end, we explore the linear case, where the optimal denoiser in the MSE sense is known to be the PCA projection. This enables us to connect the theory of diffusion models to the spiked covariance model, where the dependence of the denoiser on the noise level and the amount of training data can be expressed analytically in the rank-1 case. In a series of numerical experiments, we extend this result to general low-rank data, and show that low frequencies emerge earlier in the generation process, where the denoising basis vectors are more aligned with the true data, at a rate depending on their eigenvalues. This model allows us to show that the linear diffusion model converges in mean to the leading eigenvector of the underlying data, similar to the widely used power iteration method. Finally, we empirically demonstrate the applicability of our findings beyond the linear case, in the Jacobians of a deep, non-linear denoiser used in general image generation tasks.
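
For readers unfamiliar with the power iteration method mentioned in the abstract, the following is a minimal NumPy sketch of that method only, not of the paper's diffusion model. The toy covariance, its spectrum, the step count, and the names `power_iteration`, `basis`, and `alignment` are assumptions made for illustration. Starting from random noise, repeatedly applying the covariance and renormalizing drives the vector toward the leading eigenvector.

```python
import numpy as np

def power_iteration(cov, num_steps=200, seed=0):
    """Approximate the leading eigenvector of a symmetric PSD matrix."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(cov.shape[0])   # start from pure noise
    x /= np.linalg.norm(x)
    for _ in range(num_steps):
        x = cov @ x                          # amplify directions with large eigenvalues
        x /= np.linalg.norm(x)               # renormalize to avoid overflow
    return x

# Hypothetical toy covariance with a known orthonormal eigenbasis.
rng = np.random.default_rng(1)
d = 20
basis, _ = np.linalg.qr(rng.standard_normal((d, d)))   # columns are eigenvectors
eigvals = np.geomspace(4.0, 0.1, d)                     # decaying spectrum
cov = basis @ np.diag(eigvals) @ basis.T

v = power_iteration(cov)
# |cosine| between the iterate and the true leading eigenvector
alignment = abs(v @ basis[:, 0])
print(f"alignment with leading eigenvector: {alignment:.6f}")
```

The convergence speed of this iteration is governed by the ratio of the second to the first eigenvalue, so a larger spectral gap means fewer steps are needed.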

Paper Structure

This paper contains 10 sections, 1 theorem, 26 equations, 7 figures.

Key Result

Theorem 4.3

Let $\sigma_t = \frac{1}{T}$, $t = 0, \dots, T$. Under the cross-products assumption and the diagonal-elements assumption, in the limit $T \to \infty$,

Figures (7)

  • Figure 1: Digit generation from pure noise (class conditioned). The reverse process runs from left to right, top then bottom.
  • Figure 2: The sine of the angle between the clean principal components and their noisy versions, colored by the order of the eigenvalues (the darkest being the largest eigenvalue). Low frequencies emerge earlier in the generation process (at higher noise levels). This motivates the diagonal-elements assumption, which extends the equation for $\sin{\theta_\text{PCA}}$ to higher ranks.
  • Figure 3: Effect of dataset size. The plots show $\sin{\theta_\text{PCA}}$ at different noise levels when trained on datasets of increasing size (lighter color). Each plot shows a different component index, for indices $0, 5, 10$ (left to right; index $0$ corresponds to the largest eigenvalue). Increasing the amount of training data improves robustness to noise and lets high frequencies appear at higher noise levels, so the generated data captures more nuances of the data and generalizes better.
  • Figure 4: Schematic illustration of the basis perturbation, per index.
  • Figure 5: The time point basis correlation matrices $U_\tau^T U_{\tau+1}$ (left per pair), together with the partial product $\prod_{t=0}^\tau (U_t^\dag U_{t+1})$ (right per pair) at different time points. This justifies the cross-products assumption, and shows that the total projection (bottom right image, for $\tau = T$) converges to the first eigenvector, similarly to the power method.
  • ...and 2 more figures
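
The quantity plotted in Figures 2 and 3, the sine of the angle between clean and noisy principal components, can be illustrated with a small synthetic experiment. This is a sketch under assumed settings, not the paper's experimental setup: the dimension `d`, rank `r`, sample count `n`, eigenvalue decay, and noise levels are all hypothetical choices, as are the helper names `top_components` and `sin_angles`.

```python
import numpy as np

def top_components(data, k):
    """Return the k leading eigenvectors of the sample covariance, as columns."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / len(data)
    _, vecs = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]        # reorder so the largest eigenvalue comes first

def sin_angles(clean_basis, noisy_basis):
    """Sine of the angle between matching columns of two orthonormal bases."""
    cos = np.abs(np.sum(clean_basis * noisy_basis, axis=0))
    return np.sqrt(np.clip(1.0 - cos**2, 0.0, 1.0))

rng = np.random.default_rng(0)
d, r, n = 50, 5, 2000                                  # assumed toy sizes
eigvals = 10.0 * 0.5 ** np.arange(r)                   # assumed decaying spectrum
basis, _ = np.linalg.qr(rng.standard_normal((d, r)))   # d x r orthonormal columns
clean = (rng.standard_normal((n, r)) * np.sqrt(eigvals)) @ basis.T
clean_pcs = top_components(clean, r)

noise_levels = [0.01, 0.3, 1.0, 3.0]                   # assumed noise standard deviations
sin_theta = np.array([
    sin_angles(clean_pcs,
               top_components(clean + s * rng.standard_normal((n, d)), r))
    for s in noise_levels
])

# Rows: noise level (increasing); columns: component index (0 = largest eigenvalue).
print(np.round(sin_theta, 3))
```

In this toy setting, components with larger eigenvalues should stay aligned with their clean counterparts up to higher noise levels, which is the qualitative behavior the captions describe. Increasing `n` is one way to probe the dataset-size effect of Figure 3.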

Theorems & Definitions (2)

  • Theorem 4.3: Convergence to Power Iteration
  • proof