Dynamical Regimes of Diffusion Models

Giulio Biroli, Tony Bonnaire, Valentin de Bortoli, Marc Mézard

TL;DR

Using statistical physics methods, this work identifies three distinct dynamical regimes during the generative diffusion process, which are supported by analytical solutions for Gaussian mixtures and confirmed by numerical experiments on real datasets.

Abstract

Using statistical physics methods, we study generative diffusion models in the regime where the dimension of space and the number of data are large, and the score function has been trained optimally. Our analysis reveals three distinct dynamical regimes during the backward generative diffusion process. The generative dynamics, starting from pure noise, first encounters a 'speciation' transition where the gross structure of data is unraveled, through a mechanism similar to symmetry breaking in phase transitions. It is followed at a later time by a 'collapse' transition where the trajectories of the dynamics become attracted to one of the memorized data points, through a mechanism similar to condensation in a glass phase. For any dataset, the speciation time can be found from a spectral analysis of the correlation matrix, and the collapse time can be found from the estimation of an 'excess entropy' in the data. The dependence of the collapse time on the dimension and number of data provides a thorough characterization of the curse of dimensionality for diffusion models. Analytical solutions for simple models like high-dimensional Gaussian mixtures substantiate these findings and provide a theoretical framework, while extensions to more complex scenarios and numerical validations with real datasets confirm the theoretical predictions.
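
As a rough illustration of the spectral recipe mentioned in the abstract, the sketch below estimates a speciation time from the largest eigenvalue $\Lambda$ of the empirical covariance matrix of the data. It assumes that the criterion takes the form $t_S = \frac{1}{2}\log\Lambda$ for an Ornstein-Uhlenbeck forward noising process; that formula, the helper name `speciation_time`, and the synthetic two-cluster dataset are assumptions made for illustration and are not stated in this summary.

```python
import numpy as np

def speciation_time(data):
    """Hypothetical helper: returns 0.5 * log(Lambda), where Lambda is the
    largest eigenvalue of the empirical covariance matrix of `data`
    (rows are samples, columns are coordinates)."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / data.shape[0]
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
    top_eig = np.linalg.eigvalsh(cov)[-1]
    return 0.5 * np.log(top_eig)

# Synthetic stand-in for a two-class dataset: a symmetric Gaussian mixture
# with centers +m and -m, where |m|^2 = d * mu_tilde^2 (assumed setup).
rng = np.random.default_rng(0)
d, n, mu_tilde, sigma = 64, 4000, 1.0, 1.0
m = mu_tilde * np.ones(d)
signs = rng.choice([-1.0, 1.0], size=(n, 1))
data = signs * m + sigma * rng.standard_normal((n, d))

t_s = speciation_time(data)
print(f"estimated speciation time: {t_s:.3f}")
print(f"0.5 * log(d * mu_tilde^2): {0.5 * np.log(d * mu_tilde**2):.3f}")
```

In this toy setting the top eigenvalue is dominated by the separation between the two centers, so the estimate should land close to $\frac{1}{2}\log(d\,\tilde{\mu}^2)$, growing logarithmically with the dimension.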

Paper Structure (14 sections, 62 equations, 8 figures, 1 table)

Figures (8)

  • Figure 1: Illustration of the three regimes of the backward dynamics through an example corresponding to a Gaussian mixture in two dimensions. Trajectories are colored white and blue according to their class at the end of the backward dynamics. In regime I, blue and white trajectories fluctuate within the same bundle and $\vec{x}$ is similar to white noise. At the speciation time $t_S$, the ensembles of blue and white trajectories divide and head towards the distribution associated with their respective class. Regime II is where the generative process constructs an $\vec{x}$ which resembles one element of the class (e.g., a seashore in the illustration) without being linked to any data of the training set. At the collapse time $t_C$, trajectories start to be attracted by the data point on which they collapse at $t=0$. Regime III corresponds to memorization, whereas in regimes I and II the diffusion model truly generalizes. The images on the right and on the left are illustrations obtained from our ImageNet numerical experiment (notice the collapse on the panda and seashore from the training set at $t=0$).
  • Figure 2: Speciation in Gaussian mixtures: Evolution of $\phi(t)$ as a function of $t/t_S$ for several values of $d$ at fixed $\tilde{\mu} = 1$ and $\sigma = 1$. The solid line corresponds to the evaluation of Eq. \ref{phi_GM} while the dots are obtained by sampling $10\,000$ clone trajectories. The vertical (resp. horizontal) dashed line corresponds to $t/t_S = 1$ (resp. $\phi(t) = 0.775$). Error bars correspond to thrice the standard error.
  • Figure 3: Collapse in Gaussian mixtures: Evolution of the excess entropy density $f^e(t)/\alpha$ as a function of time $t$ for several values of $d$, at fixed $n=20\,000$. The solid lines are the theoretical predictions while the dots show the results of the numerical evaluation approximating the entropy from the dataset. The vertical dashed lines represent the collapse time $t_C$ predicted analytically for Gaussian mixtures given in Eq. \ref{eq:tc_GM}. Error bars correspond to thrice the standard error.
  • Figure 4: Speciation in realistic datasets: Evolution of $\phi(t)$, the probability that the two clones end up in the same class, as a function of $t/t_S$ for several image datasets. The values of $t_S$ are the theoretical predictions for the speciation time obtained using Eq. \ref{crit-spec} and listed in Table \ref{tab:datasets}. The dashed horizontal line indicates $\phi(t) = 0.775$ and the error bars correspond to thrice the standard error.
  • Figure 5: Collapse in realistic datasets (ImageNet16, ImageNet32 and LSUN): (Top Left) Evolution of $\phi_C(t)$, the probability that two cloned trajectories collapse on the same data point of the training set at time zero. (Top Right) Histograms of $\hat{t}_C$ derived from the last-changing indices $\mu_\star$ on $4\,000$ generated samples for the LSUN dataset trained with $n=200$. (Bottom) Evolution of the empirical excess entropy $f(t)/\alpha$. In all panels, the colored vertical dashed lines indicate the average of $\hat{t}_C$. The error bars correspond to thrice the standard error.
  • ...and 3 more figures
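
The captions above report error bars at thrice the standard error for probabilities such as $\phi(t)$ that are estimated from sampled clone pairs. A minimal sketch of that arithmetic follows, assuming each clone pair gives an independent 0/1 outcome so that the binomial standard error applies; the helper name `proportion_with_error` and the example counts are hypothetical and not taken from the paper.

```python
import math

def proportion_with_error(successes, trials, k=3.0):
    """Hypothetical helper: estimate a probability from 0/1 outcomes and
    return (estimate, k * binomial standard error)."""
    p_hat = successes / trials
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / trials)
    return p_hat, k * standard_error

# Illustrative counts only: 7,750 of 10,000 clone pairs ending in the same class.
phi_hat, err = proportion_with_error(7750, 10_000)
print(f"phi = {phi_hat:.3f} +/- {err:.4f}")
```

With $10\,000$ samples the half-width of such an error bar is on the order of $10^{-2}$ near $\phi = 0.775$, which gives a sense of the resolution available when locating where a curve crosses that reference value.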