Losing dimensions: Geometric memorization in generative diffusion

Beatrice Achilli, Enrico Ventura, Gianluigi Silvestri, Bao Pham, Gabriel Raya, Dmitry Krotov, Carlo Lucibello, Luca Ambrogioni

TL;DR

The theoretical and experimental findings indicate that different tangent subspaces are lost due to memorization effects at different critical times and dataset sizes, which depend on the local variance of the data along their directions.

Abstract

Generative diffusion processes are state-of-the-art machine learning models deeply connected with fundamental concepts in statistical physics. Depending on the dataset size and the capacity of the network, their behavior is known to transition from an associative memory regime to a generalization phase in a phenomenon that has been described as a glassy phase transition. Here, using statistical physics techniques, we extend the theory of memorization in generative diffusion to manifold-supported data. Our theoretical and experimental findings indicate that different tangent subspaces are lost due to memorization effects at different critical times and dataset sizes, which depend on the local variance of the data along their directions. Perhaps counterintuitively, we find that, under some conditions, subspaces of higher variance are lost first due to memorization effects. This leads to a selective loss of dimensionality where some prominent features of the data are memorized without a full collapse on any individual training point. We validate our theory with a comprehensive set of experiments on networks trained both on image datasets and on linear manifolds, which show remarkable qualitative agreement with the theoretical predictions.

Paper Structure

This paper contains 22 sections, 41 equations, 16 figures, 1 table, 3 algorithms.

Figures (16)

  • Figure 1: Visualization of the latent manifold of a diffusion model. The contour lines denote the log-probability (i.e. the (negative) 'energy'). The manifold of fixed points is drawn as a red line. A) Manifolds corresponding to memorization and one-dimensional generalization. B) Tangent and orthogonal singular vectors of the score.
  • Figure 2: Illustration of the gaps in the singular values of the Jacobian of the score function in the presence of a latent manifold. The small values correspond to the tangent manifold (possible generations) while the high values correspond to the orthogonal manifold (forbidden generations). The singular values determine the steepness of the potential well along each eigen-direction.
  • Figure 3: Visualization of the dimensionality loss phenomenon. Manifold subspaces with higher variance are lost due to 'condensation' (i.e. memorization). Panels A, B and C show the score estimated from a bivariate distribution with unequal variances for $\beta = 1$, $\beta = 10$ and $\beta = 100$ respectively. The red arrows show the empirical score while the heat-map visualizes the density.
  • Figure 4: The ordered singular values of the Jacobian of the empirical score function of a linear manifold model as a function of the diffusion time $t$. Lighter colours are associated with larger times in the colour map. The parameters for the model are $d = 30$, $m = 7$, $\log(N)/d = 0.23$, with subspaces associated with variances $\sigma_1^2 = 1$ and $\sigma_2^2 = 0.3$ and dimensions $m_1 = 2$ and $m_2 = 5$ respectively. Left: approximated theoretical prediction in the memorization phase according to Eq. \ref{eq:sibar}. Center: prediction from the approximated Jacobian in Eq. \ref{eq:empirical jacobian}. Right: singular values obtained by numerically measuring the Jacobian of the empirical score function (as described in Supp. \ref{supp:method}), evaluated from a synthetic data set of $N = 10^3$ points.
  • Figure 5: Spectra for different $t$ estimated from diffusion networks trained on linear Normal data with two subspaces of different variance, using a range of dataset sizes. The red (green) dashed line corresponds to the location of the theoretical spectral gap for the high (low) variance subspace. The black dashed line corresponds to the total manifold gap.
  • ...and 11 more figures
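
The kind of measurement described in the caption of Figure 4 can be sketched numerically. The snippet below is a minimal illustration, not the authors' procedure from the supplement: it assumes a variance-exploding forward process in which the noised density is $p_t(x) = \frac{1}{N}\sum_i \mathcal{N}(x; y_i, t I)$, and the function names `sample_linear_manifold` and `empirical_score_jacobian` are hypothetical. Under that assumption the Jacobian of the empirical score has the closed form $J = \frac{1}{t}\left(\frac{1}{t}\mathrm{Cov}_w(y) - I\right)$, where $\mathrm{Cov}_w$ is the covariance of the training points under the softmax posterior weights. The data parameters ($d = 30$, $m_1 = 2$, $m_2 = 5$, $\sigma_1^2 = 1$, $\sigma_2^2 = 0.3$, $N = 10^3$) are taken from the caption; the value of $t$ and the query point are arbitrary choices.

```python
import numpy as np

def sample_linear_manifold(n, d, dims, variances, rng):
    """Draw n points on a random linear subspace of R^d.

    dims and variances give the dimension and variance of each block of
    latent directions (hypothetical helper, not from the paper).
    """
    m = sum(dims)
    # Orthonormal basis of a random m-dimensional subspace.
    basis, _ = np.linalg.qr(rng.standard_normal((d, m)))
    scales = np.concatenate(
        [np.full(k, np.sqrt(v)) for k, v in zip(dims, variances)]
    )
    latents = rng.standard_normal((n, m)) * scales
    return latents @ basis.T

def empirical_score_jacobian(x, data, t):
    """Jacobian of grad log p_t at x, assuming p_t = mean_i N(y_i, t I)."""
    sq_dist = np.sum((data - x) ** 2, axis=1)
    logits = -sq_dist / (2.0 * t)
    logits -= logits.max()          # stabilise the softmax
    w = np.exp(logits)
    w /= w.sum()                    # posterior weights over training points
    mean = w @ data
    centered = data - mean
    cov = (centered * w[:, None]).T @ centered   # weighted covariance
    return (cov / t - np.eye(data.shape[1])) / t

rng = np.random.default_rng(0)
d, t = 30, 0.5
data = sample_linear_manifold(1000, d, (2, 5), (1.0, 0.3), rng)

# Evaluate at a noised training point, as a sample from p_t.
x = data[0] + np.sqrt(t) * rng.standard_normal(d)
J = empirical_score_jacobian(x, data, t)

# Singular values in descending order; directions orthogonal to the
# manifold sit at 1/t, tangent directions fall below that level.
sv = np.linalg.svd(J, compute_uv=False)
print(np.round(sv, 3))
```

Repeating the computation over a grid of $t$ values and plotting the sorted singular values gives a picture of the same type as the right panel of Figure 4. In this sketch the gap between the tangent singular values and the $1/t$ plateau closes as the posterior weights concentrate on a single training point.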