Two Calm Ends and the Wild Middle: A Geometric Picture of Memorization in Diffusion Models

Nick Dodson, Xinyu Gao, Qingsong Wang, Yusu Wang, Zhengchao Wan

TL;DR

This work introduces a geometric framework that partitions the noise schedule into three regimes based on two quantities: the coverage of training data by Gaussian shells and the concentration of the posterior. The authors argue that these two objects govern memorization and generalization in diffusion models.

Abstract

Diffusion models generate high-quality samples but can also memorize training data, raising serious privacy concerns. Understanding the mechanisms governing when memorization versus generalization occurs remains an active area of research. In particular, it is unclear where along the noise schedule memorization is induced, how data geometry influences it, and how phenomena at different noise scales interact. We introduce a geometric framework that partitions the noise schedule into three regimes based on the coverage properties of training data by Gaussian shells and the concentration behavior of the posterior, which we argue are two fundamental objects governing memorization and generalization in diffusion models. This perspective reveals that memorization risk is highly non-uniform across noise levels. We further identify a danger zone at medium noise levels where memorization is most pronounced. In contrast, both the small and large noise regimes resist memorization, but through fundamentally different mechanisms: small noise avoids memorization due to limited training coverage, while large noise exhibits low posterior concentration and admits a provably near linear Gaussian denoising behavior. For the medium noise regime, we identify geometric conditions through which we propose a geometry-informed targeted intervention that mitigates memorization.

Paper Structure (41 sections, 21 theorems, 180 equations, 23 figures, 2 tables)

Key Result

Theorem 4.2

Let $x_1 \in \mathcal{D}$ and condition on $\bm{X} = x_1$. Let $w_1(\bm{X}_\sigma,\sigma)$ denote the posterior weight on $x_1$ at a noise level $\sigma$. Then for any $\delta \in (0,1)$ and $q\in (\frac{1}{2},1)$, with probability at least $1-\delta$, we have that: where the constants $a_{K,\delta,q}$ and $b_{\delta,q}$ are defined by and where $F$ is the CDF of the standard normal distribution
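The posterior weight in Theorem 4.2 can be illustrated numerically. Definition 4.1 is not reproduced in this listing, so the sketch below assumes the standard form for an empirical training set under additive Gaussian noise $\bm{X}_\sigma = \bm{X} + \sigma \bm{Z}$: the weight on $x_i$ is a softmax over $-\|x_\sigma - x_i\|^2 / (2\sigma^2)$. The function names `posterior_weights` and `max_weight_curve`, the toy Gaussian data, and the chosen noise levels are all hypothetical and are not taken from the paper.

```python
import numpy as np

def posterior_weights(x_noisy, data, sigma):
    """Posterior weights over training points, assuming x_noisy = x_i + sigma * z
    with a uniform prior over the rows of `data` (an assumed, standard form)."""
    # Squared distance from the noisy point to every training point.
    sq_dist = np.sum((data - x_noisy) ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits -= logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def max_weight_curve(data, sigmas, n_samples=200, seed=0):
    """Average of the largest posterior weight at each noise level."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    curve = []
    for sigma in sigmas:
        # Noise randomly chosen training points at level sigma.
        idx = rng.integers(0, n, size=n_samples)
        noisy = data[idx] + sigma * rng.standard_normal((n_samples, d))
        max_w = [posterior_weights(x, data, sigma).max() for x in noisy]
        curve.append(float(np.mean(max_w)))
    return np.array(curve)

# Toy stand-in for a training set: 50 points in 32 dimensions.
rng = np.random.default_rng(0)
toy_data = rng.standard_normal((50, 32))
sigmas = np.array([0.05, 1.0, 50.0])
curve = max_weight_curve(toy_data, sigmas)
print(dict(zip(sigmas.tolist(), curve.tolist())))
```

On this toy data the posterior is concentrated on a single training point at the smallest noise level and close to uniform at the largest, which matches the qualitative contrast between high and low posterior concentration described in the abstract. It says nothing about the specific constants in the theorem.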

Figures (23)

  • Figure 1: MSE to Clean Image. Comparison of denoising quality across noise levels. Solid lines: training data; dotted lines: test data. EDM-1K shows a generalization gap in the mid-$\sigma$ region.
  • Figure 2: Per-Noise-Level Memorization Rate. Fraction of denoised test images classified as memorized at each noise level.
  • Figure 3: Gaussian shell coverage and max posterior weight on CIFAR-10. We plot empirical estimates of the Gaussian shell coverage $C_\sigma(p,\mathcal{D})$ and the max posterior weight $W_\sigma(\mathcal{D})$ as functions of the noise level $\sigma$, using a $1\mathrm{k}$ CIFAR-10 training subset and $1\mathrm{k}$ held-out test images.
  • Figure 4: Comparison of anti-memorization methods. Training curves showing FID score (solid lines) and memorization ratio (dashed lines) for three methods: 2k-Baseline, 2k-Dummy, and 2k-Gap. Gap training achieves the lowest memorization ratio (0.7%) and the best FID (2.35) at the final checkpoint.
  • Figure 5: Danger zone and gap configurations. The intersection of coverage and max weight curves identifies the danger zone. Memorization rates for different gap configurations are overlaid, showing that gaps targeting this region minimize memorization.
  • ...and 18 more figures

Theorems & Definitions (44)

  • Definition 4.1
  • Theorem 4.2
  • Remark 4.3
  • Remark 4.4
  • Lemma 4.5: Gaussian concentration (Laurent and Massart, 2000)
  • Definition 4.6: Gaussian Shell Coverage
  • Definition 4.7
  • Theorem 4.8
  • Theorem 4.9: Informal theorem
  • Theorem 4.10: Informal theorem
  • ...and 34 more
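
The "Gaussian shell" picture behind Lemma 4.5 and Definition 4.6 can be checked numerically. The statement of Lemma 4.5 is not reproduced in this listing. The sketch below assumes it is the standard chi-square tail bound of Laurent and Massart (2000): for $Z \sim \mathcal{N}(0, I_d)$ and $x > 0$, $\|Z\|^2$ lies in $[d - 2\sqrt{dx},\; d + 2\sqrt{dx} + 2x]$ with probability at least $1 - 2e^{-x}$. The helper name `chi2_shell_bounds` and the choice $d = 3072$ (the dimension of a $32 \times 32 \times 3$ CIFAR-10 image) are illustrative assumptions.

```python
import numpy as np

def chi2_shell_bounds(d, x):
    """Laurent-Massart interval for ||Z||^2 with Z ~ N(0, I_d).
    Each tail fails with probability at most exp(-x)."""
    lower = d - 2.0 * np.sqrt(d * x)
    upper = d + 2.0 * np.sqrt(d * x) + 2.0 * x
    return lower, upper

rng = np.random.default_rng(0)
d, x, n_trials = 3072, 3.0, 1000  # d assumed: flattened 32x32x3 image

# Squared norms of standard Gaussian noise vectors.
sq_norms = np.sum(rng.standard_normal((n_trials, d)) ** 2, axis=1)

lower, upper = chi2_shell_bounds(d, x)
inside = np.mean((sq_norms >= lower) & (sq_norms <= upper))
bound = 1.0 - 2.0 * np.exp(-x)  # guaranteed coverage of the interval

# Radii relative to sqrt(d): values near 1 mean the noise sits on a thin shell.
rel_radius = np.sqrt(sq_norms / d)
print(f"empirical coverage {inside:.3f} vs guaranteed {bound:.3f}")
print(f"relative radius range [{rel_radius.min():.3f}, {rel_radius.max():.3f}]")
```

After scaling by $\sigma$, the same bound says that a noised sample $x + \sigma Z$ lies close to the sphere of radius $\sigma\sqrt{d}$ around $x$, which is the sense in which the noise forms a shell rather than a ball in high dimension.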