
Does Generation Require Memorization? Creative Diffusion Models using Ambient Diffusion

Kulin Shah, Alkis Kalavasis, Adam R. Klivans, Giannis Daras

TL;DR

This work provides theoretical evidence that memorization in diffusion models is only necessary for denoising problems at low noise scales (usually used in generating high-frequency details) and proposes a simple, principled method to train diffusion models using noisy data at large noise scales.

Abstract

There is strong empirical evidence that the state-of-the-art diffusion modeling paradigm leads to models that memorize the training set, especially when the training set is small. Prior methods to mitigate the memorization problem often lead to a decrease in image quality. Is it possible to obtain strong and creative generative models, i.e., models that achieve high generation quality and low memorization? Despite the current pessimistic landscape of results, we make significant progress in pushing the trade-off between fidelity and memorization. We first provide theoretical evidence that memorization in diffusion models is only necessary for denoising problems at low noise scales (usually used in generating high-frequency details). Using this theoretical insight, we propose a simple, principled method to train diffusion models using noisy data at large noise scales. We show that our method significantly reduces memorization without decreasing image quality, for both text-conditional and unconditional models and for a variety of data availability settings.
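The abstract describes the method only at a high level. The NumPy sketch below shows one way to realize "clean data at low noise, noisy data at high noise", under stated assumptions: a variance-exploding forward process x_t = x_0 + σ_t z, a threshold `sigma_n` standing in for $\sigma_{t_{\mathrm{n}}}$, a training set noised once and then kept fixed, and the Ambient Diffusion-style regression onto that noisy copy. All function names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def make_noisy_set(clean, sigma_n, rng):
    """Noise every training image once at level sigma_n.

    The result is kept fixed for all of training (a stand-in for S_{t_n})."""
    return clean + sigma_n * rng.standard_normal(clean.shape)


def ambient_loss(denoiser, x_tn, sigma_t, sigma_n, rng):
    """High-noise regime (sigma_t > sigma_n): only the noisy copies x_tn are used."""
    if sigma_t <= sigma_n:
        raise ValueError("ambient loss needs sigma_t > sigma_n")
    # Add just enough extra noise to move from level sigma_n to level sigma_t.
    x_t = x_tn + np.sqrt(sigma_t**2 - sigma_n**2) * rng.standard_normal(x_tn.shape)
    x0_hat = denoiser(x_t, sigma_t)  # network's estimate of E[x_0 | x_t]
    r = sigma_n**2 / sigma_t**2
    # Implied estimate of E[x_tn | x_t], regressed onto the observed noisy x_tn.
    x_tn_hat = r * x_t + (1.0 - r) * x0_hat
    return float(np.mean((x_tn_hat - x_tn) ** 2))


def clean_loss(denoiser, x0, sigma_t, rng):
    """Low-noise regime (sigma_t <= sigma_n): standard denoising loss on clean data."""
    x_t = x0 + sigma_t * rng.standard_normal(x0.shape)
    return float(np.mean((denoiser(x_t, sigma_t) - x0) ** 2))


def training_loss(denoiser, clean, noisy, sigma_t, sigma_n, rng):
    """Pick the objective according to which side of the threshold sigma_t falls on."""
    if sigma_t > sigma_n:
        return ambient_loss(denoiser, noisy, sigma_t, sigma_n, rng)
    return clean_loss(denoiser, clean, sigma_t, rng)
```

With `sigma_n = 0` the noisy set equals the clean set and `r = 0`, so the high-noise branch reduces to the ordinary denoising loss at every level, which matches the Figure 1 caption's remark that the standard DDPM objective corresponds to $\sigma_{t_{\mathrm{n}}} = 0$.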

Paper Structure

This paper contains 42 sections, 14 theorems, 54 equations, 6 figures, 3 tables.

Key Result

Lemma 4.1

Let $S_{t_{\mathrm{n}}}$ be the noisy training set defined in line 1 of the training algorithm (alg:training_algorithm). For a fixed $S_{t_{\mathrm{n}}}$, let $\widehat{p}_{t_{\mathrm{n}}}$ be the distribution at time $t=t_{\mathrm{n}}$ that arises from using the score learned by that algorithm in the reverse process of Eq. (eq:determin…
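The statement above is cut off before the training objective appears. As background only, assume a variance-exploding forward process in which each image is noised once, $x_{t_{\mathrm{n}}} = x_0 + \sigma_{t_{\mathrm{n}}}\eta$, and then further, $x_t = x_{t_{\mathrm{n}}} + \sqrt{\sigma_t^2 - \sigma_{t_{\mathrm{n}}}^2}\,\varepsilon$ for $t > t_{\mathrm{n}}$, with $\eta, \varepsilon \sim \mathcal{N}(0, I)$ independent. Under that assumption, the following identity (the usual Ambient Diffusion device, not a quotation from the paper) indicates why a clean-data denoiser for $t > t_{\mathrm{n}}$ can be trained from $S_{t_{\mathrm{n}}}$ alone:

```latex
\begin{align*}
\mathbb{E}[x_{t_{\mathrm{n}}} \mid x_0, x_t]
  &= x_0 + \frac{\sigma_{t_{\mathrm{n}}}^2}{\sigma_t^2}\,(x_t - x_0), \\
\mathbb{E}[x_{t_{\mathrm{n}}} \mid x_t]
  &= \frac{\sigma_{t_{\mathrm{n}}}^2}{\sigma_t^2}\, x_t
   + \Bigl(1 - \frac{\sigma_{t_{\mathrm{n}}}^2}{\sigma_t^2}\Bigr)\,
     \mathbb{E}[x_0 \mid x_t], \qquad t > t_{\mathrm{n}}, \\
\operatorname*{arg\,min}_{h}\; \mathbb{E}\,\Bigl\|
     \frac{\sigma_{t_{\mathrm{n}}}^2}{\sigma_t^2}\, x_t
     + \Bigl(1 - \frac{\sigma_{t_{\mathrm{n}}}^2}{\sigma_t^2}\Bigr) h(x_t, t)
     - x_{t_{\mathrm{n}}} \Bigr\|^2
  &= \mathbb{E}[x_0 \mid x_t].
\end{align*}
```

The first line is the mean of the Gaussian bridge between $x_0$ and $x_t$; averaging it over $x_0 \mid x_t$ gives the second; and because $1 - \sigma_{t_{\mathrm{n}}}^2/\sigma_t^2 > 0$, the regression on the third line attains its minimum exactly when $h$ equals the clean posterior mean.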

Figures (6)

  • Figure 1: (FID, memorization) pairs for different values of $\sigma_{t_{\mathrm{n}}}$ used in our proposed training algorithm (presented in the method section) for training diffusion models from limited data. The standard DDPM objective corresponds to $\sigma_{t_{\mathrm{n}}} = 0$ and does not lie on the Pareto frontier. Setting $\sigma_{t_{\mathrm{n}}}$ too low or too high reverts to DDPM behavior. Values of $\sigma_{t_{\mathrm{n}}} \in [0.4, 4]$ strike different balances between memorization and the quality of generated images. The models in this figure are trained on only $300$ images from FFHQ.
  • Figure 2: Qualitative results for reducing the memorization of Stable Diffusion 2. Combining our method with the method of wen2024detecting helps generate novel samples for the prompts shown. See the method section for our method and the text-conditioned experiments section for more details on the experiment.
  • Figure 3: Comparison of denoised images under different noise levels and training conditions. When trained on small datasets, standard diffusion modeling produces overconfident predictions (row 3) even for heavily noised inputs. Our algorithm (row 4) behaves similarly (blurry outputs) to a model trained on significantly more data (row 1), indicating less memorization.
  • Figure 4: Images generated using a model trained with our method on 300 samples.
  • Figure 5: Images generated using a model trained with our method on 1000 samples.
  • ...and 1 more figure
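Figure 3's overconfident predictions have a simple closed-form reference point. Over a finite training set, the minimizer of the standard denoising (MSE) loss is the posterior mean under the empirical distribution, a softmax-weighted average of the training images. The NumPy sketch below computes that ideal denoiser, assuming variance-exploding noise $x_t = x_0 + \sigma z$; `empirical_denoiser` is a hypothetical helper, not code from the paper.

```python
import numpy as np


def empirical_denoiser(x_t, train, sigma):
    """Exact E[x_0 | x_t] when x_0 is uniform over the rows of `train`
    and x_t = x_0 + sigma * z with z ~ N(0, I).

    Returns the denoised estimate and the posterior weight on each training image."""
    d2 = np.sum((train - x_t) ** 2, axis=1)  # squared distance to each training image
    logits = -d2 / (2.0 * sigma**2)          # Gaussian log-likelihoods up to a constant
    logits -= logits.max()                   # numerically stable softmax
    w = np.exp(logits)
    w /= w.sum()
    return w @ train, w
```

The weights depend on squared distances scaled by $1/\sigma^2$. In high dimension, distances between distinct training images are large, so the posterior can stay concentrated on a single training image even when the noisy input looks like pure noise, and the ideal denoiser then returns that image verbatim. This is offered as one way to read row 3 of Figure 3, not as the paper's own argument.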

Theorems & Definitions (24)

  • Lemma 4.1: Ambient Diffusion solution at $t_{\mathrm{n}}$
  • Lemma 4.2: Information Leakage
  • Theorem 4.3: Informal; see the formal version (lemma:mainFeldman)
  • Lemma 4.4: Informal; see lemma:T1 and Lemma 2.6 in feldman2020does
  • Lemma 4.5: Informal; see sec:Noise
  • Definition 1: Random Frequencies (feldman2020does)
  • Theorem A.1
  • Remark 1: Gaussian Mixture Models
  • Lemma A.2: Lemma 2.6 in feldman2020does
  • Lemma A.3: Lemma 2.7 in feldman2020does
  • ...and 14 more
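Theorem 4.3 and Lemmas 4.4, A.2, and A.3 build on feldman2020does, whose long-tail argument hinges on how many distinct items appear exactly once in the training sample. The sketch below illustrates that setup, assuming Definition 1 follows Feldman's construction (each item's frequency is drawn i.i.d. from a prior and the frequencies are then normalized). The helper names and the example priors are hypothetical.

```python
import numpy as np


def random_frequency_distribution(num_items, prior_sampler, rng):
    """Draw an unnormalized frequency for each item i.i.d. from a prior,
    then normalize so the frequencies form a probability distribution."""
    p = np.asarray(prior_sampler(rng, num_items), dtype=float)
    return p / p.sum()


def count_singletons(dist, n, rng):
    """Sample a dataset of size n from `dist` and count items seen exactly once."""
    draws = rng.choice(len(dist), size=n, p=dist)
    counts = np.bincount(draws, minlength=len(dist))
    return int(np.sum(counts == 1))


rng = np.random.default_rng(0)
# Many items with frequencies from a flat prior: a long tail of rare items.
tail = random_frequency_distribution(10_000, lambda r, k: r.uniform(size=k), rng)
# A handful of equally common items: no long tail.
head = random_frequency_distribution(5, lambda r, k: np.ones(k), rng)
print(count_singletons(tail, 1000, rng), count_singletons(head, 1000, rng))
```

With many rare items, most of a 1,000-sample dataset consists of singletons, whereas a distribution concentrated on a few items yields essentially none. The long-tail regime is the one in which Feldman argues that fitting (memorizing) rare examples becomes necessary for near-optimal accuracy.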