
Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data

Giannis Daras, Alexandros G. Dimakis, Constantinos Daskalakis

TL;DR

This work provides the first exact framework for training diffusion models to sample from an uncorrupted distribution using only noisy data. It combines Ambient Denoising Score Matching, based on a double application of Tweedie's formula, for noise levels $\sigma_t > \sigma_{t_n}$ with a consistency loss that extends the model to $\sigma_t \le \sigma_{t_n}$, enabling end-to-end sampling when only corrupted data is available. Empirically, the authors show that training on corrupted data with the consistency loss yields competitive performance, reduces memorization of the training data in fine-tuned SDXL, and enables effective fine-tuning on diverse datasets. They also provide evidence that diffusion models memorize more than previously thought, discuss the copyright and privacy implications, and offer their exact training framework as a practical path toward mitigating memorization.

Abstract

Ambient diffusion is a recently proposed framework for training diffusion models using corrupted data. Both Ambient Diffusion and alternative SURE-based approaches for learning diffusion models from corrupted data resort to approximations which deteriorate performance. We present the first framework for training diffusion models that provably sample from the uncorrupted distribution given only noisy training data, solving an open problem in this space. Our key technical contribution is a method that uses a double application of Tweedie's formula and a consistency loss function that allows us to extend sampling at noise levels below the observed data noise. We also provide further evidence that diffusion models memorize from their training sets by identifying extremely corrupted images that are almost perfectly reconstructed, raising copyright and privacy concerns. Our method for training using corrupted samples can be used to mitigate this problem. We demonstrate this by fine-tuning Stable Diffusion XL to generate samples from a distribution using only noisy samples. Our framework reduces the amount of memorization of the fine-tuning dataset, while maintaining competitive performance.

Paper Structure (27 sections, 9 theorems, 43 equations, 11 figures, 1 table)

This paper contains 27 sections, 9 theorems, 43 equations, 11 figures, 1 table.

Key Result

Theorem 3.1

Define ${\bm{X}}_t$ as in the beginning of the Background section. Suppose we are given samples ${\bm{X}}_{t_n} = {\bm{X}}_0 + \sigma_{t_n}{\bm{Z}}$, where ${\bm{X}}_0 \sim p_0$ and ${\bm{Z}} \sim \mathcal{N}(\bm{0}, I)$. Consider the following objective (not reproduced in this excerpt), where $\bm{\eta}$ is a standard Gaussian vector. Suppose that the family of functions $\{{\bm{h}}_\theta\}$ is rich enough to cont
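
The excerpt above omits the objective itself. The following is a hedged reconstruction of what a "double application of Tweedie's formula" implies for noise levels $\sigma_t > \sigma_{t_n}$; it is derived here from the setup in the theorem statement, not copied from the paper. The sampling distribution over $t$ and any loss weighting are assumptions, and the symbols $p_t$ (density of ${\bm{X}}_t$) and $J(\theta)$ are introduced only for this sketch.

```latex
% Setup (from the theorem statement): X_{t_n} = X_0 + \sigma_{t_n} Z.
% Assumed further noising: X_t = X_{t_n} + \sqrt{\sigma_t^2 - \sigma_{t_n}^2}\,\eta,
% so that X_t = X_0 + \sigma_t Z' in distribution, with \sigma_t > \sigma_{t_n}.
\begin{align*}
\mathbb{E}[X_0 \mid X_t = x_t]
  &= x_t + \sigma_t^2 \,\nabla \log p_t(x_t)
  && \text{(Tweedie, clean target)} \\
\mathbb{E}[X_{t_n} \mid X_t = x_t]
  &= x_t + (\sigma_t^2 - \sigma_{t_n}^2)\,\nabla \log p_t(x_t)
  && \text{(Tweedie, noisy target)} \\
\intertext{Eliminating the score $\nabla \log p_t(x_t)$ between the two lines:}
\mathbb{E}[X_{t_n} \mid X_t = x_t]
  &= \frac{\sigma_t^2 - \sigma_{t_n}^2}{\sigma_t^2}\,\mathbb{E}[X_0 \mid X_t = x_t]
   + \frac{\sigma_{t_n}^2}{\sigma_t^2}\, x_t . \\
\intertext{A regression objective whose minimizer is therefore
$h_\theta(x_t, t) = \mathbb{E}[X_0 \mid X_t = x_t]$:}
J(\theta)
  &= \mathbb{E}_{x_{t_n},\, t,\, \eta}
     \left\|
       \frac{\sigma_t^2 - \sigma_{t_n}^2}{\sigma_t^2}\, h_\theta(x_t, t)
       + \frac{\sigma_{t_n}^2}{\sigma_t^2}\, x_t - x_{t_n}
     \right\|^2 .
\end{align*}
```

The point of the rescaling is that the regression target $x_{t_n}$ is observable, while the quantity being learned is the denoiser for the unobserved clean sample $x_0$.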

Figures (11)

  • Figure 1: Top row: images from LAION (Schuhmann et al., 2022); middle row: masked images; bottom row: images reconstructed with the SDXL (Podell et al., 2023) inpainting model. The accuracy of the reconstructions is strong evidence that the images in the top row were in the training set of SDXL (or SDXL Inpainting) and have been memorized. To the best of our knowledge, SDXL does not disclose its training set.
  • Figure 2: SDXL (Podell et al., 2023) posterior samples (Row e) given extremely noisy encodings (Row c) of LAION images (Row a). The fidelity of the reconstructions to the original images, despite the severe corruption (Row c) and the blurriness of the MMSE solution (Row d), indicates that the images were potentially in the training set and have been memorized.
  • Figure 3: Distribution of the similarities between generated images and their nearest neighbors in the dataset for: i) the method of Somepalli et al. (2022), and ii) our noising method at two different noise levels. The fraction of images with similarity above 0.95 (near-identical to a training image) is much higher for our method than for the Somepalli et al. (2022) baseline.
  • Figure 4: FID results for SDXL models fine-tuned on FFHQ, with and without consistency, as the corruption level changes. The performance of models trained without consistency deteriorates significantly as the corruption increases. Models trained with consistency maintain performance comparable to the baseline model (trained on clean data) for noise levels up to $t_n=500$.
  • Figure 5: Visualization of the noise levels considered in the paper. The top row shows noisy latents, visualized as RGB images. The bottom row shows posterior samples obtained by the SDXL (Podell et al., 2023) model given these noisy latents.
  • ...and 6 more figures

Theorems & Definitions (15)

  • Theorem 3.1: Ambient Denoising Score Matching
  • Lemma 3.2: Generalized Tweedie's Formula
  • Definition 3.3: Consistent Denoiser (Daras et al., 2023)
  • Theorem 3.4: Main Theorem (informal)
  • Lemma 1.1: Generalized Tweedie's Formula
  • proof
  • Lemma 1.2: Connecting Conditional Expectations -- Variance Exploding
  • proof
  • Theorem 1.3: Ambient Denoising Score Matching (restatement of Theorem 3.1)
  • proof
  • ...and 5 more
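
The idea behind Theorem 3.1 (Ambient Denoising Score Matching) in the list above can be sanity-checked numerically in a toy case. The sketch below is not the authors' code: it assumes a one-dimensional Gaussian prior, a single noise level $\sigma_t > \sigma_{t_n}$, and a linear denoiser $h(x) = a\,x$, and it uses a rescaled regression loss that we derived from two applications of Tweedie's formula rather than copied from the paper. All names (`fit_linear_denoisers`, `prior_std`, and so on) are hypothetical.

```python
import numpy as np


def fit_linear_denoisers(x_tn, sigma_tn, sigma_t, rng):
    """Fit h(x) = a * x at noise level sigma_t using only noisy samples x_tn.

    Returns (a_ambient, a_naive):
      a_ambient minimizes mean((w_h * a * x_t + w_x * x_t - x_tn) ** 2),
                the rescaled (assumed) ambient objective;
      a_naive   minimizes mean((a * x_t - x_tn) ** 2), i.e. it simply
                predicts the noisy sample instead of the clean one.
    """
    # Add the extra noise needed to move from level sigma_tn up to sigma_t.
    extra_std = np.sqrt(sigma_t**2 - sigma_tn**2)
    x_t = x_tn + extra_std * rng.standard_normal(x_tn.shape)

    # Weights that relate E[x_tn | x_t] to E[x_0 | x_t].
    w_h = (sigma_t**2 - sigma_tn**2) / sigma_t**2
    w_x = sigma_tn**2 / sigma_t**2

    # Closed-form least squares for the single scalar parameter a.
    feature = w_h * x_t
    target = x_tn - w_x * x_t
    a_ambient = np.dot(feature, target) / np.dot(feature, feature)
    a_naive = np.dot(x_t, x_tn) / np.dot(x_t, x_t)
    return a_ambient, a_naive


rng = np.random.default_rng(0)
prior_std, sigma_tn, sigma_t = 2.0, 1.0, 1.5
n_samples = 200_000

# Clean samples exist only inside the simulation; the fit never sees x0.
x0 = prior_std * rng.standard_normal(n_samples)
x_tn = x0 + sigma_tn * rng.standard_normal(n_samples)

a_ambient, a_naive = fit_linear_denoisers(x_tn, sigma_tn, sigma_t, rng)

# For a Gaussian prior the true posterior mean is linear: E[x0 | x_t] = a_exact * x_t.
a_exact = prior_std**2 / (prior_std**2 + sigma_t**2)
print(f"ambient fit: {a_ambient:.4f}  exact: {a_exact:.4f}  naive fit: {a_naive:.4f}")
```

With these settings the exact coefficient is 0.64, while regressing directly onto the noisy samples converges to $(\text{prior\_std}^2 + \sigma_{t_n}^2)/(\text{prior\_std}^2 + \sigma_t^2) = 0.8$. The gap between the two is what the rescaling corrects. This toy check covers only $\sigma_t > \sigma_{t_n}$; it says nothing about the consistency loss used below the data noise level.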