Data Unlearning in Diffusion Models

Silas Alberti, Kenan Hasanaliyev, Manav Shah, Stefano Ermon

TL;DR

This work tackles data unlearning in diffusion models: removing the influence of specific training datapoints without retraining from scratch. It introduces Subtracted Importance Sampled Scores (SISS), a family of losses that combines naive deletion and NegGrad through a defensive mixture $q_{\lambda}$ and a superfactor $s$, with $\lambda$ typically set to 0.5 to balance unlearning strength against model quality. The authors prove that the gradient estimator of the SISS loss $\ell_{\lambda}(\theta)$ matches that of the naive deletion loss $L_{X\setminus A}(\theta)$ in expectation (Lemma 1), and they analyze gradient clipping as a way to keep training stable. Empirically, SISS is Pareto-optimal in the trade-off between model quality and unlearning strength on CelebA-HQ and MNIST T-Shirt, and on Stable Diffusion it mitigates memorization on nearly 90% of the tested prompts, outperforming prior baselines. The approach is relevant to privacy and copyright compliance because it enables efficient datapoint forgetting in large generative models, although whether such unlearning is legally sufficient remains an open question.

Abstract

Recent work has shown that diffusion models memorize and reproduce training data examples. At the same time, large copyright lawsuits and legislation such as GDPR have highlighted the need for erasing datapoints from diffusion models. However, retraining from scratch is often too expensive. This motivates the setting of data unlearning, i.e., the study of efficient techniques for unlearning specific datapoints from the training set. Existing concept unlearning techniques require an anchor prompt/class/distribution to guide unlearning, which is not available in the data unlearning setting. General-purpose machine unlearning techniques were found to be either unstable or failed to unlearn data. We therefore propose a family of new loss functions called Subtracted Importance Sampled Scores (SISS) that utilize importance sampling and are the first method to unlearn data with theoretical guarantees. SISS is constructed as a weighted combination between simpler objectives that are responsible for preserving model quality and unlearning the targeted datapoints. When evaluated on CelebA-HQ and MNIST, SISS achieved Pareto optimality along the quality and unlearning strength dimensions. On Stable Diffusion, SISS successfully mitigated memorization on nearly 90% of the prompts we tested.

Paper Structure

This paper contains 21 sections, 1 theorem, 23 equations, 6 figures, 2 tables.

Key Result

Lemma 1

In expectation, gradient estimators of a SISS loss function $\ell_{\lambda}(\theta)$ and the naive deletion loss $L_{X\setminus A}(\theta)$ are the same.
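
The idea behind Lemma 1 can be checked numerically on a toy problem. The sketch below is not the authors' implementation. It assumes a 1D Gaussian forward kernel $q(m \mid x) = \mathcal{N}(\gamma x, \sigma^2)$, a fixed stand-in noise predictor `tanh(m)`, and the elementary identity $L_{X\setminus A} = \frac{n}{n-k} L_X - \frac{k}{n-k} L_A$ for $A \subset X$ with $n = |X|$ and $k = |A|$. All function names and parameter values are hypothetical. It compares loss values rather than gradients; with a differentiable predictor the same argument carries over to gradients by linearity of expectation.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gauss_pdf(m, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at m."""
    return np.exp(-0.5 * ((m - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def denoise_err(m, x, gamma, sigma):
    """Squared error between the true noise and a fixed stand-in predictor tanh(m)."""
    eps = (m - gamma * x) / sigma
    return (eps - np.tanh(m)) ** 2


def naive_deletion_loss(X, A, gamma, sigma, deg=80):
    """Reference value: average loss over X \\ A, via Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(deg)            # weight function exp(-z^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)    # normalise to a standard normal
    kept = X[~np.isin(X, A)]
    m = gamma * kept[:, None] + sigma * nodes[None, :]
    return float(np.mean(denoise_err(m, kept[:, None], gamma, sigma) @ weights))


def siss_style_estimate(X, A, lam, gamma, sigma, num_samples, rng):
    """Monte Carlo estimate using a defensive mixture q_lam as the proposal."""
    n, k = len(X), len(A)
    x = rng.choice(X, size=num_samples)         # x ~ Uniform(X)
    a = rng.choice(A, size=num_samples)         # a ~ Uniform(A)
    # Draw m from q_lam = (1 - lam) * q(.|x) + lam * q(.|a).
    from_a = rng.random(num_samples) < lam
    m = gamma * np.where(from_a, a, x) + sigma * rng.standard_normal(num_samples)
    q_x = gauss_pdf(m, gamma * x, sigma)
    q_a = gauss_pdf(m, gamma * a, sigma)
    q_mix = (1.0 - lam) * q_x + lam * q_a
    # Importance weights turn the mixture sample into the two subtracted terms.
    keep_term = n / (n - k) * (q_x / q_mix) * denoise_err(m, x, gamma, sigma)
    forget_term = k / (n - k) * (q_a / q_mix) * denoise_err(m, a, gamma, sigma)
    return float(np.mean(keep_term - forget_term))


rng = np.random.default_rng(0)
X = np.linspace(-2.0, 2.0, 10)   # toy "training set"
A = X[:2]                        # toy "unlearning set", a subset of X
reference = naive_deletion_loss(X, A, gamma=0.8, sigma=0.6)
estimate = siss_style_estimate(X, A, lam=0.5, gamma=0.8, sigma=0.6,
                               num_samples=200_000, rng=rng)
print(f"naive deletion: {reference:.4f}  mixture estimate: {estimate:.4f}")
```

Because both importance ratios are bounded (by $1/(1-\lambda)$ and $1/\lambda$ respectively), the mixture proposal keeps the estimator's variance under control in this toy setting, which is the usual motivation for a defensive mixture.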

Figures (6)

  • Figure 1: Examples of quality degradation across unlearning methods. On all $3$ datasets, we find that our SISS method is the only method capable of unlearning specific training datapoints while maintaining the original model quality. See Tables \ref{tab:celeb_metrics}, \ref{tab:mnist_unlearning} and Figure \ref{fig:sd_quality} for complete quantitative results on quality preservation.
  • Figure 2: CelebA-HQ SSCD Metric Calculation. The process begins by taking the training face to be unlearned and injecting noise as part of the DDPM's forward noising process. Prior to unlearning, denoising the noise-injected face will result in a high similarity to the original training face. After unlearning, we desire for the denoised face to be significantly less similar to the training face.
  • Figure 3: Visualization of celebrity unlearning over fine-tuning steps on quality-preserving methods. The images shown are made by applying noise to the original face and denoising as explained in Figure \ref{fig:celeb_sscd_procedure}. Only SISS $(\lambda=0.5)$ and SISS (No IS) demonstrate the ability to guide the model away from generating the celebrity face.
  • Figure 4: On both datasets, the only Pareto improvements over the pretrained model are given by SISS ($\lambda=0.5$) and SISS (No IS). Remarkably, on MNIST T-Shirt, the two methods are Pareto improvements over the retrained model as well.
  • Figure 5: Visualization of memorization mitigation on Stable Diffusion v1.4 using SISS ($\lambda=0.5$). The number of memorized samples decreases from $6$ to $0$ on the partially-memorized prompt "Mothers influence on "her young hippo." Note the two apostrophes in red were purposefully inserted to turn the fully-memorized prompt into a partially-memorized prompt (see Section \ref{sec:sd} for details).
  • ...and 1 more figure
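
The noise-then-denoise procedure described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation code: `denoise_fn` and `embed_fn` are hypothetical callables standing in for the diffusion model's denoiser and the SSCD embedding network, cosine similarity is assumed as the comparison between embeddings, and the two toy denoisers simply return fixed images. Only the forward noising step, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, is the standard DDPM formula.

```python
import numpy as np


def forward_noise(x0, alpha_bar_t, rng):
    """Standard DDPM forward noising of a clean image x0 at cumulative alpha_bar_t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def cosine_similarity(u, v):
    """Cosine similarity between two arrays, flattened to vectors."""
    u, v = u.ravel(), v.ravel()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def noised_denoised_similarity(x0, denoise_fn, embed_fn, alpha_bar_t, rng):
    """Noise the training image, denoise it, and compare embeddings with the original."""
    x_t = forward_noise(x0, alpha_bar_t, rng)
    x_hat = denoise_fn(x_t, alpha_bar_t)   # hypothetical denoiser interface
    return cosine_similarity(embed_fn(x0), embed_fn(x_hat))


rng = np.random.default_rng(0)
train_face = rng.standard_normal((16, 16))   # stand-in for the face to unlearn
other_face = rng.standard_normal((16, 16))   # stand-in for an unrelated output


def memorizing_denoiser(x_t, alpha_bar_t):
    """Toy model before unlearning: always reproduces the training face."""
    return train_face


def unlearned_denoiser(x_t, alpha_bar_t):
    """Toy model after unlearning: produces an unrelated image instead."""
    return other_face


def identity_embed(img):
    """Placeholder for an SSCD-style embedding network."""
    return img


sim_before = noised_denoised_similarity(train_face, memorizing_denoiser,
                                        identity_embed, alpha_bar_t=0.5, rng=rng)
sim_after = noised_denoised_similarity(train_face, unlearned_denoiser,
                                       identity_embed, alpha_bar_t=0.5, rng=rng)
print(f"similarity before: {sim_before:.3f}  after: {sim_after:.3f}")
```

In this toy setup a high similarity before unlearning and a much lower one afterwards corresponds to the behavior the caption describes as desired.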

Theorems & Definitions (1)

  • Lemma 1