
Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models

Tianqi Chen, Shujian Zhang, Mingyuan Zhou

TL;DR

This work tackles the safety and privacy concerns of diffusion models by enabling targeted unlearning without access to real data. It introduces Score Forgetting Distillation (SFD), a data-free MU method that uses cross-class score distillation to override undesired concepts with safe ones while preserving remaining generation capabilities, and yields a one-step generator for fast sampling. The approach combines a distillation objective with a forgetting regularization and employs an alternating update between a generator and a learned score to achieve rapid forgetting while maintaining quality. Experiments on CIFAR-10, STL-10, and Stable Diffusion demonstrate effective forgetting (high Unlearning Accuracy) and strong preservation of sample quality and speed, highlighting practical benefits for trustworthy diffusion-based GenAI.

Abstract

The machine learning community is increasingly recognizing the importance of fostering trust and safety in modern generative AI (GenAI) models. We posit machine unlearning (MU) as a crucial foundation for developing safe, secure, and trustworthy GenAI models. Traditional MU methods often rely on stringent assumptions and require access to real data. This paper introduces Score Forgetting Distillation (SFD), an innovative MU approach that promotes the forgetting of undesirable information in diffusion models by aligning the conditional scores of "unsafe" classes or concepts with those of "safe" ones. To eliminate the need for real data, our SFD framework incorporates a score-based MU loss into the score distillation objective of a pretrained diffusion model. This serves as a regularization term that preserves desired generation capabilities while enabling the production of synthetic data through a one-step generator. Our experiments on pretrained label-conditional and text-to-image diffusion models demonstrate that our method effectively accelerates the forgetting of target classes or concepts during generation, while preserving the quality of other classes or concepts. This unlearned and distilled diffusion not only pioneers a novel concept in MU but also accelerates the generation speed of diffusion models. Our experiments and studies on a range of diffusion models and datasets confirm that our approach is generalizable, effective, and advantageous for MU in diffusion models. Code is available at https://github.com/tqch/score-forgetting-distillation. (Warning: This paper contains sexually explicit imagery, discussions of pornography, racially-charged terminology, and other content that some readers may find disturbing, distressing, and/or offensive.)
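
The core idea in the abstract, aligning the conditional score of an "unsafe" class with that of a "safe" one while leaving the other classes alone, can be illustrated with a toy sketch. This is not the authors' implementation: it assumes one-dimensional Gaussian classes whose scores are known in closed form, and the names `gaussian_score`, `unlearn_by_score_alignment`, `teacher_means`, and `lam` are hypothetical. It also omits the one-step generator, the learned score network, and the alternating updates of the actual method; the only data-free aspect it keeps is that evaluation points are sampled from the student itself rather than from real data.

```python
import random


def gaussian_score(x, mean, sigma):
    """Score (gradient of the log-density) of N(mean, sigma^2) at x."""
    return -(x - mean) / sigma ** 2


def unlearn_by_score_alignment(teacher_means, forget_class, override_class,
                               sigma=1.0, lam=1.0, lr=0.1, steps=200,
                               batch=64, seed=0):
    """Toy score alignment (assumption: 1-D Gaussian classes, not real SFD).

    Remaining classes are matched to their own frozen teacher score;
    the forget class is matched to the teacher score of the override class.
    """
    rng = random.Random(seed)
    student_means = [float(m) for m in teacher_means]
    for _ in range(steps):
        for c in range(len(student_means)):
            target = override_class if c == forget_class else c
            weight = lam if c == forget_class else 1.0
            # Synthetic evaluation points drawn from the student, not real data.
            xs = [rng.gauss(student_means[c], sigma) for _ in range(batch)]
            gaps = [gaussian_score(x, student_means[c], sigma)
                    - gaussian_score(x, teacher_means[target], sigma)
                    for x in xs]
            mean_gap = sum(gaps) / batch
            # Gradient of weight * mean(gap^2) w.r.t. the student mean,
            # treating the sampled points as constants.
            grad = weight * 2.0 * mean_gap / sigma ** 2
            student_means[c] -= lr * grad
    return student_means


teacher = [-4.0, 0.0, 4.0]
student = unlearn_by_score_alignment(teacher, forget_class=2, override_class=0)
print(student)
```

In this toy setting the means of the remaining classes never move, because their score gap is zero from the start, while the mean of the forget class is pulled onto the override class. The hypothetical weight `lam` plays the role of the trade-off between the forgetting term and the preservation term.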

Paper Structure

This paper contains 40 sections, 1 theorem, 14 equations, 9 figures, 10 tables, and 1 algorithm.

Key Result

Lemma 1

The Score Forgetting Distillation (SFD) loss (the paper's equation labeled eqn:SFD) can be equivalently expressed as

Figures (9)

  • Figure 1: Forgetting of two celebrities, i.e., "Brad Pitt" and "Angelina Jolie." Each column shows images generated from the same text prompt (shown at the top) and the same random seed (initial noise) by SFD checkpoints at 0, 5, 10, 25, 50, and 100 thousand images seen (#kimgs).
  • Figure 2: Overview of score forgetting distillation (SFD). Some notations are labeled along with the corresponding components. 'Snowflake' marks frozen (non-trainable) components, 'Fire' marks trainable components, and 'Combine' denotes a weighted arithmetic sum of the input losses using predefined weights.
  • Figure 3: Generated images on CIFAR-10 and STL-10 during the training of SFD. The upper panel shows $3 \times 3$ grids of generated samples at different time steps, with fixed random seeds and class labels arranged from 1 to 9 (left to right, top to bottom). The same sequence of random seeds is used across all grids to ensure consistency. The lower panel illustrates the forgetting process for two examples from CIFAR-10 and STL-10.
  • Figure 4: FID between generated images and original dataset of remaining classes. The solid blue line and dot denote the training FIDs and final FID evaluated at the last checkpoint of one-step SFD generator; the dotted green line marks the initial FID of the pre-trained model using 1,000 sampling steps. The solid orange line and dot mark the training UAs and final UA evaluated at the last checkpoint of SFD; the dotted orange line marks the initial UA of the pre-trained model.
  • Figure 5: Remaining FIDs on different model architectures. The solid blue and solid orange bars denote the remaining FID evaluated for pre-trained DDPM and EDM respectively. The transparent blue and transparent orange bars denote the remaining FID evaluated at the last training step for unlearned and distilled diffusion using DDPM and EDM respectively.
  • ...and 4 more figures

Theorems & Definitions (1)

  • Lemma 1