Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models
Tianqi Chen, Shujian Zhang, Mingyuan Zhou
TL;DR
This work addresses safety and privacy concerns in diffusion models by enabling targeted unlearning without access to real data. It introduces Score Forgetting Distillation (SFD), a data-free machine unlearning (MU) method that uses cross-class score distillation to override undesired concepts with safe ones while preserving the remaining generation capabilities, and that yields a one-step generator for fast sampling. The approach combines a distillation objective with a forgetting regularization and alternates updates between the generator and a learned score, achieving rapid forgetting while maintaining quality. Experiments on CIFAR-10, STL-10, and Stable Diffusion demonstrate effective forgetting (high Unlearning Accuracy) together with well-preserved sample quality and fast sampling, highlighting practical benefits for trustworthy diffusion-based GenAI.
Abstract
The machine learning community is increasingly recognizing the importance of fostering trust and safety in modern generative AI (GenAI) models. We posit machine unlearning (MU) as a crucial foundation for developing safe, secure, and trustworthy GenAI models. Traditional MU methods often rely on stringent assumptions and require access to real data. This paper introduces Score Forgetting Distillation (SFD), an innovative MU approach that promotes the forgetting of undesirable information in diffusion models by aligning the conditional scores of "unsafe" classes or concepts with those of "safe" ones. To eliminate the need for real data, our SFD framework incorporates a score-based MU loss into the score distillation objective of a pretrained diffusion model. This serves as a regularization term that preserves desired generation capabilities while enabling the production of synthetic data through a one-step generator. Our experiments on pretrained label-conditional and text-to-image diffusion models demonstrate that our method effectively accelerates the forgetting of target classes or concepts during generation, while preserving the quality of other classes or concepts. This unlearned and distilled diffusion not only pioneers a novel concept in MU but also accelerates the generation speed of diffusion models. Our experiments and studies on a range of diffusion models and datasets confirm that our approach is generalizable, effective, and advantageous for MU in diffusion models. Code is available at https://github.com/tqch/score-forgetting-distillation. (Warning: This paper contains sexually explicit imagery, discussions of pornography, racially-charged terminology, and other content that some readers may find disturbing, distressing, and/or offensive.)
