Distill, Forget, Repeat: A Framework for Continual Unlearning in Text-to-Image Diffusion Models
Naveen George, Naoki Murata, Yuhta Takida, Konda Reddy Mopuri, Yuki Mitsufuji
TL;DR
This work addresses continual unlearning in text-to-image diffusion models under sequential deletion requests, where naive one-shot methods fail due to retention collapse, ripple effects on related concepts, and parameter drift. It introduces a distillation-based continual unlearning framework with three components: contextual trajectory re-steering to surgically unlearn concepts, generative replay with knowledge distillation to preserve retained knowledge, and parameter regularization to curb drift. The approach reframes each unlearning step as a multi-objective teacher-student distillation and is validated on a 10-step benchmark using open-vocabulary, VLM-based evaluation, with ablations showing that the components work in synergy. The results show improved forget fidelity, stable retention, and preserved image quality across steps, enabling practical deployment for ongoing data removal and for safety and copyright compliance.
Abstract
The recent rapid growth of visual generative models trained on vast web-scale datasets has created significant tension with copyright laws and with data privacy regulations such as GDPR's "Right to be Forgotten." This necessitates machine unlearning (MU) to remove specific concepts without the prohibitive cost of retraining. However, existing MU techniques are fundamentally ill-equipped for real-world scenarios where deletion requests arrive sequentially, a setting known as continual unlearning (CUL). Naively applying one-shot methods in a continual setting triggers a stability crisis: a cascade of degradation characterized by retention collapse, compounding collateral damage to related concepts, and a sharp decline in generative quality. To address this challenge, we introduce a novel generative-distillation-based continual unlearning framework that ensures targeted and stable unlearning under sequences of deletion requests. By reframing each unlearning step as a multi-objective teacher-student distillation process, the framework leverages principles from continual learning to maintain model integrity. Experiments on a 10-step sequential benchmark demonstrate that our method unlearns forget concepts with better fidelity, without significant interference with performance on retain concepts or with overall image quality, and substantially outperforms baselines. This framework provides a viable pathway for the responsible deployment and maintenance of large-scale generative models, enabling industries to comply with ongoing data removal requests in a practical and effective manner.
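To make the multi-objective, teacher-student framing concrete, the following is a minimal sketch of how one unlearning step's objective might combine a forget term, a retention-distillation term, and a parameter-drift penalty. The abstract names only the three ingredients; the specific loss forms (mean squared error, squared L2 drift), the weights `lambda_retain` and `lambda_reg`, and all function and argument names here are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def continual_unlearning_loss(
    student_forget_pred: Vector,   # student output on a forget-concept prompt
    redirect_target: Vector,       # teacher output the student is steered toward (assumed form)
    student_retain_pred: Vector,   # student output on a replayed retain prompt
    teacher_retain_pred: Vector,   # frozen teacher output on the same retain prompt
    student_params: Vector,        # current (student) parameters
    teacher_params: Vector,        # parameters from the previous unlearning step
    lambda_retain: float = 1.0,    # hypothetical weight on the retention term
    lambda_reg: float = 0.1,       # hypothetical weight on the drift penalty
) -> float:
    """Weighted sum of the three objectives for a single unlearning step."""
    # Forget term: pull the student's prediction toward the redirect target.
    forget = mse(student_forget_pred, redirect_target)
    # Retention term: distill the teacher's behavior on replayed retain prompts.
    retain = mse(student_retain_pred, teacher_retain_pred)
    # Drift term: squared L2 distance from the previous step's parameters.
    drift = sum((s - t) ** 2 for s, t in zip(student_params, teacher_params))
    return forget + lambda_retain * retain + lambda_reg * drift
```

In a sequential setting, the student obtained after one deletion request would serve as the frozen teacher for the next, so the retention and drift terms are always measured against the most recent model rather than the original one.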
