Distill, Forget, Repeat: A Framework for Continual Unlearning in Text-to-Image Diffusion Models

Naveen George, Naoki Murata, Yuhta Takida, Konda Reddy Mopuri, Yuki Mitsufuji

TL;DR

This work addresses continual unlearning in text-to-image diffusion models under sequential deletion requests, where naive one-shot methods fail due to retention collapse, ripple effects, and parameter drift. It introduces a distillation-based continual unlearning framework with three components: contextual trajectory re-steering to surgically unlearn concepts, generative replay with knowledge distillation to preserve retained knowledge, and parameter regularization to curb drift. The approach reframes each unlearning step as a multi-objective teacher-student distillation, validated on a 10-step benchmark with open-vocabulary, VLM-based evaluation, and ablations demonstrating the synergy of the components. The results show improved forget fidelity, stable retention, and preserved image quality across steps, enabling practical deployment for ongoing data removal and safety/copyright compliance.

Abstract

The recent rapid growth of visual generative models trained on vast web-scale datasets has created significant tension with data privacy regulations and copyright laws, such as GDPR's "Right to be Forgotten." This necessitates machine unlearning (MU) to remove specific concepts without the prohibitive cost of retraining. However, existing MU techniques are fundamentally ill-equipped for real-world scenarios where deletion requests arrive sequentially, a setting known as continual unlearning (CUL). Naively applying one-shot methods in a continual setting triggers a stability crisis, leading to a cascade of degradation characterized by retention collapse, compounding collateral damage to related concepts, and a sharp decline in generative quality. To address this critical challenge, we introduce a novel generative-distillation-based continual unlearning framework that ensures targeted and stable unlearning under sequences of deletion requests. By reframing each unlearning step as a multi-objective, teacher-student distillation process, the framework leverages principles from continual learning to maintain model integrity. Experiments on a 10-step sequential benchmark demonstrate that our method unlearns forget concepts with better fidelity, without significant interference with performance on retain concepts or overall image quality, substantially outperforming baselines. This framework provides a viable pathway for the responsible deployment and maintenance of large-scale generative models, enabling industries to comply with ongoing data removal requests in a practical and effective manner.

Paper Structure

This paper contains 40 sections, 5 equations, 13 figures, 5 tables, 2 algorithms.

Figures (13)

  • Figure 1: Qualitative comparison of our method (bottom row) against SOTA baselines on 10 sequential unlearning steps. This figure highlights the critical failure modes of existing methods in a continual setting. Baselines like ESD-x [gandikota2023erasing] and MACE [lu2024mace] suffer from catastrophic "retention collapse", where generative quality completely breaks down on retained concepts (right panel). Other methods like DUGE [thakral2025continual] avoid total collapse but still exhibit severe quality degradation and poor unlearning in later stages. In contrast, our method demonstrates a superior ability to both effectively unlearn target concepts (left panel) and preserve general knowledge (right panel), maintaining good generative quality relative to the original SD model.
  • Figure 2: Visual overview of our full pipeline for both unlearning (left) and retaining (right). In the unlearning stage, an LLM generates two types of text conditions: forget prompts ($c_{\text{f}}$) and their mapping prompts ($c_{\text{m}}$). These are encoded using the text encoder to obtain text embeddings, and fed into a frozen teacher diffusion model ($\epsilon_{\hat{\theta}_{i-1}}$) to synthesize a clean latent $z_{0}^{\text{u}}$, which is perturbed by the noise scheduler to produce $z_{t}^{\text{u}}$. The frozen teacher model is conditioned on $z_t^{\text{u}}$, the timestep $t$, and the mapping prompt embedding, while the trainable student model ($\epsilon_{\theta_{i}}$) is conditioned on $z_t^{\text{u}}$, $t$, and the forget prompt embedding. The unlearning objective ($\mathcal{L}_{\text{unlearn}}$) is achieved by minimizing the MSE loss between the noise predictions of the teacher and the student. The retaining stage follows a similar distillation process, where the frozen teacher generates images conditioned on retain prompts ($c_{\text{r}}$) to produce latents $z_{s}^{\text{r}}$, and both the teacher and student models are conditioned on the same retain prompt at timestep $s$, with the retention objective ($\mathcal{L}_{\text{retain}}$) minimizing their prediction difference.
  • Figure 3: A visual summary of our method's performance (both Fixed Context and Adaptive Context mapping) against SOTA baselines across 10 sequential unlearning steps. Here, 'SD Baseline' denotes the performance of the base model, SD v1.5, included for easier comparison; for it, we assume UA to be 100 and the remaining metrics to be the overall average CS and Accuracy. The plot starkly illustrates the "retention collapse" of methods like ESD-x, which shrink to the center, and the unlearning failure of methods like UCE and MACE. Our methods are the only ones that maintain a large, stable shape, indicating a successful balance of all criteria. Metrics shown are: Unlearning Accuracy (UA), Unlearning CLIP Score (UCS), Related Retention Accuracy (RRA), Related Retention CLIP Score (RRCS), General Retention Accuracy (GRA), and General Retention CLIP Score (GRCS). A detailed breakdown of these metrics is provided in the supplementary material, Table \ref{tab:cul_simple}.
  • Figure 4: Visualization comparing the effects of unlearning across multiple SOTA methods on both the unlearned and retain sets. Rows show different prompts, columns show different methods: Base (Stable Diffusion v1.5), ESD-x, ESD-u, UCE, MACE, DUGE, Fixed Contextual, and Adaptive Contextual. Three sections display: unlearned concept images, related concept images, and general retain set images, demonstrating how each method balances effective unlearning with preservation of model capabilities.
  • Figure 5: Visualization comparing the effects of unlearning across multiple SOTA methods on both the unlearned and retain sets. Rows show different prompts, columns show different methods: Base (Stable Diffusion v1.5), ESD-x, ESD-u, UCE, MACE, DUGE, Fixed Contextual, and Adaptive Contextual. Three sections display: unlearned concept images, related concept images, and general retain set images, demonstrating how each method balances effective unlearning with preservation of model capabilities.
  • ...and 8 more figures
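
The two distillation objectives described in the Figure 2 caption, together with the parameter regularization mentioned in the TL;DR, can be sketched as follows. This is a minimal illustration rather than the paper's implementation: `toy_eps` is a hypothetical stand-in for the noise predictor $\epsilon_{\theta}(z_t, t, c)$, text embeddings are reduced to short vectors, and both the squared-L2 form of the regularizer and the weights `lam_retain` and `lam_reg` are assumptions, since this summary does not state them.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def toy_eps(weights: Vector, z_t: Vector, t: int, cond: Vector) -> List[float]:
    """Hypothetical noise predictor eps_theta(z_t, t, c); not a real diffusion model."""
    return [w * z + 0.01 * t + c for w, z, c in zip(weights, z_t, cond)]


def continual_unlearning_losses(
    student_w: Vector, teacher_w: Vector,
    z_t_u: Vector, t: int, c_f: Vector, c_m: Vector,
    z_s_r: Vector, s: int, c_r: Vector,
) -> Tuple[float, float, float]:
    # Unlearning: the student sees the forget prompt c_f, the frozen teacher
    # sees the mapping prompt c_m, and both receive the same noisy latent.
    l_unlearn = mse(toy_eps(student_w, z_t_u, t, c_f),
                    toy_eps(teacher_w, z_t_u, t, c_m))
    # Retention: teacher and student share the same retain prompt c_r.
    l_retain = mse(toy_eps(student_w, z_s_r, s, c_r),
                   toy_eps(teacher_w, z_s_r, s, c_r))
    # Parameter regularization (assumed squared L2 distance to the teacher).
    l_reg = sum((a - b) ** 2 for a, b in zip(student_w, teacher_w))
    return l_unlearn, l_retain, l_reg


def total_loss(losses: Tuple[float, float, float],
               lam_retain: float = 1.0, lam_reg: float = 0.1) -> float:
    """Weighted sum of the three terms; the weights are illustrative assumptions."""
    l_unlearn, l_retain, l_reg = losses
    return l_unlearn + lam_retain * l_retain + lam_reg * l_reg


# At the start of step i the student is a copy of the teacher from step i-1.
teacher_w = [0.5, -0.2]
student_w = list(teacher_w)
losses = continual_unlearning_losses(
    student_w, teacher_w,
    z_t_u=[0.3, 0.7], t=10, c_f=[1.0, 1.0], c_m=[0.0, 0.0],
    z_s_r=[0.1, -0.4], s=25, c_r=[0.5, 0.5],
)
print(losses, total_loss(losses))
```

With the student initialized from the teacher, the retention and regularization terms start at zero and only the unlearning term is nonzero, which is why the update initially pushes the student's forget-prompt prediction toward the teacher's mapping-prompt prediction while the other two terms penalize any departure from the teacher elsewhere.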