Continual Unlearning for Text-to-Image Diffusion Models: A Regularization Perspective

Justin Lee, Zheda Mai, Jinsu Yoo, Chongyu Fan, Cheng Zhang, Wei-Lun Chao

TL;DR

The paper addresses the challenge of continual unlearning (CU) in text-to-image diffusion models, showing that sequential unlearning causes rapid degradation of retained concepts due to parameter drift. It introduces plug-in regularizers (update-norm penalties, selective fine-tuning, and model merging) and a gradient-projection method that enforces updates orthogonal to semantically similar directions, underpinned by a Taylor-based bound that relates retention loss to update magnitude. Key findings demonstrate that these methods substantially improve unlearning accuracy (UA) and retention (RA-I, RA-C) in continual settings, with gradient projection offering strong in-domain gains and compatibility with other add-ons. The work provides practical baselines and open directions for safe, accountable generative AI when unlearning requests arrive sequentially, and highlights cross-attention as a source of interference that semantic-aware updates can mitigate.
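The three generic add-on regularizers named above (update-norm penalties, selective fine-tuning, and model merging) can be sketched as follows. This is a minimal illustration, not the paper's implementation. Parameters are treated as flat lists of floats. The function names (`update_norm_penalty`, `penalty_grad`, `masked_step`, `merge`) are hypothetical, and so are the specific forms chosen here: a squared-L2 penalty, a binary parameter mask, and linear-interpolation merging.

```python
# Hypothetical sketch of generic add-on regularizers; parameters are flat
# lists of floats. The paper's exact losses, masks, and merge rules may differ.

def update_norm_penalty(theta, theta_pre, lam):
    """Assumed form: lam * ||theta - theta_pre||_2^2, penalizing drift from the pre-trained weights."""
    return lam * sum((t - p) ** 2 for t, p in zip(theta, theta_pre))

def penalty_grad(theta, theta_pre, lam):
    """Gradient of the penalty above: 2 * lam * (theta - theta_pre)."""
    return [2.0 * lam * (t - p) for t, p in zip(theta, theta_pre)]

def masked_step(theta, grad, mask, lr):
    """Selective fine-tuning: one gradient step that only updates entries where mask == 1."""
    return [t - lr * g * m for t, g, m in zip(theta, grad, mask)]

def merge(theta_pre, theta_unlearned, alpha):
    """Model merging by linear interpolation; alpha = 1 keeps the unlearned weights unchanged."""
    return [p + alpha * (u - p) for p, u in zip(theta_pre, theta_unlearned)]
```

Under these assumptions, `penalty_grad` would be added to the unlearning gradient before each `masked_step`. A smaller `alpha` in `merge` pulls the weights back toward the pre-trained model, trading unlearning strength for retention.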

Abstract

Machine unlearning--the ability to remove designated concepts from a pre-trained model--has advanced rapidly, particularly for text-to-image diffusion models. However, existing methods typically assume that unlearning requests arrive all at once, whereas in practice they often arrive sequentially. We present the first systematic study of continual unlearning in text-to-image diffusion models and show that popular unlearning methods suffer from rapid utility collapse: after only a few requests, models forget retained knowledge and generate degraded images. We trace this failure to cumulative parameter drift from the pre-training weights and argue that regularization is crucial to addressing it. To this end, we study a suite of add-on regularizers that (1) mitigate drift and (2) remain compatible with existing unlearning methods. Beyond generic regularizers, we show that semantic awareness is essential for preserving concepts close to the unlearning target, and propose a gradient-projection method that constrains parameter drift to be orthogonal to the subspace spanned by these related concepts. This substantially improves continual unlearning performance and is complementary to other regularizers for further gains. Taken together, our study establishes continual unlearning as a fundamental challenge in text-to-image generation and provides insights, baselines, and open directions for advancing safe and accountable generative AI.
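The link between cumulative drift and retention can be made concrete with a generic second-order Taylor expansion. This is a standard bound written under our own assumptions, not the paper's exact statement. The notation is ours: $\theta_0$ denotes the pre-trained weights, $\theta_T$ the weights after $T$ requests, and $\delta_t$ the update for request $t$. We also assume the retention loss $\mathcal{L}_r$ is twice continuously differentiable, with Hessian spectral norm at most $\beta$ on the segment $[\theta_0, \theta_T]$.

```latex
\begin{aligned}
\Delta_T &= \theta_T - \theta_0 = \sum_{t=1}^{T} \delta_t,
  \qquad \|\Delta_T\|_2 \le \sum_{t=1}^{T} \|\delta_t\|_2, \\
\mathcal{L}_r(\theta_T) &= \mathcal{L}_r(\theta_0)
  + \nabla\mathcal{L}_r(\theta_0)^\top \Delta_T
  + \tfrac{1}{2}\,\Delta_T^\top \nabla^2\mathcal{L}_r(\xi)\,\Delta_T
  \quad \text{for some } \xi \in [\theta_0, \theta_T], \\
\bigl|\mathcal{L}_r(\theta_T) - \mathcal{L}_r(\theta_0)\bigr|
  &\le \|\nabla\mathcal{L}_r(\theta_0)\|_2\,\|\Delta_T\|_2
  + \tfrac{\beta}{2}\,\|\Delta_T\|_2^2 .
\end{aligned}
```

Both terms of the bound shrink with $\|\Delta_T\|_2$, and update-norm penalties and merging act on exactly this quantity. The first-order term also vanishes when $\Delta_T$ is orthogonal to $\nabla\mathcal{L}_r(\theta_0)$, which is the intuition behind projection-based updates.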

Paper Structure

This paper contains 33 sections, 1 theorem, 17 equations, 12 figures, 1 table.

Key Result

Lemma 7.1

For any $c\in\mathrm{span}(C)$, the update direction $g'$ produces zero first-order change: ${g'}^\top c \,=\, 0.$
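Below is a minimal sketch of the projection behind Lemma 7.1. It assumes that $g'$ is obtained by subtracting from the unlearning gradient $g$ its orthogonal projection onto $\mathrm{span}(C)$, i.e. $g' = g - P_C\, g$. This summary does not say how the paper builds $C$ from semantically similar concepts or which layers it applies to. Here `C` is simply a list of vectors in the same space as `g`, and all helper names are hypothetical.

```python
# Hypothetical sketch: remove from g its component in span(C) (pure Python).

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, eps=1e-10):
    """Modified Gram-Schmidt; drops (near-)linearly dependent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coef = dot(w, q)
            w = [wi - coef * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis

def project_out(g, C):
    """Return g' = g - P_C g, the component of g orthogonal to span(C)."""
    g_perp = list(g)
    for q in orthonormal_basis(C):
        coef = dot(g_perp, q)
        g_perp = [gi - coef * qi for gi, qi in zip(g_perp, q)]
    return g_perp
```

For example, with `C = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]` (spanning the first two axes), `project_out([3.0, -2.0, 5.0], C)` keeps only the third component. Suppose a layer responds to an input $c$ only through $w^\top c$. Then a step $w \leftarrow w - \eta\, g'$ changes that response by $-\eta\, {g'}^\top c = 0$ for every $c \in \mathrm{span}(C)$. This is one way to read the lemma's first-order invariance.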

Figures (12)

  • Figure 1: Continual unlearning leads to catastrophic degradation. The pre-trained model (first column) continually unlearns 12 concepts $\mathcal{C}^\star = \{c_1^\star, c_2^\star, \dots, c_{12}^\star\}$ (e.g., Abstractionism or Bears). Each row uses a different prompt for image generation. In each row, red boxes highlight images whose target concept has already been unlearned. Ideally, images without red boxes should remain conceptually intact. However, as the bottom row illustrates, continual unlearning significantly impairs the model's ability to retain concepts. Notably, after unlearning 12 concepts (last column), the model fails to generate meaningful content.
  • Figure 2: The ideal outcome of continual unlearning. The pre-trained model continually unlearns two styles. Given prompts to generate a cat in the "Van Gogh" and "Cartoon" styles, the generated images should accurately reflect each style. After the first unlearning step, the "Cartoon" image should remain conceptually unchanged, while the "Van Gogh" image should no longer exhibit the "Van Gogh" style. After the second unlearning step, both the "Van Gogh" and "Cartoon" styles should be removed, while the concept "cat" should be retained.
  • Figure 3: ConAbl [ca] fails when unlearning requests arrive continually. Although it performs well on the initial request, sequential unlearning leads to poor retention. Simultaneously unlearning all requests better preserves retention, but at a much higher cost. Plots for EraseDiff [wu2024erasediff] are in the results section.
  • Figure 4: ConAbl [ca]'s cumulative $\ell_2$ parameter drift w.r.t. the pre-trained model. Compared with simultaneous unlearning, sequential unlearning exhibits severe cumulative drift as more concepts are unlearned. Our add-on regularizers effectively mitigate this drift and yield better retention (see the ConAbl add-on results figure).
  • Figure 5: Overview of our add-on regularizers.
  • ...and 7 more figures
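Figure 4 plots cumulative $\ell_2$ drift from the pre-trained weights. Below is a minimal sketch of that metric. It assumes drift is the Euclidean norm of the flattened parameter difference $\theta_t - \theta_0$; the paper may restrict it to the fine-tuned layers. `apply_sequential` and `drift_curve` are hypothetical helpers, and the updates are toy vectors, not results from the paper.

```python
import math

def l2_drift(theta, theta_pre):
    """Euclidean distance ||theta - theta_pre||_2 between two flat parameter vectors."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(theta, theta_pre)))

def apply_sequential(theta_pre, updates):
    """Apply per-request updates one after another; return a checkpoint after each request."""
    theta, snapshots = list(theta_pre), []
    for delta in updates:
        theta = [t + d for t, d in zip(theta, delta)]
        snapshots.append(list(theta))
    return snapshots

def drift_curve(snapshots, theta_pre):
    """Drift of each checkpoint from the pre-trained weights (one value per request)."""
    return [l2_drift(s, theta_pre) for s in snapshots]

# Toy example: two requests whose updates point the same way accumulate drift.
theta_pre = [0.0, 0.0]
curve = drift_curve(apply_sequential(theta_pre, [[3.0, 4.0], [3.0, 4.0]]), theta_pre)
print(curve)  # [5.0, 10.0]
```

In this toy setting, aligned updates make drift grow linearly with the number of requests. Under the assumptions above, this is the regime the add-on regularizers are meant to counter.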

Theorems & Definitions (2)

  • Lemma 7.1: First-order invariance
  • proof