Continual Unlearning for Text-to-Image Diffusion Models: A Regularization Perspective

Justin Lee; Zheda Mai; Jinsu Yoo; Chongyu Fan; Cheng Zhang; Wei-Lun Chao

TL;DR

The paper addresses the challenge of continual unlearning (CU) in text-to-image diffusion models, showing that sequential unlearning causes rapid degradation of retained concepts due to parameter drift. It introduces plug-in regularizers (update-norm penalties, selective fine-tuning, and model merging) and a gradient-projection method that enforces updates orthogonal to semantically similar directions, underpinned by a Taylor-based bound that relates retention loss to update magnitude. Key findings demonstrate that these methods substantially improve unlearning accuracy (UA) and retention (RA-I, RA-C) in continual settings, with gradient projection offering strong in-domain gains and compatibility with other add-ons. The work provides practical baselines and open directions for safe, accountable generative AI when unlearning requests arrive sequentially, and highlights cross-attention as a source of interference that semantic-aware updates can mitigate.
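
Two of the plug-in regularizers named above, the update-norm penalty and model merging, can be illustrated with a minimal NumPy sketch. It assumes a squared L2 penalty on the drift from the pre-trained weights and a simple linear interpolation for merging; the paper may use different norms or merging rules, and the names `theta`, `theta0`, `lam`, and `alpha` are hypothetical.

```python
import numpy as np

def update_norm_penalty(theta, theta0, lam):
    """Assumed form: lam * ||theta - theta0||^2, penalizing drift from theta0."""
    delta = theta - theta0
    return lam * float(np.dot(delta, delta))

def update_norm_penalty_grad(theta, theta0, lam):
    """Gradient of the penalty above with respect to theta."""
    return 2.0 * lam * (theta - theta0)

def merge_with_pretrained(theta, theta0, alpha):
    """Assumed merging rule: linear interpolation between pre-trained and
    unlearned weights. alpha=0 returns theta0, alpha=1 returns theta."""
    return (1.0 - alpha) * theta0 + alpha * theta

# Toy example with flattened weight vectors (illustrative values only).
theta0 = np.zeros(3)
theta = np.array([3.0, 4.0, 0.0])
penalty = update_norm_penalty(theta, theta0, lam=0.5)
merged = merge_with_pretrained(theta, theta0, alpha=0.5)
```

In a continual setting, the penalty would be added to whatever unlearning loss is in use at each request, while merging would be applied to the weights after an unlearning step; both pull the model back toward the pre-trained weights and so limit cumulative drift.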

Abstract

Machine unlearning, the ability to remove designated concepts from a pre-trained model, has advanced rapidly, particularly for text-to-image diffusion models. However, existing methods typically assume that unlearning requests arrive all at once, whereas in practice they often arrive sequentially. We present the first systematic study of continual unlearning in text-to-image diffusion models and show that popular unlearning methods suffer from rapid utility collapse: after only a few requests, models forget retained knowledge and generate degraded images. We trace this failure to cumulative parameter drift from the pre-training weights and argue that regularization is crucial to addressing it. To this end, we study a suite of add-on regularizers that (1) mitigate drift and (2) remain compatible with existing unlearning methods. Beyond generic regularizers, we show that semantic awareness is essential for preserving concepts close to the unlearning target, and propose a gradient-projection method that constrains parameter drift orthogonal to their subspace. This substantially improves continual unlearning performance and is complementary to other regularizers for further gains. Taken together, our study establishes continual unlearning as a fundamental challenge in text-to-image generation and provides insights, baselines, and open directions for advancing safe and accountable generative AI.
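
The gradient-projection idea can be sketched generically: before applying an unlearning update, remove the component of the gradient that lies in the subspace spanned by directions associated with concepts to be preserved. The snippet below is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The rows of `directions` are hypothetical stand-ins for whatever representation of semantically similar concepts the method uses, and they are assumed to be linearly independent.

```python
import numpy as np

def project_orthogonal(grad, directions):
    """Return grad with its component in span(rows of directions) removed.

    grad:       shape (d,), a flattened update direction.
    directions: shape (k, d), hypothetical "protected" directions
                (assumed linearly independent, k <= d).
    """
    # Reduced QR of the transpose gives an orthonormal basis (d, k)
    # for the protected subspace.
    q, _ = np.linalg.qr(directions.T)
    # Subtract the projection of grad onto that subspace.
    return grad - q @ (q.T @ grad)

# Toy example: protect the plane spanned by two vectors in R^3.
directions = np.array([[1.0, 0.0, 0.0],
                       [1.0, 1.0, 0.0]])
grad = np.array([1.0, 2.0, 3.0])
projected = project_orthogonal(grad, directions)
```

After projection, the update has zero inner product with every protected direction, so to first order it leaves those directions untouched. Because the projection only modifies the gradient, it can be combined with a norm penalty or merging step, which is consistent with the complementarity the abstract describes.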
