Continual Unlearning for Foundational Text-to-Image Models without Generalization Erosion

Kartik Thakral; Tamar Glaser; Tal Hassner; Mayank Vatsa; Richa Singh

TL;DR

The paper addresses the challenge of removing specific concepts from large text-to-image diffusion models without retraining. It introduces DUGE, a memory-regularized decremental unlearning framework focused on cross-attention, which forgets targeted concepts while preserving non-target knowledge. The method is formalized through three terms: an unlearning loss $\mathcal{L}_{\text{U}}$, a prior-preservation loss $\mathcal{L}_{\text{pr}}$, and a regularizer $\mathcal{R}_{\text{pr}}$. In the Dec-ImageNet-20 experiments, DUGE forgets the target concepts with minimal generalization erosion, achieving superior KID/FID scores and a human-validated unlearning accuracy of about $95\%$ or more; the approach scales to multiple decremental steps and to correlated concepts. The work offers a practical, scalable way to selectively unlearn copyrighted or sensitive content in foundation diffusion models, reducing risk while preserving core generative capabilities. Its memory-based regularization strategy gives continual unlearning a concrete framework that could support real-world content moderation and rights management.

Abstract

How can we effectively unlearn selected concepts from pre-trained generative foundation models without resorting to extensive retraining? This research introduces 'continual unlearning', a novel paradigm that enables the incremental, targeted removal of multiple specific concepts from foundational generative models. We propose the Decremental Unlearning without Generalization Erosion (DUGE) algorithm, which selectively unlearns the generation of undesired concepts while preserving the generation of related, non-targeted concepts and alleviating generalization erosion. To this end, DUGE optimizes three losses: a cross-attention loss that steers the focus towards images devoid of the target concept; a prior-preservation loss that safeguards knowledge related to non-target concepts; and a regularization loss that protects the model from generalization erosion. Experimental results demonstrate that the proposed approach can exclude selected concepts without compromising the overall integrity and performance of the model. This offers a pragmatic solution for refining generative models that handles the intricacies of model training and concept management while lowering the risks of copyright infringement, misuse of personal or licensed material, and replication of distinctive artistic styles. Importantly, it maintains the non-targeted concepts, thereby safeguarding the model's core capabilities and effectiveness.
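
The abstract describes two structural ideas: an objective built from three terms, and the removal of concepts one step at a time. The sketch below illustrates only that structure. It is not the authors' implementation: the weighted-sum form, the weights `w_pr` and `w_reg`, and the names `total_loss`, `continual_unlearn`, and `unlearn_step` are all assumptions made for illustration, and the "model" in the usage lines is a toy stand-in (a set of concept names), not a diffusion model.

```python
from typing import Callable, Iterable, TypeVar

Model = TypeVar("Model")


def total_loss(l_u: float, l_pr: float, r_pr: float,
               w_pr: float = 1.0, w_reg: float = 1.0) -> float:
    """Combine the three terms named in the abstract.

    Assumption: a plain weighted sum. The abstract does not state how
    the terms are combined, and w_pr / w_reg are hypothetical weights.
    """
    return l_u + w_pr * l_pr + w_reg * r_pr


def continual_unlearn(model: Model,
                      concepts: Iterable[str],
                      unlearn_step: Callable[[Model, str], Model]) -> Model:
    """Remove concepts one at a time.

    Each step starts from the model produced by the previous step, which
    is what makes the procedure decremental rather than a single batch
    removal. unlearn_step is a hypothetical callback that would carry
    out one round of unlearning for one concept.
    """
    for concept in concepts:
        model = unlearn_step(model, concept)
    return model


# Toy usage: the "model" is just the set of concepts it can still generate.
toy_model = frozenset({"dog", "cat", "car"})
result = continual_unlearn(toy_model, ["dog", "car"], lambda m, c: m - {c})
```

In a real setting, `unlearn_step` would minimize something like `total_loss` over the model's cross-attention parameters for the current target concept; the toy lambda above only shows the sequential control flow.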
