Continual Unlearning for Foundational Text-to-Image Models without Generalization Erosion

Kartik Thakral, Tamar Glaser, Tal Hassner, Mayank Vatsa, Richa Singh

TL;DR

The paper addresses the challenge of removing specific concepts from large text-to-image diffusion models without retraining. It introduces DUGE, a memory-regularized, cross-attention-focused decremental unlearning framework that forgets targeted concepts while preserving non-target knowledge, formalized through the losses $\mathcal{L}_{\text{U}}$, $\mathcal{L}_{\text{pr}}$, and $\mathcal{R}_{pr}$. In the Dec-ImageNet-20 experiments, DUGE forgets target concepts with minimal generalization erosion, achieving superior $\text{KID}$/$\text{FID}$ scores and a human-validated unlearning accuracy of around $95\%$ or higher; the approach scales to multiple decremental steps and to correlated concepts. The work offers a practical, scalable method for selectively unlearning copyrighted or sensitive content in foundation diffusion models, reducing risk while preserving core generative capabilities. It provides a framework for continual unlearning with a concrete memory-based regularization strategy, potentially enabling deployment in real-world content moderation and rights-management scenarios.

Abstract

How can we effectively unlearn selected concepts from pre-trained generative foundation models without resorting to extensive retraining? This research introduces 'continual unlearning', a novel paradigm that enables the targeted, incremental removal of multiple specific concepts from foundational generative models. We propose the Decremental Unlearning without Generalization Erosion (DUGE) algorithm, which selectively unlearns the generation of undesired concepts while preserving the generation of related, non-targeted concepts and alleviating generalization erosion. To this end, DUGE optimizes three losses: a cross-attention loss that steers the focus towards images devoid of the target concept; a prior-preservation loss that safeguards knowledge related to non-target concepts; and a regularization loss that prevents the model from suffering from generalization erosion. Experimental results demonstrate the ability of the proposed approach to exclude certain concepts without compromising the overall integrity and performance of the model. This offers a pragmatic solution for refining generative models, handling the intricacies of model training and concept management while lowering the risks of copyright infringement, misuse of personal or licensed material, and replication of distinctive artistic styles. Importantly, it maintains the non-targeted concepts, thereby safeguarding the model's core capabilities and effectiveness.
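
The three loss terms and the step-by-step removal described above can be illustrated with a small bookkeeping sketch. It is not the authors' implementation: the plain weighted sum, the `LossWeights` defaults, and the helper names `total_loss` and `decremental_plan` are all assumptions made for illustration, since the abstract does not state how the terms are balanced or how steps are scheduled. The concept names are borrowed from the figure captions.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class LossWeights:
    """Hypothetical weights; the source does not say how the terms are balanced."""
    unlearn: float = 1.0  # cross-attention unlearning term (L_U)
    prior: float = 1.0    # prior-preservation term (L_pr)
    reg: float = 1.0      # generalization-erosion regularizer (R_pr)


def total_loss(l_u: float, l_pr: float, r_pr: float,
               w: LossWeights = LossWeights()) -> float:
    """Assumed combination: a plain weighted sum of the three terms."""
    return w.unlearn * l_u + w.prior * l_pr + w.reg * r_pr


def decremental_plan(all_concepts: Sequence[str],
                     targets_per_step: Sequence[Sequence[str]]
                     ) -> List[Dict[str, List[str]]]:
    """For each decremental step, record what is unlearned at that step,
    what was unlearned in earlier steps, and what must still be preserved."""
    forgotten: List[str] = []
    plan: List[Dict[str, List[str]]] = []
    for targets in targets_per_step:
        unknown = [t for t in targets if t not in all_concepts]
        if unknown:
            raise ValueError(f"unknown concepts: {unknown}")
        already = list(forgotten)
        forgotten.extend(t for t in targets if t not in forgotten)
        retained = [c for c in all_concepts if c not in forgotten]
        plan.append({"unlearn": list(targets),
                     "already_forgotten": already,
                     "retain": retained})
    return plan


if __name__ == "__main__":
    concepts = ["apple", "orange", "broccoli", "violin", "airplane"]
    steps = [["apple"], ["orange"], ["broccoli"]]
    for i, entry in enumerate(decremental_plan(concepts, steps), start=1):
        print(i, entry)
    print(total_loss(0.5, 0.25, 0.125))
```

In an actual training loop, each plan entry would drive one round of fine-tuning: the `unlearn` concepts feed the unlearning term, while the `retain` concepts are the ones the preservation and regularization terms are meant to protect.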

Paper Structure

This paper contains 17 sections, 8 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Continual Unlearning: Illustrating the methodology for selective forgetting in text-to-image foundational generative models like Midjourney, Stable Diffusion, and Imagen. This illustration outlines the multi-step process for removing specific concepts from the model's knowledge while ensuring that its ability to generalize over other concepts remains intact.
  • Figure 2: Illustration of the complete DUGE algorithm for the continual unlearning process. (a) Depicts the unlearning process employed at each decremental step; (b) depicts the prior-preservation loss utilizing a small memory for regularization; and (c) demonstrates the complete decremental framework of the DUGE formulation with multiple decremental steps.
  • Figure 3: Visualizing the generation results on set 1 for the proposed algorithm DUGE, alongside FMN (zhang2024forget) and ESD-X (gandikota2023erasing). FMN and ESD-X suffer from generalization erosion in a continual unlearning setting when prompted with ‘an image of a violin’ and ‘an image of an airplane,’ whereas DUGE still generates these non-target concepts at each decremental step.
  • Figure 4: Qualitative results from various sets, illustrating samples of unlearnt and other classes at each decremental step $\delta$. It is evident that across all sets, DUGE consistently unlearns the target class while preserving the knowledge of other concepts.
  • Figure 5: Illustrating the performance of DUGE on correlated concepts. (a) Continual unlearning of concepts like Apple, Orange, and Broccoli, with performance visualized across all concepts. (b) Visualization of DUGE’s performance on correlated concepts such as car and bus. (c) Performance of DUGE on Concept Bench (zhang2024forget) for different dog breeds. We observe that DUGE successfully decrementally unlearns correlated concepts, including (a) fruits and vegetables, and various types of balls; (b) bus and car; (c) different dog breeds.
  • ...and 4 more figures