Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models

Eric Zhang, Kai Wang, Xingqian Xu, Zhangyang Wang, Humphrey Shi

TL;DR

Large diffusion-based text-to-image models raise privacy, copyright, and safety concerns because they memorize sensitive concepts. The authors present Forget-Me-Not, a lightweight attention-based method that forgets or corrects targeted concepts by re-steering cross-attention, achieving forgetting in seconds with minimal impact on other content. They introduce the Memorization Score and ConceptBench to quantify memorization and forgetting across identities, objects, and styles, and demonstrate practical extensions including NSFW removal and concept correction. The work enables safe distribution through lightweight patches and provides a foundation for fairer, more inclusive generative models.

Abstract

The unlearning problem of deep learning models, once primarily an academic concern, has become a prevalent issue in the industry. The significant advances in text-to-image generation techniques have prompted global discussions on privacy, copyright, and safety, as numerous unauthorized personal IDs, content, artistic creations, and potentially harmful materials have been learned by these models and later utilized to generate and distribute uncontrolled content. To address this challenge, we propose Forget-Me-Not, an efficient and low-cost solution designed to safely remove specified IDs, objects, or styles from a well-configured text-to-image model in as little as 30 seconds, without impairing its ability to generate other content. Alongside our method, we introduce the Memorization Score (M-Score) and ConceptBench to measure the models' capacity to generate general concepts, grouped into three primary categories: ID, object, and style. Using M-Score and ConceptBench, we demonstrate that Forget-Me-Not can effectively eliminate targeted concepts while maintaining the model's performance on other concepts. Furthermore, Forget-Me-Not offers two practical extensions: a) removal of potentially harmful or NSFW content, and b) enhancement of model accuracy, inclusion and diversity through concept correction and disentanglement. It can also be adapted as a lightweight model patch for Stable Diffusion, allowing for concept manipulation and convenient distribution. To encourage future research in this critical area and promote the development of safe and inclusive generative models, we will open-source our code and ConceptBench at https://github.com/SHI-Labs/Forget-Me-Not.

Paper Structure (18 sections, 4 equations, 11 figures, 2 tables, 1 algorithm)

Figures (11)

  • Figure 1: Given a text-to-image model (e.g., Stable Diffusion), our approach can swiftly re-steer the cross-attention associated with a specific concept and subsequently forget or correct it. (1) Concept Forgetting: target concepts (denoted in blue text and crossed out) are successfully removed without compromising the quality of the output. (2) Concept Correction & Disentanglement: our method can be used to correct a dominant or undesired concept of a prompt. Previously overshadowed concepts emerge in the outputs after the dominant concepts are forgotten. In addition, our method learns to forget quickly, in only 30 seconds for certain concepts (e.g., Elon Musk), and can be easily adapted to lightweight model patches for Stable Diffusion, allowing for multi-concept manipulation and convenient distribution to users.
  • Figure 2: This figure shows two baseline forgetting methods and our proposed Forget-Me-Not. The target concept to forget is Elon Musk. One baseline is (a) Token Blacklist, which simply replaces the target token with a different one. The other baseline is (b) Naive Finetuning, which, instead of replacing tokens, finetunes the model weights so that the new weights generate outputs containing unrelated concepts. Our method (c) Forget-Me-Not uses Attention Re-steering, in which we finetune only the UNet to minimize each of the intermediate attention maps associated with the target concepts to forget.
  • Figure 3: This figure shows the Attention Re-steering proposed in our Forget-Me-Not method, in which we set the objective function to minimize the attention maps of the target concepts (i.e., Elon Musk in this case) and finetune the network accordingly.
  • Figure 4: Finetuning to forget concept "Johnny Depp" with unrelated images of "a photo of man". This method distorts other concepts with visual details of selected unrelated images.
  • Figure 5: Results of concept forgetting using our method. The first 2x2 grid shows the original samples from Stable Diffusion. The subsequent 3 images are sampled after concept forgetting, using the same prompt. The top 3 rows are from a multi-concept model targeting both Elon Musk and Taylor Swift, demonstrating the multi-concept forgetting capability. Control concepts such as Bill Gates and Emma Watson show that our approach has minimal impact on concepts other than the target ones. The last row shows two single-concept models for styles. Output images were generated with the prompts "a photo of X" (top 3 rows) and "a dog in X style" (bottom row).
  • ...and 6 more figures
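
The Attention Re-steering objective described in the captions of Figures 2 and 3 (minimizing the cross-attention maps associated with the target concept) can be illustrated with a small sketch. This is not the authors' implementation: the function names `cross_attention_probs` and `resteering_loss`, the toy query/key values, and the choice of a sum-of-squares penalty are assumptions made here for illustration. In the actual method the queries and keys would come from the UNet's cross-attention layers, and the loss would drive finetuning of the UNet weights.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_probs(queries, keys):
    """Return softmax(Q K^T / sqrt(d)).

    Rows correspond to image positions (queries), columns to text tokens (keys).
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        for q in queries
    ]


def resteering_loss(attn, target_token_ids):
    """Assumed objective: sum of squared attention placed on the target tokens."""
    return sum(row[j] ** 2 for row in attn for j in target_token_ids)


# Toy example (hypothetical values): 2 image positions, 3 text tokens,
# where token index 1 stands in for the concept to forget.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys_before = [[0.0, 0.0], [4.0, 4.0], [0.0, 0.0]]    # target token strongly attended
keys_after = [[0.0, 0.0], [-4.0, -4.0], [0.0, 0.0]]   # target token suppressed

attn = cross_attention_probs(queries, keys_before)
loss_high = resteering_loss(attn, [1])
loss_low = resteering_loss(cross_attention_probs(queries, keys_after), [1])
print(loss_high, loss_low)
```

In this toy setting the loss is large while the image positions attend to the target token and shrinks once that attention is suppressed, which is the behavior a finetuning step on this objective would push toward.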