Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers

Chi-Pin Huang, Kai-Po Chang, Chung-Ting Tsai, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang

TL;DR

Receler introduces a lightweight eraser that plugs into cross-attention in pre-trained diffusion models to erase a target concept with strong locality and robustness. It achieves locality via concept-localized regularization that confines erasing to concept-related regions and robustness via adversarial prompt learning that trains against malicious prompts. Extensive CIFAR-10 and I2P experiments show Receler outperforms prior methods in object erasure and inappropriate content erasure, including resilience to paraphrased and learned attack prompts. The approach offers a practical, parameter-efficient framework for safe diffusion-based image synthesis with compositional capabilities at inference.
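
As a rough illustration of the locality idea above, the sketch below penalizes any eraser output that falls outside a concept-related region. The function name `localized_reg`, the binary mask, and the squared-norm penalty are assumptions made for illustration; this summary does not state the paper's exact regularization term or how the concept region is obtained.

```python
def localized_reg(eraser_out, concept_mask):
    """Mean squared eraser output over positions outside the concept region.

    eraser_out:   per-position feature vectors (list of lists of floats)
    concept_mask: 0/1 flags, 1 where a position is concept-related
    Both names and the penalty form are hypothetical, not the paper's definition.
    """
    if len(eraser_out) != len(concept_mask):
        raise ValueError("one mask entry is required per spatial position")
    total = 0.0
    for vec, m in zip(eraser_out, concept_mask):
        # (1 - m) keeps only the positions outside the concept region.
        total += (1 - m) * sum(v * v for v in vec)
    return total / len(eraser_out)


# Two positions: the first lies inside the concept region, the second does not.
out = [[3.0, 4.0], [1.0, 2.0]]
mask = [1, 0]
penalty = localized_reg(out, mask)  # only the second position contributes
```

Minimizing a term of this kind pushes the eraser to leave non-concept regions untouched, which is one plausible way to read "confines erasing to concept-related regions".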

Abstract

Concept erasure in text-to-image diffusion models aims to disable pre-trained diffusion models from generating images related to a target concept. To perform reliable concept erasure, two properties are desirable: robustness and locality. Robustness prevents the model from producing images associated with the target concept for any paraphrased or learned prompts, while locality preserves its ability to generate images of non-target concepts. In this paper, we propose Reliable Concept Erasing via Lightweight Erasers (Receler). It learns a lightweight Eraser to perform concept erasing while satisfying both properties through the proposed concept-localized regularization and adversarial prompt learning scheme. Experiments with various concepts verify the superiority of Receler over previous methods.

Paper Structure (24 sections, 4 equations, 6 figures, 6 tables)


Figures (6)

  • Figure 1: Illustration of (a) robustness and (b) locality preservation in concept erasing. The former requires models to be robust against paraphrased attacks on the target concept (i.e., "nudity"), while the latter aims to preserve the visual content of non-target concepts (e.g., "tennis player" or "airplane"). Note that SD denotes Stable Diffusion [rombach2021compvissd]; UCE [gandikota2023unified] and ESD [gandikota2023erasing] are recent works on concept erasing.
  • Figure 2: Overview of Receler. (a) Receler involves iterative learning of a lightweight Eraser $E$ and an adversarial prompt embedding $e_{\textit{Adv}}$. The former is trained to erase the target concept $c$ while preserving non-target concepts, and the latter learns to imitate prompts that recover visual content associated with the previously erased concept. (b) The Eraser $E$ is inserted after each cross-attention layer of the diffusion U-Net to remove the target concept from its outputs, with its prediction $o^l$ added directly to the cross-attention output.
  • Figure 3: Qualitative comparison of concept erasure methods. Erased concepts are listed at the top, and images generated by each method are shown in each row. Input prompts used for image generation are provided in the supplementary material.
  • Figure 4: Visualization of robustness and locality from Receler on CIFAR-10. The red strikethrough at the top indicates the erased concepts. The paraphrased input prompts are listed on the left. Images enclosed within the diagonal orange borders show robustness, while the others show locality.
  • Figure 5: Visualization of erasure methods against learned attack prompts. We use Ring-A-Bell [tsai2023ring] to generate adversarial prompts for the nudity and violence concepts.
  • ...and 1 more figure
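
The Figure 2(b) caption describes an Eraser whose prediction $o^l$ is added to the cross-attention output. A minimal sketch of that residual wiring follows, in plain Python so it runs without a deep-learning library. The bottleneck layout (down-projection, GELU, up-projection), the zero-initialized up-projection, and the names `ToyEraser` and `cross_attention_with_eraser` are all assumptions for illustration; the caption only states where the Eraser sits and that its output is added.

```python
import math
import random


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def gelu(v):
    """Exact GELU via the Gaussian error function."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


class ToyEraser:
    """Hypothetical bottleneck adapter: down-project, GELU, up-project."""

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        self.w_down = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                       for _ in range(hidden)]
        # Assumed zero initialization: the eraser starts as a no-op, so the
        # pre-trained model's outputs are unchanged before any training.
        self.w_up = [[0.0] * hidden for _ in range(dim)]

    def __call__(self, attn_out):
        h = [gelu(v) for v in matvec(self.w_down, attn_out)]
        return matvec(self.w_up, h)


def cross_attention_with_eraser(attn_out, eraser):
    """Add the eraser prediction o^l to the cross-attention output."""
    o = eraser(attn_out)
    return [a + b for a, b in zip(attn_out, o)]


# One feature vector standing in for a cross-attention output at one position.
attn_out = [0.5, -1.0, 2.0, 0.25]
eraser = ToyEraser(dim=4, hidden=2)
patched = cross_attention_with_eraser(attn_out, eraser)
```

Because only the small adapter would be trained in such a setup, the pre-trained U-Net weights can stay frozen, which fits the "lightweight" framing in the summary.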