MACE: Mass Concept Erasure in Diffusion Models

Shilin Lu, Zilan Wang, Leyang Li, Yanzhu Liu, Adams Wai-Kin Kong

TL;DR

MACE tackles the challenge of erasing large sets of concepts from text-to-image diffusion models without sacrificing generation quality or damaging unrelated content. It combines closed-form cross-attention refinement, which removes residual concept information from co-occurring words, with per-concept LoRA modules and a non-interfering fusion objective, augmented by concept-focal importance sampling to maintain specificity. Across object, celebrity, explicit content, and artistic style erasure tasks, MACE achieves a better balance between generality and specificity than prior methods and scales to as many as 100 concepts. The approach offers a practical path toward safer and more controllable diffusion-based content generation for real-world services; the authors acknowledge limits to its scalability and outline directions for further scaling and robustness.

Abstract

The rapid expansion of large-scale text-to-image diffusion models has raised growing concerns regarding their potential misuse in creating harmful or misleading content. In this paper, we introduce MACE, a finetuning framework for the task of mass concept erasure. This task aims to prevent models from generating images that embody unwanted concepts when prompted. Existing concept erasure methods are typically restricted to handling fewer than five concepts simultaneously and struggle to find a balance between erasing concept synonyms (generality) and maintaining unrelated concepts (specificity). In contrast, MACE scales the erasure scope up to 100 concepts and achieves an effective balance between generality and specificity. This is achieved by combining closed-form cross-attention refinement with LoRA finetuning, which together eliminate the information of undesirable concepts. Furthermore, MACE integrates multiple LoRAs without mutual interference. We conduct extensive evaluations of MACE against prior methods across four different tasks: object erasure, celebrity erasure, explicit content erasure, and artistic style erasure. Our results reveal that MACE surpasses prior methods in all evaluated tasks. Code is available at https://github.com/Shilin-LU/MACE.
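
As background for the LoRA finetuning mentioned in the abstract, the following is a minimal NumPy sketch of a generic low-rank adapter on a single projection matrix. It is an illustration only, not the authors' implementation: the class name `LoRALinear` and the parameters `rank`, `alpha`, and `seed` are hypothetical, and the sketch does not cover MACE's training objective or its fusion of multiple LoRAs.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a low-rank update (alpha / rank) * B @ A.

    Hypothetical helper for illustration; names are not taken from the paper.
    """

    def __init__(self, W, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                   # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))             # zero init: no change at start
        self.scale = alpha / rank

    def delta(self):
        # Low-rank weight update; its rank is at most `rank`.
        return self.scale * self.B @ self.A

    def __call__(self, x):
        # x has shape (n, d_in); returns (n, d_out).
        return x @ (self.W + self.delta()).T


rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))
x = rng.normal(size=(3, 16))

layer = LoRALinear(W, rank=4)
out_initial = layer(x)                               # equals x @ W.T while B is zero

layer.B = rng.normal(size=layer.B.shape)             # stand-in for a trained B
out_adapted = layer(x)
```

Because only `A` and `B` would be trained, each adapter stores `rank * (d_in + d_out)` values instead of `d_in * d_out`, which is what makes keeping one module per erased concept affordable.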

Paper Structure

This paper contains 23 sections, 16 equations, 27 figures, 12 tables.

Figures (27)

  • Figure 1: Our proposed method, MACE, can erase a large number of concepts from text-to-image diffusion models. This can safeguard celebrity portrait rights, respect copyrights on artworks, and prevent explicit content creation. (a) MACE demonstrates good efficacy and generality by preventing the generation of images reflecting the target concept and its synonyms. (b) MACE maintains excellent specificity, ensuring that unintended concepts remain intact even when they share common terms with the target concept. (c) MACE exhibits a significantly enhanced ability to erase 100 concepts, outperforming previous methods. The overall score indicates the comprehensive erasing capability, as detailed in Section \ref{sec:exp_cele}.
  • Figure 2: A concept can be generated solely via residual information: (a) The average cross-attention map for each word shows that a concept's information is embedded within other words. (b) A puppy can be generated solely from residual information by replacing the text embedding of 'puppy' with that of the final [EOS] token. Additional examples are available in Figure \ref{fig:appendix_residual}.
  • Figure 3: Overview of MACE: (a) Our framework focuses on tuning the prompt-related projection matrices, $\mathbf{W}_k$ and $\mathbf{W}_v$, within cross-attention (CA) blocks. (b) (Section \ref{sec:close} & Figure \ref{fig:close}) The pretrained U-Net's CA blocks are refined using a closed-form solution, discouraging the model from embedding the residual information of the target phrase into surrounding words. (c) (Section \ref{sec:single} & Figure \ref{fig:lora}) For each concept targeted for removal, a distinct LoRA module is learned to eliminate its intrinsic information. (d) (Section \ref{sec:fusion}) A closed-form solution is introduced to integrate multiple LoRA modules without mutual interference while averting catastrophic forgetting.
  • Figure 4: Closed-Form Cross-Attention Refinement: $\mathbf{W}_k^\prime$ is tuned such that the 'Keys' of words co-existing with the target phrase 'airplane' are mapped to the 'Keys' of those same words when the target phrase is replaced with a generic concept, 'sky'.
  • Figure 5: Training with LoRA to Erase Intrinsic Information: Eight images are generated for each target concept as a training set via SD v1.4. To obtain the attention maps, the images undergo forward diffusion to timestep $t$ and are then fed into the closed-form refined model to predict the noise at timestep $t$. The LoRA modules are trained to reduce the activation in the masked attention maps that correspond to the target phrase.
  • ...and 22 more figures
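
The closed-form refinement described in the Figure 4 caption can be read as a regularized least-squares problem: find a new projection whose outputs for words co-existing with the target phrase match the original projection's outputs for the same words once the target phrase is replaced by a generic concept. The following is a minimal NumPy sketch under that reading, not the authors' implementation. The function name `refine_projection`, the argument names, the preservation set `E_keep`, and the weight `lam` are assumptions introduced for illustration, and random vectors stand in for real text embeddings.

```python
import numpy as np


def refine_projection(W, E_src, E_dst, E_keep, lam=0.1):
    """Closed-form least-squares refit of a projection matrix (sketch).

    W      : (d_out, d_in) original projection, e.g. W_k.
    E_src  : (n, d_in) embeddings of co-existing words in prompts that
             contain the target phrase.
    E_dst  : (n, d_in) embeddings of the same words after the target
             phrase is replaced by a generic concept.
    E_keep : (m, d_in) embeddings whose projections should stay unchanged
             (assumed regularizer, not stated in the caption).

    Minimizes  sum_i ||W' e_src_i - W e_dst_i||^2
             + lam * sum_j ||W' e_keep_j - W e_keep_j||^2.
    Assumes enough embeddings (n + m >= d_in) for a unique solution.
    """
    # Normal equations: W' @ B = A
    A = (W @ E_dst.T) @ E_src + lam * (W @ E_keep.T) @ E_keep  # (d_out, d_in)
    B = E_src.T @ E_src + lam * E_keep.T @ E_keep              # (d_in, d_in)
    # B is symmetric, so W' = A @ inv(B) = solve(B, A.T).T
    return np.linalg.solve(B, A.T).T


rng = np.random.default_rng(0)
d_out, d_in = 4, 6
W_k = rng.normal(size=(d_out, d_in))
E_src = rng.normal(size=(5, d_in))    # stand-ins for real text embeddings
E_dst = rng.normal(size=(5, d_in))
E_keep = rng.normal(size=(8, d_in))

W_k_new = refine_projection(W_k, E_src, E_dst, E_keep)

# Distance between the keys of the co-existing words and their targets,
# before and after the refit.
err_before = np.linalg.norm(E_src @ W_k.T - E_dst @ W_k.T)
err_after = np.linalg.norm(E_src @ W_k_new.T - E_dst @ W_k.T)
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse. In this sketch a larger `lam` keeps the refit closer to the original projection on `E_keep` at the cost of a looser match on the remapped words.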