ACE: Anti-Editing Concept Erasure in Text-to-Image Models

Zihao Wang; Yuxiang Wei; Fan Li; Renjing Pei; Hang Xu; Wangmeng Zuo

TL;DR

ACE addresses the risk of unsafe content in diffusion-based text-to-image generation by erasing a target concept so that it can be neither generated from prompts nor introduced through editing. It injects erasure guidance into both the conditional and unconditional noise predictions, and trains with a loss that combines prior-guided unconditional erasure guidance (PG-UEG), a prior-consistency constraint, and a stochastic correction that preserves non-target priors, yielding the final objective $L_{ACE} = \lambda_{PUnc} L_{PUnc} + \lambda_{Cons} L_{Cons} + \lambda_{ESD} L_{ESD}$. Across IP character, nudity, and artistic style erasure, ACE erases the target concept thoroughly while preserving related concepts, and it filters out edits more effectively than state-of-the-art baselines. These results make ACE relevant to safer diffusion-based content creation, including editing workflows; the authors state that the code will be publicly released.

Abstract

Recent advances in text-to-image diffusion models have significantly facilitated the generation of high-quality images, but have also raised concerns about the illegal creation of harmful content, such as copyrighted images. Existing concept erasure methods achieve strong results in preventing the production of erased concepts from prompts, but typically perform poorly in preventing undesired editing. To address this issue, we propose an Anti-Editing Concept Erasure (ACE) method, which not only erases the target concept during generation but also filters it out during editing. Specifically, we propose to inject the erasure guidance into both the conditional and the unconditional noise predictions, enabling the model to effectively prevent the creation of erased concepts during both editing and generation. Furthermore, a stochastic correction guidance is introduced during training to address the erosion of unrelated concepts. We conducted erasure editing experiments with representative editing methods (i.e., LEDITS++ and MasaCtrl) to erase IP characters, and the results indicate that our ACE effectively filters out target concepts in both types of edits. Additional experiments on erasing explicit concepts and artistic styles further demonstrate that our ACE performs favorably against state-of-the-art methods. Our code will be publicly available at https://github.com/120L020904/ACE.
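
To make the terminology concrete, the following is a minimal sketch of the two standard ingredients the method builds on, with NumPy arrays standing in for a model's noise predictions. It is not taken from the ACE repository. The function names, the guidance scales, and the example weights are hypothetical, and the conditional erasure guidance follows the commonly cited ESD formulation rather than anything stated on this page. The form of ACE's unconditional erasure guidance is not given here, so it is deliberately left out.

```python
import numpy as np


def cfg_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: start from the unconditional prediction
    and move along the (conditional - unconditional) direction."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


def conditional_erasure_guidance(eps_uncond, eps_cond, eta):
    """ESD-style target (assumed form): push the conditional prediction
    away from the concept direction by a factor eta. In training, the
    fine-tuned model's conditional prediction is regressed onto this
    target, which is computed from a frozen copy of the original model."""
    return eps_uncond - eta * (eps_cond - eps_uncond)


def weighted_objective(losses, weights):
    """Weighted sum of named loss terms, mirroring the shape of
    L_ACE = lambda_PUnc * L_PUnc + lambda_Cons * L_Cons + lambda_ESD * L_ESD.
    The dictionary keys are illustrative labels, not names from the paper's code."""
    return sum(weights[name] * losses[name] for name in losses)


# Toy stand-ins for noise predictions of a frozen model (hypothetical values).
rng = np.random.default_rng(0)
eps_uncond = rng.normal(size=(4,))
eps_cond = rng.normal(size=(4,))

guided = cfg_noise(eps_uncond, eps_cond, guidance_scale=7.5)
erasure_target = conditional_erasure_guidance(eps_uncond, eps_cond, eta=1.0)

# Example loss values and weights, chosen only to exercise the function.
total = weighted_objective(
    {"punc": 1.0, "cons": 2.0, "esd": 3.0},
    {"punc": 0.5, "cons": 0.5, "esd": 1.0},
)
```

Because image editing pipelines also rely on the unconditional branch, a target that only constrains the conditional prediction leaves that branch untouched, which is the gap the abstract says ACE closes by guiding both predictions.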

Paper Structure (25 sections, 16 equations, 18 figures, 12 tables, 1 algorithm)

Introduction
Related Work
Concept Erasure in T2I Models
Text-driven Image Editing
Attacks in T2I Models
Proposed Method
Preliminaries
Anti-Editing Concept Erasure
Prior Concept Preservation
Experiments
IP Character Removal
Explicit Content Removal
Artistic Style Removal
Ablation Study
Conclusion
...and 10 more sections

Figures (18)

Figure 1: (a) Given a text-to-image (T2I) model, there are two common ways to use it to create undesired content, i.e., generating new images from text prompts or editing existing images. (b) Current concept erasure methods primarily focus on preventing the generation of erased concepts but fail to protect against image editing. In contrast, our ACE method can prevent the production of such content during both generation and editing. As shown, after erasing Pikachu, it successfully prevents edits involving Pikachu.
Figure 2: Overview of our proposed ACE. (a) In classifier-free guidance (CFG), both conditional and unconditional noise predictions are used to generate high-quality images. (b) ESD (Gandikota et al., 2023) unlearns the target concept (e.g., Mickey) by aligning the conditional noise prediction with conditional erasure guidance (CEG). (c) During fine-tuning, our ACE injects erasure guidance into both the conditional and unconditional noise predictions, preventing the production of unsafe content during both generation and editing. PG-UEG denotes the prior-guided unconditional erasure guidance, calculated following the PG-UEG equation.
Figure 3: Qualitative comparisons of IP character removal. Our ACE effectively erases the target concept while generating other concepts successfully.
Figure 4: Comparison of our ACE method with other methods in terms of editing filtering. After erasing Mickey Mouse, our method filters out edits involving Mickey Mouse without affecting edits related to other IP characters. In contrast, the competing methods either fail to prevent editing (e.g., ESD, SPM, RECE, and MACE) or cannot perform editing on non-target concepts (e.g., AdvUnlearn).
Figure 5: Qualitative results of nudity removal. Panel (a) shows the results of explicit editing using SD-Inpainting, while panel (b) displays images generated from text with explicit labels. Static adversarial text is used as the editing text, while dynamic adversarial attacks are employed for generation. Our method effectively reduces exposure in both editing and generation tasks. Moreover, it remains effective when editing and generating with adversarial text, indicating its robustness.
...and 13 more figures
