Prototype-Guided Concept Erasure in Diffusion Models

Yuze Cai, Jiahao Lu, Hongxiang Shi, Yichao Zhou, Hong Lu

TL;DR

This work exploits the model's intrinsic embedding geometry to identify latent embeddings that encode a given concept. From these embeddings it derives a set of concept prototypes that summarize the model's internal representations of the concept, and it uses the prototypes as negative conditioning signals during inference to achieve precise and reliable erasure.

Abstract

Concept erasure is widely used in image generation to prevent text-to-image models from generating undesired content. Existing methods can effectively erase narrow concepts that are specific and concrete, such as distinct intellectual properties (e.g., Pikachu) or recognizable characters (e.g., Elon Musk). However, their performance degrades on broad concepts such as "sexual" or "violent", whose wide scope and multi-faceted nature make them difficult to erase reliably. To overcome this limitation, we exploit the model's intrinsic embedding geometry to identify latent embeddings that encode a given concept. By clustering these embeddings, we derive a set of concept prototypes that summarize the model's internal representations of the concept, and employ them as negative conditioning signals during inference to achieve precise and reliable erasure. Extensive experiments across multiple benchmarks show that our approach achieves substantially more reliable removal of broad concepts while preserving overall image quality, marking a step towards safer and more controllable image generation.
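
The abstract names two steps, clustering concept embeddings into prototypes and using them as negative conditioning, but does not give the clustering algorithm or the guidance rule. The sketch below is therefore only an illustration under stated assumptions: k-means with cosine similarity stands in for the clustering step, and a classifier-free-guidance-style negative term stands in for the conditioning step. The function names `concept_prototypes` and `negative_guidance`, and all default values, are hypothetical and not taken from the paper.

```python
import numpy as np

def concept_prototypes(embeddings, k, iters=20, seed=0):
    """Cluster concept embeddings into k unit-norm prototypes.

    Assumption: plain k-means using cosine similarity; the paper's
    actual clustering procedure may differ.
    """
    rng = np.random.default_rng(seed)
    # Normalize rows so that dot products are cosine similarities.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Initialize centers from k distinct data points (fancy indexing copies).
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign each embedding to its most similar center.
        assign = np.argmax(x @ centers.T, axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
        # Re-project centers onto the unit sphere.
        centers /= np.linalg.norm(centers, axis=1, keepdims=True)
    return centers

def negative_guidance(eps_uncond, eps_cond, eps_neg, scale=7.5, neg_scale=1.0):
    """Combine noise predictions: toward the prompt, away from a negative.

    Assumption: a generic classifier-free-guidance-style rule, where
    eps_neg is the prediction conditioned on a concept prototype.
    """
    return (eps_uncond
            + scale * (eps_cond - eps_uncond)
            - neg_scale * (eps_neg - eps_uncond))

# Usage on synthetic data (random vectors stand in for real embeddings).
rng = np.random.default_rng(1)
fake_embeddings = rng.normal(size=(200, 32))
prototypes = concept_prototypes(fake_embeddings, k=16)
```

With `neg_scale=0`, or when the negative prediction equals the unconditional one, the rule reduces to ordinary classifier-free guidance, which makes the effect of the prototype term easy to isolate in experiments.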

Paper Structure (14 sections, 7 equations, 6 figures, 12 tables)


Figures (6)

  • Figure 1: We present prototype-guided concept erasure, a training-free method which models a target concept through a set of concept prototypes that summarize its diverse semantic modes in the embedding space. These prototypes serve as negative guidance signals during inference and enable effective removal of both broad and narrow concepts while preserving the overall generation quality. Sensitive content is masked for display.
  • Figure 2: Broad concepts such as violence encompass multiple semantic modes, including bloodshed (first row), gunfights (second row), and riots (third row). Existing methods such as Safree [yoon2025safree] and TRCE [chen2025trce] erase only part of this spectrum, resulting in incomplete erasure. We highlight this unreliability as a core limitation of prior approaches and introduce our method, which achieves more comprehensive and reliable removal by explicitly modeling the full breadth of a target concept.
  • Figure 3: Performance of different methods when simultaneously erasing Van Gogh and Snoopy.
  • Figure 4: Ablation study on the number of prototypes $K$ for two representative broad concepts. Results consistently show that a moderate number of prototypes (around $K = 16$) provides the best trade-off between erasure completeness and preservation of generation quality.
  • Figure 5: Images most relevant to the prototypes of the concept "Sexual". Each row represents a distinct prototype direction, illustrating the diverse typical patterns captured by different prototypes. Row 1 captures explicit nudity, Row 2 focuses on seductive attire, and Row 3 targets implicit sexual content and artistic styles.
  • ...and 1 more figure