Erasing Concepts, Steering Generations: A Comprehensive Survey of Concept Suppression

Yiwei Xie, Ping Liu, Zheng Zhang

TL;DR

The paper addresses safety, fairness, and controllability concerns in text-to-image diffusion models by surveying concept erasure methods, which suppress unwanted semantics during generation. It introduces a multidimensional taxonomy organized by intervention level, optimization strategy, and semantic scope, and reviews evaluation benchmarks, datasets, and practical deployment challenges. Key contributions include a synthesis of techniques that intervene at the text-encoder, cross-attention, and UNet levels; an analysis of loss-based, closed-form, adapter-based, and adversarial strategies; and a discussion of multi-concept and implicit-content erasure. The survey aims to guide researchers and practitioners toward safer, more adaptable generative systems. It also highlights open challenges such as conceptual entanglement, continual erasure, and new architectures like Flux models that call for new erasure paradigms.

Abstract

Text-to-Image (T2I) models have demonstrated impressive capabilities in generating high-quality and diverse visual content from natural language prompts. However, uncontrolled reproduction of sensitive, copyrighted, or harmful imagery poses serious ethical, legal, and safety challenges. To address these concerns, the concept erasure paradigm has emerged as a promising direction, enabling the selective removal of specific semantic concepts from generative models while preserving their overall utility. This survey provides a comprehensive overview and in-depth synthesis of concept erasure techniques in T2I diffusion models. We systematically categorize existing approaches along three key dimensions: intervention level, which identifies the model components targeted for concept removal; optimization structure, which refers to the algorithmic strategies used to achieve suppression; and semantic scope, which concerns the complexity and nature of the concepts addressed. This multi-dimensional taxonomy enables clear, structured comparisons across diverse methodologies and highlights fundamental trade-offs among erasure specificity, generalization, and computational complexity. We further discuss current evaluation benchmarks, standardized metrics, and practical datasets, emphasizing gaps that limit comprehensive assessment, particularly of robustness and practical effectiveness. Finally, we outline major challenges and promising future directions, including disentanglement of concept representations, adaptive and incremental erasure strategies, adversarial robustness, and new generative architectures. This survey aims to guide researchers toward safer, more ethically aligned generative models by providing foundational knowledge and actionable recommendations for the responsible development of generative AI.
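To make the "closed-form" optimization structure concrete, the sketch below edits a single cross-attention key or value projection with a generic ridge-regression update, in the spirit of closed-form cross-attention editors such as UCE. This is an assumed, simplified objective for illustration, not one taken from the survey. It solves min_W sum_i ||W c_i - v_i||^2 + lam * ||W - W0||_F^2, which redirects each erased concept embedding c_i to a target output v_i. Here v_i is the original projection of a neutral anchor concept. All helper names, the toy 2x3 matrix, and the embeddings are hypothetical. Plain Python is used so the sketch has no dependencies.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Matrix-vector product a @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def outer(u: Vector, v: Vector) -> Matrix:
    """Outer product u v^T."""
    return [[ui * vj for vj in v] for ui in u]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a: Matrix, s: float) -> Matrix:
    return [[s * x for x in row] for row in a]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def inverse(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (adequate for small toy matrices)."""
    n = len(m)
    aug = [list(row) + e for row, e in zip(m, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def closed_form_edit(w0: Matrix, sources: List[Vector], targets: List[Vector],
                     lam: float) -> Matrix:
    """Minimize sum_i ||W c_i - v_i||^2 + lam * ||W - W0||_F^2, whose solution is
    W = (lam * W0 + sum_i v_i c_i^T) (lam * I + sum_i c_i c_i^T)^{-1}."""
    d_in = len(w0[0])
    numer = scale(w0, lam)
    denom = scale(identity(d_in), lam)
    for c, v in zip(sources, targets):
        numer = add(numer, outer(v, c))
        denom = add(denom, outer(c, c))
    return matmul(numer, inverse(denom))


# Hypothetical toy projection (d_out = 2, d_in = 3) and concept embeddings.
w0 = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
c_erase = [1.0, 0.0, 0.0]          # embedding of the concept to erase
c_anchor = [0.0, 1.0, 0.0]         # embedding of a neutral anchor concept
v_target = matvec(w0, c_anchor)    # erased concept should now project like the anchor
w = closed_form_edit(w0, [c_erase], [v_target], lam=1e-3)
print([round(x, 3) for x in matvec(w, c_erase)])  # → [0.001, 0.999]
```

With a small lam, the erased embedding is mapped almost exactly onto the anchor's output. Input directions orthogonal to c_erase (here, the anchor and the third input axis) keep their original projections. A large lam keeps W close to W0. Practical methods usually add explicit preservation terms for concepts that must not drift.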

Paper Structure

This paper contains 35 sections, 37 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Concept erasure using ESD. The top two rows show image generations from the original model (left) and the unlearned model (right) across three erasure scenarios: (1) artistic style ("Van Gogh"), (2) object ("Car"), and (3) celebrity ("Emma Watson"). The unlearned model clearly suppresses the target concept while retaining general image quality. The bottom row shows generations for unrelated prompts, confirming that the model maintains its generative capacity outside the erased concept domain.
  • Figure 2: Taxonomy of concept erasure techniques in text-to-image generative models, categorized along three orthogonal axes: (1) intervention level (e.g., text encoder vs. non-text encoder interventions), (2) optimization strategy (e.g., adversarial alignment, plug-in adapters, loss-based training), and (3) semantic scope (explicit vs. multi-concept). Representative methods are annotated with corresponding references.
  • Figure 3: Intervention-level categorization and corresponding optimization strategies for concept erasure in diffusion models. Concept erasure can be applied at different model components (text encoder, cross-attention, and UNet), each supporting specific methods such as loss-based optimization, plug-in adapters, or adversarial training.
  • Figure 4: Illustration of the text-to-image generation pipelines for (a) Diffusion models and (b) Flux models. For Diffusion models (a), the textual prompt is encoded by a Text Encoder and fused with the image latent through Cross-Attention modules within the UNet architecture. The naive model (top-right) faithfully generates the prompted concept (“graffiti of the snoopy”), while the unlearned model (bottom-right) erases the target concept (“Snoopy”). For the Flux model (b), the text is encoded separately by a T5 encoder and a second text encoder, and the resulting embeddings are integrated with the image latent via linear projections and transformer layers rather than cross-attention.
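Figure 1 relies on ESD. In ESD, the fine-tuned model's conditional noise prediction for the erased concept c is regressed onto a negatively guided target computed by the frozen original model θ*: ε_θ*(x_t, t) - η[ε_θ*(x_t, c, t) - ε_θ*(x_t, t)]. The sketch below computes only that target and the squared-error loss on toy noise vectors. In practice the vectors come from the UNet at a sampled (x_t, t), and gradients update a chosen parameter subset, such as cross-attention layers or the remaining UNet layers. The function names and numbers are illustrative assumptions.

```python
from typing import List

Vector = List[float]


def esd_target(eps_uncond: Vector, eps_cond: Vector, eta: float) -> Vector:
    """Negatively guided target from the frozen model: eps_u - eta * (eps_c - eps_u)."""
    return [u - eta * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def esd_loss(eps_edited: Vector, target: Vector) -> float:
    """Mean squared error between the edited model's conditional prediction and the target."""
    return sum((e - t) ** 2 for e, t in zip(eps_edited, target)) / len(target)


# Toy noise predictions from the frozen model at one (x_t, t); values are made up.
eps_uncond = [0.25, -0.5, 0.0]   # frozen model, empty prompt
eps_cond = [0.75, 0.0, -0.5]     # frozen model, prompt naming the concept to erase
target = esd_target(eps_uncond, eps_cond, eta=1.0)
print(target)  # → [-0.25, -1.0, 0.5]

# Before fine-tuning, the edited model equals the frozen one, so its conditional
# prediction is still eps_cond and the loss is positive.
print(esd_loss(eps_cond, target))  # → 1.0
```

Setting η = 0 reduces the target to the unconditional prediction. Larger values push the edited model's output for the concept further in the opposite direction of the concept's guidance.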