EMMA: Concept Erasure Benchmark with Comprehensive Semantic Metrics and Diverse Categories

Lu Wei, Yuta Nakashima, Noa Garcia

TL;DR

EMMA introduces a comprehensive, multi-domain benchmark for concept erasure (CE) in text-to-image generation, evaluating 206 concepts across five domains with 12 metrics that probe explicit and implicit erasure, retention of related concepts, efficiency, image fidelity, and bias. Through a systematic comparison of five CE methods (remapping and optimization-based), EMMA reveals that while remapping approaches generally outperform optimization-based ones, no method fully erases a concept, especially under indirect prompts, and several methods amplify gender and ethnicity bias. The framework highlights trade-offs between erasure strength, semantic preservation, computational cost, and bias, demonstrating the need for bias-aware, robust erasure strategies. EMMA thus provides a standardized platform for rigorous evaluation, guiding future improvements in concept erasure while informing safety, privacy, and copyright considerations in real-world deployments.

Abstract

The widespread adoption of text-to-image (T2I) generation has raised concerns about privacy, bias, and copyright violations. Concept erasure techniques offer a promising solution by selectively removing undesired concepts from pre-trained models without requiring full retraining. However, these methods are often evaluated on a limited set of concepts, relying on overly simplistic and direct prompts. To test the boundaries of concept erasure techniques, and assess whether they truly remove targeted concepts from model representations, we introduce EMMA, a benchmark that evaluates five key dimensions of concept erasure over 12 metrics. EMMA goes beyond standard metrics like image quality and time efficiency, testing robustness under challenging conditions, including indirect descriptions, visually similar non-target concepts, and potential gender and ethnicity bias, providing a socially aware analysis of method behavior. Using EMMA, we analyze five concept erasure methods across five domains (objects, celebrities, art styles, NSFW, and copyright). Our results show that existing methods struggle with implicit prompts (i.e., generating the erased concept when it is indirectly referenced) and visually similar non-target concepts (i.e., failing to generate non-targeted concepts resembling the erased one), while some amplify gender and ethnicity bias compared to the original model.

Paper Structure

This paper contains 48 sections, 5 equations, 16 figures, 7 tables.

Figures (16)

  • Figure 1: Concept erasure methods break under pressure. When challenged with descriptive prompts, erased concepts like dog, Vincent van Gogh, and Converse resurface in the generated images. The EMMA benchmark systematically explores this and other limitations of current concept erasure techniques through a comprehensive evaluation framework.
  • Figure 2: Overview of EMMA. EMMA benchmarks concept erasure (CE) methods on five concept domains and five evaluation dimensions. We show both desirable and undesirable cases of a CE method instructed to unlearn the concept cat.
  • Figure 3: Distribution of selected celebrities on gender and ethnicity.
  • Figure 4: Classification pipeline for three classifiers.
  • Figure 5: We present two cases of image similarity comparison across three methods: CLIP, SSIM, and DreamSim. All images are generated using MACE. In panel (a), the images are generated by a MACE model that unlearned the concept cat, and in panel (b), by a MACE model that unlearned bicycle. For each group of images, the top, middle, and bottom rows correspond to prompts with masculine (man/men), neutral (person/people), or feminine (woman/women) terms, respectively. We highlight image pairs where we find the neutral image more visually similar to either the feminine or the masculine image. Red boxes indicate that the feminine and neutral images appear more similar, while green boxes indicate higher similarity between the masculine and neutral images. No highlighting is applied if there is no apparent visual difference between the feminine and the masculine images relative to the neutral one. CLIP, SSIM, and DreamSim all assign higher scores to more similar image pairs. We report the similarity scores between masculine-neutral ($f_{\text{sim}}(I_n, I_m)$) and feminine-neutral ($f_{\text{sim}}(I_n, I_f)$) pairs for each method. For the highlighted pairs, we color the higher-scoring group in green (masculine-neutral) or red (feminine-neutral) to indicate alignment with our visual judgment. Notably, DreamSim often produces results that contradict our annotations. The same random seed is used for each column within a group of images.
  • ...and 11 more figures
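
The comparison described in the Figure 5 caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the tolerance `tol`, and the use of cosine similarity over toy feature vectors are all assumptions made here for the example. In the paper, $f_{\text{sim}}$ is CLIP, SSIM, or DreamSim applied to generated images; any of these could be passed in place of the stand-in below, provided it returns a higher score for more similar pairs.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Stand-in for f_sim (an assumption): cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def closer_to_neutral(
    neutral: Vector,
    masculine: Vector,
    feminine: Vector,
    f_sim: Callable[[Vector, Vector], float] = cosine_similarity,
    tol: float = 1e-6,  # hypothetical threshold for "no apparent difference"
) -> Tuple[str, float, float]:
    """Compare f_sim(I_n, I_m) with f_sim(I_n, I_f) and report the higher one.

    Returns a label ("masculine", "feminine", or "none") together with
    both similarity scores.
    """
    s_m = f_sim(neutral, masculine)  # masculine-neutral similarity
    s_f = f_sim(neutral, feminine)   # feminine-neutral similarity
    if abs(s_m - s_f) <= tol:
        return "none", s_m, s_f
    return ("masculine" if s_m > s_f else "feminine"), s_m, s_f


if __name__ == "__main__":
    # Toy feature vectors standing in for image embeddings (illustrative only).
    label, s_m, s_f = closer_to_neutral([1.0, 0.0], [1.0, 0.1], [0.0, 1.0])
    print(label, round(s_m, 3), round(s_f, 3))
```

A "masculine" label corresponds to the green highlighting in the caption and a "feminine" label to the red; "none" corresponds to pairs left unhighlighted.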