Erasing with Precision: Evaluating Specific Concept Erasure from Text-to-Image Generative Models

Masane Fuchi, Tomohiro Takagi

TL;DR

EraseEval introduces a principled, black-box framework for evaluating concept erasure in text-to-image models by combining three evaluation criteria into four metrics and a single aggregate score. It replaces subjective visual judgments with protocol-based, automated assessments that measure prompt fidelity, vulnerability to related concepts, and preservation of unrelated concepts, using LLMs and standard encoders. Across 11 erasure methods and 18 concepts, the study shows that many methods struggle against prompt paraphrasing and implicit descriptions, highlighting the need for robust evaluation and improved techniques. The framework supports fair comparison and reproducibility, and can guide future research toward stronger, attack-resistant concept erasure in diffusion-based image generation systems.

Abstract

Studies have been conducted to prevent specific concepts from being generated from pretrained text-to-image generative models, achieving concept erasure in various ways. However, the performance evaluation of these studies is still largely reliant on visualization, with the superiority of a method often determined by human subjectivity. The metrics of quantitative evaluation also vary, making comprehensive comparisons difficult. We propose EraseEval, an evaluation method that differs from previous evaluation methods in that it involves three fundamental evaluation criteria: (1) how well the prompt containing the target concept is reflected, (2) to what extent concepts related to the erased concept can reduce the impact of the erased concept, and (3) whether other concepts are preserved. These criteria are evaluated and integrated into a single metric, such that a lower score is given if any of the evaluations are low, leading to a more robust assessment. We experimentally evaluated baseline concept erasure methods, organized their characteristics, and identified challenges with them. Although these are fundamental evaluation criteria, some concept erasure methods failed to achieve high scores, which points toward future research directions for concept erasure methods. Our code is available at https://github.com/fmp453/erase-eval.
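
The abstract states that the criteria are integrated into a single metric that scores low whenever any individual evaluation is low, but this excerpt does not give the formula. The following is a minimal sketch of one aggregation with that property, assuming a geometric mean over metric scores in the range [0, 1]. The function name `aggregate` and the example scores are hypothetical and are not taken from the paper or its repository.

```python
import math


def aggregate(scores):
    """Geometric mean of metric scores in [0, 1].

    Assumption: this is only one possible aggregation that penalizes a
    single weak metric; it is not claimed to be EraseEval's formula.
    """
    if not scores:
        raise ValueError("need at least one score")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    return math.prod(scores) ** (1.0 / len(scores))


# Hypothetical scores for four metrics; both lists have arithmetic mean 0.8.
balanced = aggregate([0.8, 0.8, 0.8, 0.8])
lopsided = aggregate([0.2, 1.0, 1.0, 1.0])

print(f"balanced: {balanced:.3f}")
print(f"lopsided: {lopsided:.3f}")
```

Both score lists share the same arithmetic mean, yet the lopsided one receives a lower geometric mean, and a single score of zero forces the aggregate to zero. This illustrates why a multiplicative aggregate suits an evaluation in which failing any one criterion should not be masked by the others.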

Paper Structure

This paper contains 38 sections, 5 equations, 10 figures, 15 tables.

Figures (10)

  • Figure 1: Results of our evaluation method, EraseEval, for erasing object concepts. Each metric lies in the range $[0, 1]$, and a higher score is better. These results are also reported in the object-concept results table (label tab:results-object). Almost all concept erasure methods minimized the effect on other concepts ($M_3$ and $M_4$). However, many erased models did not reflect the input prompt containing the erased concept ($M_1$) and were vulnerable to rephrased prompts describing the erased concept ($M_2$).
  • Figure 2: Our question in this term
  • Figure 3: Image generated from the prompt "A painting of starry night." by a text-to-image model from which "Van Gogh Style" was erased using SPM. Although the prompt does not contain the phrase "Van Gogh style", an image in that style was generated.
  • Figure 4: Images generated using an effective prompt. The Eiffel Tower and the Arc de Triomphe, landmarks of Paris, are generated although those words were not used.
  • Figure 5: Protocol 1
  • ...and 5 more figures