CRCE: Coreference-Retention Concept Erasure in Text-to-Image Diffusion Models

Yuyang Xue, Edward Moroshko, Feng Chen, Jingyu Sun, Steven McDonagh, Sotirios A. Tsaftaris

TL;DR

CRCE tackles under- and over-erasure in text-to-image diffusion models by using a Large Language Model to identify coreferential concepts (corefs) that should be erased and retain concepts (retains) that should be preserved. It introduces a manifold-aware loss that jointly erases the target concept and its corefs while keeping unrelated content, and it provides the CorefConcept dataset to support this approach. Across object, intellectual property, and celebrity domains, CRCE outperforms prior methods by achieving precise erasure with robust retention of semantically related but distinct concepts. The work highlights how leveraging semantic relationships via LLM guidance can improve concept erasure in diffusion models and points to future work on scaling and bias mitigation.

Abstract

Text-to-Image diffusion models can produce undesirable content that necessitates concept erasure. However, existing methods struggle with under-erasure, leaving residual traces of targeted concepts, or over-erasure, mistakenly eliminating unrelated but visually similar concepts. To address these limitations, we introduce CRCE, a novel concept erasure framework that leverages Large Language Models to identify both semantically related concepts that should be erased alongside the target and distinct concepts that should be preserved. By explicitly modelling coreferential and retained concepts semantically, CRCE enables more precise concept removal, without unintended erasure. Experiments demonstrate that CRCE outperforms existing methods on diverse erasure tasks, including real-world objects, person identities, and abstract intellectual property characteristics. The constructed dataset CorefConcept and the source code will be released upon acceptance.
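
The idea in the abstract (erase the target together with its corefs, preserve the retains) can be illustrated with a toy objective. The following is a minimal sketch under stated assumptions, not the paper's actual loss: the names `ScoredConcept`, `coref_retain_loss`, `anchor`, and `retain_weight` are hypothetical, model outputs are stand-in vectors rather than real diffusion noise predictions, and weighting each term by the LLM-assigned certainty score (mentioned in the Figure 3 caption) is one plausible reading of how such scores might be used.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]


@dataclass(frozen=True)
class ScoredConcept:
    """A concept string with a certainty score in [0, 1] (hypothetical structure)."""
    text: str
    certainty: float


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def weighted_mean(values: List[float], weights: List[float]) -> float:
    """Certainty-weighted average; returns 0.0 when there is nothing to weigh."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(v * w for v, w in zip(values, weights)) / total


def coref_retain_loss(
    edited: Callable[[str], Vector],   # output of the model being fine-tuned
    frozen: Callable[[str], Vector],   # output of the original, frozen model
    anchor: Vector,                    # assumed erasure destination (e.g. an unconditional output)
    corefs: List[ScoredConcept],
    retains: List[ScoredConcept],
    retain_weight: float = 1.0,        # assumed trade-off hyperparameter
) -> float:
    # Erasure term: pull each coref's edited output towards the anchor.
    erase_terms = [mse(edited(c.text), anchor) for c in corefs]
    # Retention term: keep each retain's edited output close to the frozen model.
    keep_terms = [mse(edited(r.text), frozen(r.text)) for r in retains]
    erase = weighted_mean(erase_terms, [c.certainty for c in corefs])
    keep = weighted_mean(keep_terms, [r.certainty for r in retains])
    return erase + retain_weight * keep


# Toy example with made-up 2-D outputs; the target "dog" is treated as a coref
# with certainty 1.0 (an assumption made for this sketch).
frozen_out = {"dog": [1.0, 0.0], "puppy": [1.0, 0.0], "cat": [0.0, 1.0]}
edited_out = {"dog": [0.0, 0.0], "puppy": [1.0, 0.0], "cat": [0.0, 1.0]}

loss = coref_retain_loss(
    edited=lambda text: edited_out[text],
    frozen=lambda text: frozen_out[text],
    anchor=[0.0, 0.0],
    corefs=[ScoredConcept("dog", 1.0), ScoredConcept("puppy", 0.5)],
    retains=[ScoredConcept("cat", 1.0)],
)
```

In this toy setting "dog" is already erased and "cat" is untouched, so the only remaining contribution comes from "puppy", which is still under-erased; its lower certainty score reduces how strongly it counts.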

Paper Structure

This paper contains 28 sections, 3 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Consider the task of erasing the "Cat" concept. We define "Siamese cat" as a coreference (coref), which should also be erased, and "Dog" as a retain concept, i.e. one that should not be erased. We show examples from SD v1.4 before any erasure, followed by results from concept erasure methods. Green checkmarks (✓) indicate successful erasure or retention, while red crosses (✗) highlight failures. Our approach effectively balances erasure and retention, reducing both under- and over-erasure issues compared to existing methods.
  • Figure 2: Illustration of how concepts related and unrelated to "dog" are arranged in CLIP's embedding space. The red dot marks the target concept "dog"; yellow stars (e.g. "guide dog", "service dog") represent coreferent concepts along the same semantic manifold, while blue triangles (e.g. "cat", "pig") denote unrelated concepts to be retained. Neglecting corefs and retains leads to under-/over-erasure. RealEra [liu2024realera] samples random corefs in a spherical region, which poorly approximates the true semantic geometry (often capturing unrelated concepts, shown as purple diamonds) and ignores the non-Euclidean nature of concept relationships, where semantically distinct concepts may appear close in Euclidean space (blue triangles).
  • Figure 3: Overview of our proposed CRCE method. Our method erases a target concept (e.g. "dog") while preserving unrelated concepts. Using LLM prompting, we generate corefs (e.g. "Chihuahua", "puppy") and retains (e.g. "cat", "coyote"), each assigned a certainty score. The CRCE loss optimizes both coref erasure and retain preservation, adjusting the embedding space to minimize unintended removals. The final erasure results show that the target ("dog") and its coref terms ("puppy") are erased, while unrelated concepts ("cat", "pig") remain intact, ensuring effective and controlled concept erasure.
  • Figure 4: Comparison of concept erasure effectiveness between RealEra [liu2024realera] and our method. "Mindy Kaling" (celebrity), "Mickey Mouse" (IP), and "Deer" (object) are targeted for removal along with their corefs. CRCE successfully erases corefs while accurately retaining related yet distinct entities, demonstrating superior precision compared to RealEra.
  • Figure 5: This figure demonstrates how SD v1.4 incorrectly overfits the concept of "Gotham City" to "The Batman". Although "Gotham City antagonist" is a valid coreference for "The Joker", erasing "The Joker" also distorts "The Batman", revealing implicit biases in the T2I model itself.
  • ...and 4 more figures