CRCE: Coreference-Retention Concept Erasure in Text-to-Image Diffusion Models
Yuyang Xue, Edward Moroshko, Feng Chen, Jingyu Sun, Steven McDonagh, Sotirios A. Tsaftaris
TL;DR
CRCE tackles under- and over-erasure in text-to-image diffusion models. It uses a Large Language Model (LLM) to identify two sets of concepts: coreferential concepts that should be erased alongside the target, and retained concepts that should be preserved. It introduces a manifold-aware loss that jointly erases the target concept and its coreferential concepts while preserving unrelated content, and it provides the CorefConcept dataset to support this approach. Across object, intellectual property, and celebrity domains, CRCE outperforms prior methods, achieving precise erasure with robust retention of semantically related concepts. The work shows how LLM guidance over semantic relationships can improve concept erasure in diffusion models, and it points to future work on scaling and bias mitigation.
Abstract
Text-to-Image diffusion models can produce undesirable content, which necessitates concept erasure. However, existing methods suffer from one of two failure modes. Under-erasure leaves residual traces of the targeted concepts. Over-erasure mistakenly eliminates unrelated but visually similar concepts. To address these limitations, we introduce CRCE, a novel concept erasure framework that leverages Large Language Models (LLMs) to identify both semantically related concepts that should be erased alongside the target and distinct concepts that should be preserved. By explicitly modelling coreferential and retained concepts at the semantic level, CRCE enables more precise concept removal without unintended erasure. Experiments demonstrate that CRCE outperforms existing methods on diverse erasure tasks, including real-world objects, person identities, and abstract intellectual property characteristics. The constructed dataset, CorefConcept, and the source code will be released upon acceptance.
