Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers
Chi-Pin Huang, Kai-Po Chang, Chung-Ting Tsai, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang
TL;DR
Receler introduces a lightweight eraser that plugs into the cross-attention layers of a pre-trained diffusion model to erase a target concept with strong locality and robustness. Locality comes from concept-localized regularization, which confines erasing to concept-related regions; robustness comes from adversarial prompt learning, which trains the eraser against malicious prompts. Experiments on CIFAR-10 and I2P show that Receler outperforms prior methods at erasing both objects and inappropriate content, and that it remains resilient to paraphrased and learned attack prompts. The approach offers a practical, parameter-efficient framework for safe diffusion-based image synthesis with compositional capabilities at inference.
Abstract
Concept erasure in text-to-image diffusion models aims to prevent pre-trained diffusion models from generating images related to a target concept. To perform reliable concept erasure, two properties are desirable: robustness and locality. The former keeps the model from producing images associated with the target concept for any paraphrased or learned prompt, while the latter preserves its ability to generate images of non-target concepts. In this paper, we propose Reliable Concept Erasing via Lightweight Erasers (Receler). It learns a lightweight Eraser to perform concept erasing while satisfying both properties through the proposed concept-localized regularization and adversarial prompt learning scheme. Experiments with various concepts verify the superiority of Receler over previous methods.
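To make the idea of a lightweight eraser with a locality constraint concrete, the following is a minimal sketch and not the authors' implementation. It assumes the eraser is a small residual bottleneck module applied to per-token features, and that a binary mask marking concept-related positions is already available. The names `eraser`, `apply_eraser`, `locality_penalty`, `W_down`, `W_up`, and `mask` are hypothetical, and the abstract does not specify any of these details.

```python
import math


def gelu(x):
    """Exact GELU activation for a single float."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def eraser(h, W_down, W_up, scale=1.0):
    """Hypothetical bottleneck eraser: project down, apply GELU, project up.

    Returns the correction vector for one token feature h.
    """
    z = [gelu(a) for a in matvec(W_down, h)]
    return [scale * a for a in matvec(W_up, z)]


def apply_eraser(features, W_down, W_up, scale=1.0):
    """Add the eraser output to each token feature (residual connection).

    Returns both the edited features and the raw eraser outputs.
    """
    outputs = [eraser(h, W_down, W_up, scale) for h in features]
    edited = [[a + b for a, b in zip(h, o)] for h, o in zip(features, outputs)]
    return edited, outputs


def locality_penalty(outputs, mask):
    """Mean squared eraser output at positions outside the concept mask.

    mask[i] is 1 where the target concept is assumed to appear, 0 elsewhere,
    so only edits outside the concept region are penalized.
    """
    total = 0.0
    for o, m in zip(outputs, mask):
        total += (1 - m) * sum(a * a for a in o)
    return total / len(outputs)
```

Under these assumptions, minimizing `locality_penalty` alongside an erasing objective would push the eraser to leave features outside the masked region unchanged, which is the intuition behind confining erasure to concept-related regions. Only the two small projection matrices are trainable in this sketch, which is what keeps such a module parameter-efficient.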
