Memories of Forgotten Concepts
Matan Rusanovsky, Shimon Malnick, Amir Jevnisek, Ohad Fried, Shai Avidan
TL;DR
The paper asks whether concept erasure in diffusion models truly removes unwanted concepts or leaves latent memories that can still generate images of the ablated concept. It develops a memory-centric analysis: images of erased concepts are inverted to latent seeds in the ablated model, and each seed's likelihood is measured alongside the quality of the reconstructed image, across multiple concepts and erasure methods. The findings show that high-likelihood seeds can reproduce erased concepts and that many distinct seeds can reconstruct the same erased-concept image, indicating that erasure is not robust and that the latent space retains memories of erased content. These results highlight vulnerabilities in current concept-ablation techniques and motivate a shift toward memory-aware evaluation and more reliable unlearning in diffusion models. The work introduces the Sequential Inversion Block and Earth Mover’s Distance-based relative distance metrics to quantify memory, with practical implications for safety and privacy in diffusion-based generation systems.
Abstract
Diffusion models dominate the space of text-to-image generation, yet they may produce undesirable outputs, including explicit content or private data. To mitigate this, concept ablation techniques have been explored to limit the generation of certain concepts. In this paper, we reveal that the erased concept information persists in the model and that erased concept images can be generated using the right latent. Utilizing inversion methods, we show that there exist latent seeds capable of generating high-quality images of erased concepts. Moreover, we show that these latents have likelihoods that overlap with those of images outside the erased concept. We extend this to demonstrate that for every image from the erased concept set, we can find many distinct seeds that generate the erased concept. Given the vast space of latents capable of generating ablated concept images, our results suggest that fully erasing concept information may be intractable, highlighting possible vulnerabilities in current concept ablation techniques.
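As a rough illustration of the likelihood comparison described above, the following Python sketch scores latent seeds under a standard Gaussian prior and compares two sets of scores with a one-dimensional Earth Mover's Distance. It is not the paper's implementation: the function names, the 64-dimensional toy latents, and the randomly sampled vectors standing in for inverted seeds are all assumptions made for illustration, and the paper's relative distance metrics may be defined differently.

```python
import math
import random


def gaussian_nll(latent):
    """Negative log-likelihood of a flat latent vector under N(0, I)."""
    d = len(latent)
    return 0.5 * sum(z * z for z in latent) + 0.5 * d * math.log(2 * math.pi)


def emd_1d(a, b):
    """Earth Mover's Distance between two equal-size 1-D samples.

    For equal-size empirical distributions this is the mean absolute
    difference between the sorted values.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-size samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def sample_latents(n, dim, scale, rng):
    """Hypothetical stand-in for inverted seeds: i.i.d. Gaussian vectors."""
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)

# Reference scores: latents drawn from the prior the model expects.
reference = [gaussian_nll(z) for z in sample_latents(200, 64, 1.0, rng)]
# Toy "plausible" seeds (same prior) and "implausible" seeds (inflated scale).
plausible = [gaussian_nll(z) for z in sample_latents(200, 64, 1.0, rng)]
implausible = [gaussian_nll(z) for z in sample_latents(200, 64, 1.5, rng)]

print("EMD to plausible seeds:  ", emd_1d(reference, plausible))
print("EMD to implausible seeds:", emd_1d(reference, implausible))
```

In this toy setting, a small distance to the reference scores means the seeds are as likely as ordinary prior samples, which is the sense in which the abstract says the likelihoods of erased-concept latents overlap with those of other images.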
