Memories of Forgotten Concepts

Matan Rusanovsky, Shimon Malnick, Amir Jevnisek, Ohad Fried, Shai Avidan

TL;DR

The paper interrogates whether concept erasure in diffusion models truly removes unwanted concepts or leaves latent memories that can still generate ablated images. It develops a memory-centric analysis by inverting erased outputs to latent seeds and measuring seed likelihood alongside image reconstruction quality, across multiple concepts and erasure methods. The findings show high-likelihood seeds can reproduce erased concepts, and many distinct seeds can reconstruct the same ablated image, indicating that erasure is not robust and that latent spaces retain memories of erased content. These results highlight vulnerabilities in current concept-ablation techniques and motivate a shift toward memory-aware evaluation and more reliable unlearning in diffusion frameworks. The work introduces the Sequential Inversion Block and Earth Mover’s Distance-based relative distance metrics to quantify memory, with practical implications for safety and privacy in diffusion-based generation systems.

Abstract

Diffusion models dominate the space of text-to-image generation, yet they may produce undesirable outputs, including explicit content or private data. To mitigate this, concept ablation techniques have been explored to limit the generation of certain concepts. In this paper, we reveal that the erased concept information persists in the model and that erased concept images can be generated using the right latent. Utilizing inversion methods, we show that there exist latent seeds capable of generating high quality images of erased concepts. Moreover, we show that these latents have likelihoods that overlap with those of images outside the erased concept. We extend this to demonstrate that for every image from the erased concept set, we can generate many seeds that generate the erased concept. Given the vast space of latents capable of generating ablated concept images, our results suggest that fully erasing concept information may be intractable, highlighting possible vulnerabilities in current concept ablation techniques.
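
To make the notion of a seed's likelihood concrete, the following is a minimal sketch, not the authors' implementation. It assumes the latent seed is flattened to a plain list of floats and scored under the standard normal distribution that a latent diffusion model samples from. The function name `standard_normal_nll` and the latent size `4 * 64 * 64` (the usual SD 1.4 latent shape for 512x512 images) are assumptions made for illustration.

```python
import math
import random


def standard_normal_nll(z):
    """Negative log-likelihood of a flat latent vector z under N(0, I).

    NLL(z) = 0.5 * ||z||^2 + 0.5 * d * log(2 * pi), where d = len(z).
    """
    d = len(z)
    return 0.5 * sum(v * v for v in z) + 0.5 * d * math.log(2.0 * math.pi)


# Hypothetical latent size (assumption): 4 channels x 64 x 64.
rng = random.Random(0)
d = 4 * 64 * 64

# A seed drawn from the sampling distribution itself.
sampled = [rng.gauss(0.0, 1.0) for _ in range(d)]

# A stand-in for a seed that drifted away from N(0, I), e.g. after inversion.
scaled = [1.5 * v for v in sampled]

nll_sampled = standard_normal_nll(sampled)
nll_scaled = standard_normal_nll(scaled)

# A lower NLL means the seed is more likely under the sampling distribution.
print(nll_sampled < nll_scaled)
```

Under this sketch, a seed recovered by inversion would be considered plausible when its NLL falls in the same range as the NLLs of seeds recovered for images outside the erased concept.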

Paper Structure

This paper contains 24 sections, 13 equations, 16 figures, 13 tables.

Figures (16)

  • Figure 1: Evaluation of concept erasure models: Prior Art vs. Our Analysis. Prior art analyzes the image generated by an ablated model using the text (or textual embeddings) and a random seed. Instead, we assume that both text and ablated image are given and analyze the likelihood of the corresponding seed, in the latent space of the model, as well as the quality of the generated image. We find that ablated models contain seeds with high likelihood that can be used to generate high quality ablated images.
  • Figure 2: NLL histogram: For a model that erased the concept Nudity (EraseDiff wu2024erasediff), the likelihood distributions of the erased and reference sets fit different Gaussians ($\mathop{\mathrm{NLL_{\rightarrow z_T}}}\nolimits(E)$, $\mathop{\mathrm{NLL_{\rightarrow z_T}}}\nolimits(R)$), both of which differ from the sampling distribution of the LDM, which is a standard normal distribution ($\text{NLL}(\mathcal{N})$).
  • Figure 3: Visualizing our distance measure: Our relative distance measure is the ratio of $EMD(E,\mathcal{N})$ to $EMD(R,\mathcal{N})$, where $E$ is the erased set, $R$ is the reference set, $\mathcal{N}$ is the normal distribution, and EMD is the Earth Mover’s Distance. As can be seen, $E_1$ is much farther from $\mathcal{N}$ than $E_2$, suggesting that the model that forgot $E_1$ did a much better job.
  • Figure 4: Memory of an ablated image: Given an ablated query image $\mathcal{I}_q$, our goal is to find a likely latent $z_T$ that can accurately reconstruct the image when processed through an ablated diffusion model. We start by encoding $\mathcal{I}_q$ into a latent $z_0$ with the encoder, then apply diffusion inversion to obtain a seed latent vector $z_T$. This seed is fed into the LDM to generate the image $\mathcal{\hat{I}}_q$. Finally, we evaluate the likelihood of $z_T$ and the quality of the reconstructed image $\mathcal{\hat{I}}_q$ compared to $\mathcal{I}_q$.
  • Figure 5: A concept erased model remembers: We report the mean reconstruction PSNR (a) and our proposed relative distance (b) for six concept datasets $\{$Nudity, Van Gogh, Church, Garbage Truck, Parachute, Tench$\}$ across nine different concept ablation methods $\{$EraseDiff wu2024erasediff, ESD gandikota2023esd, FMN zhang2024forget, SalUn fan2023salun, Scissorhands wu2024scissorhands, SPM lyu2024spm, UCE gandikota2024uce, AC kumari2023ablating, AdvUnlearn zhang2024defensive$\}$, along with one "Vanilla" SD 1.4 Rombach_2022_CVPR model. These results validate that, at the dataset level, there exists at least one latent per image that can reconstruct the image with high quality (PSNR $\geq 25$ dB) and with a reasonable likelihood using the concept erased model.
  • ...and 11 more figures
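
The two quantities used in the captions above, the relative distance of Figure 3 and the reconstruction PSNR of Figures 4 and 5, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: it assumes each set ($E$, $R$, $\mathcal{N}$) is represented by an equal-size sample of scalar NLL values, in which case the one-dimensional Earth Mover’s Distance reduces to the mean absolute difference between sorted samples. The helper names `emd_1d`, `relative_distance`, and `psnr`, and all numbers, are hypothetical toy values rather than results from the paper.

```python
import math


def emd_1d(xs, ys):
    """Earth Mover's Distance between two equal-size 1D samples.

    For equal-size empirical distributions on the real line this equals
    the mean absolute difference between the sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-size samples")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def relative_distance(nll_erased, nll_reference, nll_normal):
    """Ratio EMD(E, N) / EMD(R, N) computed on scalar NLL samples."""
    denom = emd_1d(nll_reference, nll_normal)
    if denom == 0.0:
        raise ValueError("reference set coincides with the normal samples")
    return emd_1d(nll_erased, nll_normal) / denom


def psnr(img_a, img_b, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flat pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy NLL samples (assumption, for illustration only).
nll_normal = [0.0, 1.0, 2.0]
nll_reference = [1.0, 2.0, 3.0]
nll_erased = [2.0, 3.0, 4.0]

ratio = relative_distance(nll_erased, nll_reference, nll_normal)
print(ratio)

# Toy query image and reconstruction with pixel values in [0, 1].
query = [0.0, 0.5, 1.0]
reconstruction = [0.1, 0.5, 0.9]
print(psnr(query, reconstruction))
```

Read this way, a ratio near 1 would mean the seeds of erased-concept images are about as likely as those of reference images, while a larger ratio would mean they sit farther from the sampling distribution. A reconstruction would count as high quality in the sense of Figure 5 when its PSNR is at least 25 dB.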