Memorized Images in Diffusion Models share a Subspace that can be Located and Deleted

Ruchika Chavhan, Ondrej Bohdal, Yongshuo Zong, Da Li, Timothy Hospedales

TL;DR

This paper introduces a novel post-hoc method for editing pre-trained models, whereby memorization is mitigated through the straightforward pruning of weights in specialized subspaces, avoiding the need to disrupt the training or inference process as seen in prior research.

Abstract

Large-scale text-to-image diffusion models excel in generating high-quality images from textual inputs, yet concerns arise as research indicates their tendency to memorize and replicate training data, raising copyright infringement and privacy issues. Efforts within the text-to-image community to address memorization explore causes such as data duplication, replicated captions, or trigger tokens, proposing per-prompt inference-time or training-time mitigation strategies. In this paper, we focus on the feed-forward layers and begin by contrasting neuron activations of a set of memorized and non-memorized prompts. Experiments reveal a surprising finding: many different sets of memorized prompts significantly activate a common subspace in the model, demonstrating, for the first time, that memorization in diffusion models lies in a special subspace. Subsequently, we introduce a novel post-hoc method for editing pre-trained models, whereby memorization is mitigated through the straightforward pruning of weights in specialized subspaces, avoiding the need to disrupt the training or inference process as seen in prior research. Finally, we demonstrate the robustness of the pruned model against training data extraction attacks, thereby unveiling new avenues for a practical and one-for-all solution to memorization.
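
As a rough, illustrative sketch of the idea described above (not the authors' exact procedure), the snippet below contrasts per-neuron activations of a feed-forward layer on memorized versus non-memorized prompts and then zeroes the weights of the flagged neurons. The z-score threshold `tau`, the way activations are collected, and the choice of layer are assumptions made for illustration only.

```python
import torch

def find_memorized_neurons(acts_mem, acts_non, tau=3.0):
    """Flag neurons whose mean activation on memorized prompts is unusually
    high relative to non-memorized prompts.

    acts_mem, acts_non: tensors of shape (num_prompts, num_neurons),
    collected from one feed-forward layer (e.g. via forward hooks).
    tau: illustrative z-score threshold (an assumption, not from the paper).
    """
    mu, sigma = acts_non.mean(0), acts_non.std(0) + 1e-8
    z = (acts_mem.mean(0) - mu) / sigma          # per-neuron deviation score
    return (z > tau).nonzero(as_tuple=True)[0]   # indices of flagged neurons

def prune_neurons(linear_layer, neuron_idx):
    """Zero the weights (and biases) of the flagged neurons in place,
    removing their contribution from the feed-forward layer."""
    with torch.no_grad():
        linear_layer.weight[neuron_idx] = 0.0
        if linear_layer.bias is not None:
            linear_layer.bias[neuron_idx] = 0.0
```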

Paper Structure

This paper contains 17 sections, 7 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Density of memorized neurons averaged over timestep (left) and layer (right) for 10 different subsets containing 10 prompts each. We observe that the number of neurons identified as memorized is similar across different subsets.
  • Figure 2: Average Pairwise IOU averaged over timestep (left) and layer (right) for $N=10$ and varying subset sizes $m$ (see the IOU sketch after this list).
  • Figure 3: Left: Quality (CLIP similarity score, $\uparrow$) vs Memorization (SSCD, $\downarrow$) for 10 different pruned models compared with inference-time mitigation in wen2024detecting. All the pruned models show less memorization than the no-mitigation baseline, indicating that memorization can be edited via model pruning. Right: Clock Time and COCO30k FID for baselines and our proposed approach. We provide generation quality and memorization reduction similar to wen2024detecting, but with substantially faster inference.
  • Figure 4: The initial row displays images generated by the pre-trained model, while subsequent rows depict images generated by different pruned models. Notably, despite sharing the same seed, different pruned models yield semantically similar images. This striking observation reveals that memorization resides in a potentially unique space in pre-trained diffusion models. More qualitative results are presented in the appendix.
  • Figure 5: Left and Middle: IOU between memorized neurons discovered from different subsets of memorized prompts is high, indicating localization of memorization. Right: Memorization in SD2.0 can be mitigated with our proposed approach, indicating its generalizability across different models.
  • ...and 3 more figures
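
For reference, the pairwise IOU reported in Figures 2 and 5 can be computed as sketched below; the helper name and the input format (one set of flagged neuron indices per prompt subset) are illustrative assumptions rather than the authors' implementation.

```python
import itertools

def average_pairwise_iou(neuron_sets):
    """neuron_sets: list of sets of neuron indices, one per prompt subset.
    Returns the average Intersection-over-Union over all pairs; a high value
    indicates that different subsets activate largely the same neurons."""
    ious = []
    for a, b in itertools.combinations(neuron_sets, 2):
        union = a | b
        ious.append(len(a & b) / len(union) if union else 1.0)
    return sum(ious) / len(ious)

# Example: three prompt subsets flagging largely overlapping neurons
print(average_pairwise_iou([{1, 2, 3, 4}, {2, 3, 4, 5}, {1, 3, 4, 5}]))
```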