Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models

Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, Franziska Boenisch

TL;DR

NeMo is the first method to localize memorization of individual data samples down to the level of neurons in the cross-attention layers of diffusion models (DMs). The authors find that, in many cases, single neurons are responsible for memorizing particular training samples.

Abstract

Diffusion models (DMs) produce very detailed and high-quality images. Their power results from extensive training on large amounts of data, usually scraped from the internet without proper attribution or consent from content creators. Unfortunately, this practice raises privacy and intellectual property concerns, as DMs can memorize and later reproduce their potentially sensitive or copyrighted training images at inference time. Prior efforts prevent this issue by either changing the input to the diffusion process, thereby preventing the DM from generating memorized samples during inference, or removing the memorized data from training altogether. While those are viable solutions when the DM is developed and deployed in a secure and constantly monitored environment, they hold the risk of adversaries circumventing the safeguards and are not effective when the DM itself is publicly released. To solve the problem, we introduce NeMo, the first method to localize memorization of individual data samples down to the level of neurons in DMs' cross-attention layers. Through our experiments, we make the intriguing finding that in many cases, single neurons are responsible for memorizing particular training samples. By deactivating these memorization neurons, we can avoid the replication of training data at inference time, increase the diversity in the generated outputs, and mitigate the leakage of private and copyrighted data. In this way, our NeMo contributes to a more responsible deployment of DMs.
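
The deactivation step described in the abstract can be illustrated with a minimal sketch. It assumes that a "neuron" is one output unit of a linear layer (a stand-in for a cross-attention layer of a DM) and that deactivating it means multiplying its activation by zero. The helper names `value_layer` and `scale_neurons`, the toy weights, and the neuron index are hypothetical and not taken from the paper's code.

```python
def value_layer(weights, bias, x):
    """Plain linear layer: one output activation ("neuron") per weight row."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def scale_neurons(activations, neuron_ids, factor=0.0):
    """Return a copy with the selected neurons multiplied by `factor`.

    factor=0.0 deactivates the neurons; other values rescale them.
    """
    selected = set(neuron_ids)
    return [a * factor if i in selected else a for i, a in enumerate(activations)]


# Toy layer with three neurons and a two-dimensional input (made-up values).
weights = [[1.0, 0.0], [0.5, 0.5], [2.0, 1.0]]
bias = [0.0, 0.0, 0.0]
x = [1.0, 2.0]

activations = value_layer(weights, bias, x)
# Assumption: neuron 2 was flagged as a memorization neuron for this prompt.
deactivated = scale_neurons(activations, neuron_ids=[2], factor=0.0)
print(activations, deactivated)
```

In a real DM the same scaling would have to be applied inside the model's forward pass, for example through a hook on the relevant layer; that wiring is omitted here.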

Paper Structure

This paper contains 35 sections, 5 equations, 20 figures, 4 tables, 4 algorithms.

Figures (20)

  • Figure 1: Overview of NeMo. For memorized prompts, we observe that the same (original training) image is consistently generated independently of the initial random seed. This raises severe privacy and copyright concerns. In the initial stage, NeMo first identifies candidate neurons potentially responsible for the memorization based on out-of-distribution activations. In a refinement step, NeMo detects the memorization neurons from the candidate set by leveraging the noise similarities during the first denoising step. Deactivating memorization neurons prevents unintended memorization behavior and induces diversity in the generated images.
  • Figure 2: Differences Between Memorized and Non-memorized Prompts. (a) depicts the distribution of pairwise SSIM scores between initial noise differences starting from different seeds. Since the noise trajectories are more consistent for memorized samples, the score reflects the degree of memorization. (b) shows the distribution of the $z$-scores of each neuron in the first cross-attention value layer. Memorization neurons produce considerably higher activations, here depicted as standardized $z$-scores, for memorized prompts, allowing them to be identified by outlier detection.
  • Figure 3: Impact of Deactivating Memorization Neurons. The top row shows images generated with memorized prompts, closely replicating the training images. The bottom row demonstrates that deactivating memorization neurons increases diversity and mitigates memorization. Notably, only a few neurons (counts indicated by digits in the boxes) are responsible for these memorizations.
  • Figure 4: Distribution of Memorization Neurons. (a) shows the number of prompts that are memorized by a fixed number of neurons, e.g., the verbatim memorization of 28 prompts is located in single neurons. (b) depicts the average number of memorization neurons per layer and prompt.
  • Figure 5: Image Quality and Sensitivity to Scaling Factor. (a) assesses the generated images' quality when blocking an increasing number of neurons. The FID and KID values vary only slightly, indicating that blocking neurons identified by NeMo does not negatively affect image generation quality. Gray lines indicate the baseline without any neurons blocked. (b) investigates the effect of scaling the memorization neurons' activations by a scaling factor instead of deactivating them (scaling by zero). Whereas scaling memorization neuron activations by a positive factor only slightly reduces memorization, negative scaling does not reduce memorization any further.
  • ...and 15 more figures
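
The outlier-detection idea in the captions of Figures 1 and 2 (candidate neurons show unusually high standardized activations for memorized prompts) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the reference statistics come from a handful of made-up non-memorized prompts, the threshold of 3.0 and the use of the absolute $z$-score are assumptions, and the function names `neuron_z_scores` and `candidate_neurons` are hypothetical.

```python
from statistics import mean, stdev


def neuron_z_scores(reference_activations, prompt_activations):
    """Standardize one prompt's per-neuron activations.

    reference_activations: list of rows, one row per non-memorized prompt,
        each row holding one activation value per neuron.
    prompt_activations: one activation value per neuron for the prompt under test.
    """
    z_scores = []
    for neuron, value in enumerate(prompt_activations):
        column = [row[neuron] for row in reference_activations]
        mu, sigma = mean(column), stdev(column)
        # Guard against constant reference activations (zero spread).
        z_scores.append((value - mu) / sigma if sigma > 0 else 0.0)
    return z_scores


def candidate_neurons(z_scores, threshold=3.0):
    """Return indices of neurons whose absolute z-score exceeds the threshold."""
    return [i for i, z in enumerate(z_scores) if abs(z) > threshold]


# Toy data: 4 neurons, 5 reference prompts (all values are made up).
reference = [
    [0.10, 0.50, -0.20, 0.30],
    [0.20, 0.40, -0.10, 0.20],
    [0.00, 0.60, -0.30, 0.40],
    [0.10, 0.50, -0.20, 0.30],
    [0.10, 0.50, -0.20, 0.30],
]
memorized_prompt = [0.12, 0.52, 4.00, 0.28]

z = neuron_z_scores(reference, memorized_prompt)
candidates = candidate_neurons(z)
print(candidates)
```

Only the third neuron deviates strongly from the reference statistics, so it is the only candidate. The refinement step from Figure 1, which checks noise similarity during the first denoising step, is not covered by this sketch.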