Classifier-Free Guidance inside the Attraction Basin May Cause Memorization

Anubhav Jain, Yuya Kobayashi, Takashi Shibuya, Yuhta Takida, Nasir Memon, Julian Togelius, Yuki Mitsufuji

TL;DR

This work shows that memorization in diffusion-based image generation can be understood through an attraction-basin dynamic in the denoising trajectory. The authors propose an inference-time mitigation that delays classifier-free guidance (CFG) until a transition point is reached, plus opposite guidance to hasten exit from the attraction basin, all without retraining. They formalize static and dynamic transition-point strategies and demonstrate their effectiveness across multiple memorization scenarios, including finetuning on LAION-10k, data duplication, and trigger-token prompts, while maintaining image quality and textual alignment. The method is fast, requires no prompt or weight changes, and applies across these scenarios rather than targeting a single one. Overall, it contributes a dynamical-systems perspective and a lightweight mitigation that can be readily integrated into existing diffusion pipelines.

Abstract

Diffusion models are prone to exactly reproducing images from the training data. This exact reproduction is concerning because it can lead to copyright infringement and/or leakage of privacy-sensitive information. In this paper, we present a novel perspective on the memorization phenomenon and propose a simple yet effective approach to mitigate it. We argue that memorization occurs because of an attraction basin in the denoising process, which steers the diffusion trajectory towards a memorized image. This can be mitigated by guiding the diffusion trajectory away from the attraction basin: classifier-free guidance is withheld until an ideal transition point is reached and is applied only from that point onward. This leads to the generation of non-memorized images that are high in image quality and well-aligned with the conditioning mechanism. To further improve on this, we present a new guidance technique, opposite guidance, that escapes the attraction basin sooner in the denoising process. We demonstrate the existence of attraction basins in various scenarios in which memorization occurs, and we show that our proposed approach successfully mitigates memorization.
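
The delayed-guidance idea described in the abstract can be pictured as a schedule for the guidance scale. The sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes the commonly used CFG form $\hat{\epsilon} = \epsilon_{\theta}(x_t, e_{\emptyset}) + s\,(\epsilon_{\theta}(x_t, e_p) - \epsilon_{\theta}(x_t, e_{\emptyset}))$, that time steps count down from $T$ to 0, and that opposite guidance can be modeled as a negative scale before the transition point. The paper's exact parametrization may differ, and the names `guidance_scale`, `guided_noise`, `cfg_scale`, and `opposite_scale` are hypothetical.

```python
def guidance_scale(t, transition_t, cfg_scale=7.5, opposite_scale=0.0):
    """Return the guidance scale to use at time step t.

    Assumption: time steps count down, so t > transition_t means the
    trajectory has not yet reached the transition point.
    """
    if t > transition_t:
        # Before the transition point: no CFG, or (assumed) a negative
        # scale as a stand-in for opposite guidance.
        return -opposite_scale if opposite_scale > 0 else 0.0
    # At or after the transition point: ordinary CFG.
    return cfg_scale


def guided_noise(eps_cond, eps_uncond, scale):
    """Combine conditional and unconditional noise predictions.

    Assumed CFG form: eps_uncond + scale * (eps_cond - eps_uncond),
    applied element-wise to flat lists of floats.
    """
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]
```

In a sampling loop, `guidance_scale(t, transition_t)` would be evaluated at each step and its result passed to `guided_noise`. With a static strategy `transition_t` is fixed in advance, whereas a dynamic strategy would determine it per sample.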

Paper Structure

This paper contains 34 sections, 10 equations, 17 figures, 5 tables, 1 algorithm.

Figures (17)

  • Figure 1: The diffusion trajectory contains an attraction basin (red region) which steers conditioned samples towards their memorized images. It can be avoided by applying zero classifier-free guidance when the trajectory is inside the attraction basin, such that there is an ideal transition point $\tau^*$ after which applying CFG leads to non-memorized output. Applying CFG at an earlier point such as $\tau_1$ inside the attraction basin leads to the same memorized sample.
  • Figure 2: Plots showing the magnitude of $\epsilon_{\theta}(x_t, e_p) - \epsilon_{\theta}(x_t, e_{\emptyset})$ at each time step when denoising without classifier-free guidance (CFG). The images show the output generated when CFG is first applied at the corresponding time step. We get non-memorized output if we apply CFG after the ideal transition point $\tau^*$, which coincides with the fall in the conditional noise prediction. This value depends on both the prompt and the initialization ((a) and (b) use the same prompt with different initializations). More examples are in the Appendix.
  • Figure 3: Average magnitude of the text-conditioned ($\epsilon_{\theta}(x_t, e_{p})$) and unconditional ($\epsilon_{\theta}(x_t, e_{\emptyset})$) noise predictions and their difference ($\epsilon_{\theta}(x_t, e_{p}) - \epsilon_{\theta}(x_t, e_{\emptyset})$) when applying zero CFG. We see a static transition point ($t=500$) appear when SDv2.1 is finetuned on the LAION-10k dataset (Somepalli et al., 2023).
  • Figure 4: Applying CFG before the static transition point ($t=500$) leads to memorized outputs, while applying CFG after the static transition point leads to non-memorized outputs. Applying CFG too late results in poor-quality images that resemble the unconditional generations.
  • Figure 5: In some models, transition points can occur at a different time step for each sample, as seen for pre-trained SDv1.4. In row 1 the transition point is approximately $t=800$ while for row 2 it is $t=650$.
  • ...and 12 more figures
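
The captions for Figures 2 and 5 suggest that a per-sample transition point can be read off the magnitude of $\epsilon_{\theta}(x_t, e_p) - \epsilon_{\theta}(x_t, e_{\emptyset})$ recorded while denoising with zero CFG. The sketch below shows one possible way to do this. The detection rule (the first step whose magnitude drops below a fraction of the running peak) and the names `guidance_gap`, `find_transition_index`, and `drop_ratio` are assumptions made for illustration, not the criterion used in the paper.

```python
import math


def guidance_gap(eps_cond, eps_uncond):
    """L2 norm of the difference between two flat noise predictions."""
    return math.sqrt(sum((c - u) ** 2 for c, u in zip(eps_cond, eps_uncond)))


def find_transition_index(gaps, drop_ratio=0.5):
    """Return the index of the first step where the gap has fallen.

    gaps: magnitudes recorded in denoising order (first entry = noisiest step).
    Hypothetical rule: a fall is declared once a gap drops below
    drop_ratio times the largest gap seen so far. Returns None if no
    such step exists.
    """
    peak = 0.0
    for i, gap in enumerate(gaps):
        peak = max(peak, gap)
        if peak > 0 and gap < drop_ratio * peak:
            return i
    return None
```

The returned index refers to the position in the recorded sequence, so it has to be mapped back to the sampler's actual time step before being used as a transition point.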

Theorems & Definitions (3)

  • Definition 1: Denoiser
  • Definition 2: Attractor and attraction basin
  • Definition 3: Transition point