Adjusting Initial Noise to Mitigate Memorization in Text-to-Image Diffusion Models

Hyeonggeun Han, Sehwan Kim, Hyungjun Joo, Sangwoo Hong, Jungwoo Lee

TL;DR

Memorization in text-to-image diffusion models raises privacy and copyright concerns by enabling regeneration of training data. The authors show that the starting noise x_T influences when the denoising trajectory escapes the attraction basin, where CFG can bias outputs toward memorized content, and propose two inference-time strategies to promote earlier basin escape without sacrificing prompt alignment. They introduce Batch-wise and Per-sample initial-noise adjustments that reduce the initial conditional guidance magnitude and enable CFG to be applied earlier, yielding non-memorized yet well-aligned images. Experiments on Stable Diffusion v1.4 and v2.0 demonstrate improved memorization mitigation with competitive image-text alignment and diversity, with Per-sample mitigation delivering the strongest SSCD–CLIP trade-off. Overall, the work offers practical, inference-time privacy-preserving techniques for diffusion models with broad applicability across prompts and models.

Abstract

Despite their impressive generative capabilities, text-to-image diffusion models often memorize and replicate training data, prompting serious concerns over privacy and copyright. Recent work has attributed this memorization to an attraction basin, a region where applying classifier-free guidance (CFG) steers the denoising trajectory toward memorized outputs, and has proposed deferring CFG application until the denoising trajectory escapes this basin. However, such delays often result in non-memorized images that are poorly aligned with the input prompts, highlighting the need to promote earlier escape so that CFG can be applied sooner in the denoising process. In this work, we show that the initial noise sample plays a crucial role in determining when this escape occurs. We empirically observe that different initial samples lead to varying escape times. Building on this insight, we propose two mitigation strategies that adjust the initial noise, either collectively or individually, to find and utilize initial samples that encourage earlier basin escape. These approaches significantly reduce memorization while preserving image-text alignment.
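To make the deferral idea concrete, the following is a minimal sketch of the standard CFG combination and of a delayed variant that switches guidance on only after a chosen sampling step. It is not the authors' implementation. The function names, the `cfg_start_step` parameter, the choice to fall back to the plain conditional prediction before that step, and the use of the norm of the conditional-minus-unconditional difference as the "guidance magnitude" are all assumptions made for illustration.

```python
import numpy as np


def cfg_noise(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: extrapolate from the
    unconditional prediction toward the conditional one."""
    return eps_uncond + scale * (eps_cond - eps_uncond)


def deferred_cfg_noise(eps_uncond, eps_cond, scale, step, cfg_start_step):
    """Hypothetical deferral rule (an assumption, not the paper's code).

    `step` counts sampling steps from 0 at the initial noise x_T.
    Before `cfg_start_step` the plain conditional prediction is used;
    from `cfg_start_step` onward the CFG combination is applied.
    """
    if step < cfg_start_step:
        return eps_cond
    return cfg_noise(eps_uncond, eps_cond, scale)


def guidance_magnitude(eps_uncond, eps_cond):
    """L2 norm of the conditional-minus-unconditional difference,
    used here as an illustrative proxy for guidance strength."""
    return float(np.linalg.norm(eps_cond - eps_uncond))


if __name__ == "__main__":
    # Toy stand-ins for the two noise predictions at one timestep.
    eps_u = np.zeros(4)
    eps_c = np.ones(4)
    print(deferred_cfg_noise(eps_u, eps_c, 7.5, step=0, cfg_start_step=5))
    print(deferred_cfg_noise(eps_u, eps_c, 7.5, step=5, cfg_start_step=5))
    print(guidance_magnitude(eps_u, eps_c))
```

In this sketch, a smaller `cfg_start_step` means guidance acts on more of the trajectory, which is the motivation the abstract gives for seeking initial samples that escape the basin earlier.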

Paper Structure

This paper contains 39 sections, 13 equations, 11 figures, 3 tables, 1 algorithm.

Figures (11)

  • Figure 1: The magnitude of $\tilde{\epsilon}_\theta(x_t, t, y)$ at each timestep during sampling without CFG for a memorized prompt. Each image above corresponds to an output generated when CFG is applied over the timestep interval indicated by the associated square brackets.
  • Figure 2: The magnitude of the conditional noise prediction at each timestep during sampling without CFG for three memorized prompts. Each line color corresponds to a different initial Gaussian sample. Transition points occur at different timesteps depending on the choice of initial sample.
  • Figure 4: Comparison of SSCD and CLIP scores with and without initial sample adjustment.
  • Figure 5: Comparison of SSCD and CLIP scores among different mitigation methods under Stable Diffusion v1.4 and v2.0. Lower SSCD scores indicate stronger memorization mitigation, while higher CLIP scores indicate better image-text alignment.
  • Figure 6: Qualitative comparison of memorization mitigation results. Each column shows generations produced by various baseline methods and our proposed approaches. The prompts used for the generations are provided in the appendix.
  • ...and 6 more figures