Erasing Undesirable Influence in Diffusion Models

Jing Wu, Trung Le, Munawar Hayat, Mehrtash Harandi

TL;DR

EraseDiff tackles the challenge of erasing undesired information from diffusion models without sacrificing performance. It recasts data forgetting as a constrained optimization problem via a value-function formulation, yielding a first-order update that balances preservation of the remaining data against erasure of the data to be forgotten, and it proves Pareto optimality of the solution. Empirically, EraseDiff forgets faster and with better trade-offs than state-of-the-art baselines across DDPM and Stable Diffusion setups, including class-wise forgetting, concept-wise forgetting, and nudity erasure, while maintaining image fidelity (FID) and text-image alignment (CLIP). The work demonstrates practical impact for privacy and safety in generative models, offering a scalable, principled approach that reduces computational cost relative to existing methods. It also highlights limitations and invites future work on fairness and privacy-preserving enhancements.

Abstract

Diffusion models are highly effective at generating high-quality images but pose risks, such as the unintentional generation of NSFW (not safe for work) content. Although various techniques have been proposed to mitigate unwanted influences in diffusion models while preserving overall performance, achieving a balance between these goals remains challenging. In this work, we introduce EraseDiff, an algorithm designed to preserve the utility of the diffusion model on retained data while removing the unwanted information associated with the data to be forgotten. Our approach formulates this task as a constrained optimization problem using the value function, resulting in a natural first-order algorithm for solving the optimization problem. By altering the generative process to deviate away from the ground-truth denoising trajectory, we update parameters for preservation while controlling constraint reduction to ensure effective erasure, striking an optimal trade-off. Extensive experiments and thorough comparisons with state-of-the-art algorithms demonstrate that EraseDiff effectively preserves the model's utility, efficacy, and efficiency.

Paper Structure (26 sections, 2 theorems, 9 equations, 19 figures, 5 tables, 1 algorithm)

Key Result

Theorem 3.1

The optimal solution of the optimization problem in eq:delta_opt is $\bm{\delta}^{*} = \nabla_{\bm{\theta}} \mathcal{L}_r(\bm{\theta}_{t}; \mathcal{D}_{r}) + \lambda_{t} \nabla_{\bm{\theta}} g(\bm{\theta}_{t})$, where $\lambda_{t} = \max \{0, \frac{a_{t} - \nabla_{\bm{\theta}} \cdots}{\cdots}\}$.
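
As a rough illustration of how the update direction in Theorem 3.1 combines its two gradients, the sketch below forms $\bm{\delta} = \nabla_{\bm{\theta}} \mathcal{L}_r + \lambda_{t} \nabla_{\bm{\theta}} g$ from flat lists of numbers. The function names (`update_direction`, `descent_step`), the flat-list representation, and the learning rate are hypothetical choices made for this example. Because the closed form of $\lambda_{t}$ is cut off in this listing, the multiplier is passed in as an argument rather than computed, and the plain descent step $\bm{\theta} - \eta \bm{\delta}$ is an assumption, not something stated above.

```python
from typing import List, Sequence


def update_direction(
    grad_retain: Sequence[float],
    grad_constraint: Sequence[float],
    lam: float,
) -> List[float]:
    """Combine the preservation gradient and the constraint gradient.

    grad_retain     stands in for the gradient of L_r at theta_t.
    grad_constraint stands in for the gradient of g at theta_t.
    lam             is a non-negative multiplier supplied by the caller
                    (its closed form is not reproduced here).
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    if len(grad_retain) != len(grad_constraint):
        raise ValueError("gradients must have the same length")
    return [gr + lam * gc for gr, gc in zip(grad_retain, grad_constraint)]


def descent_step(
    theta: Sequence[float], delta: Sequence[float], lr: float
) -> List[float]:
    """Assumed plain gradient-descent step: theta - lr * delta."""
    return [p - lr * d for p, d in zip(theta, delta)]


if __name__ == "__main__":
    g_r = [1.0, 2.0]   # toy preservation gradient
    g_c = [0.5, -1.0]  # toy constraint gradient
    delta = update_direction(g_r, g_c, lam=2.0)
    print(delta)
    print(descent_step([1.0, 1.0], delta, lr=0.1))
```

With `lam = 0` the direction reduces to the preservation gradient alone, which matches the $\max\{0, \cdot\}$ form of the multiplier: the constraint term only contributes when the multiplier is positive.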

Figures (19)

  • Figure 1: Top to bottom: samples generated by SD v1.4 and by the model scrubbed with our method when erasing the concept of 'nudity'. Our method avoids NSFW (not safe for work) content while preserving model utility.
  • Figure 2: Top to bottom: cosine similarity between the update vector $\bm{\delta}$ and the preservation gradient ${\bm{g}}_r$, followed by the cosine similarity between $\bm{\delta}$ and the erasing gradient ${\bm{g}}_f$. Positive values indicate alignment, while negative values suggest conflict. This visualization illustrates how well the update vector aligns with the objectives of preservation and erasure over successive iterations.
  • Figure 3: Quantity of nudity content detected by the NudeNet classifier on I2P data. Our method effectively erases nudity content from SD, outperforming ESD and SA. Note that fig:i2p_change and tab:nsfw_sd together present the trade-off between erasure and preservation.
  • Figure 4: Top to Bottom: generated examples with I2P and COCO prompts after forgetting the concept of 'nudity'.
  • Figure 5: (a) Ablation results. (b) Potential incomplete erasures.
  • ...and 14 more figures
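
The alignment diagnostic described in the Figure 2 caption (cosine similarity between the update vector $\bm{\delta}$ and each of the gradients ${\bm{g}}_r$ and ${\bm{g}}_f$) can be sketched in a few lines. This is a minimal illustration on toy vectors, not the authors' plotting code; the function name `cosine_similarity` and the convention of returning 0.0 for a zero vector are assumptions made for this example.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v.

    Positive values indicate alignment and negative values indicate conflict.
    Returns 0.0 if either vector has zero norm (a convention assumed here).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    delta = [2.0, 0.0]  # toy update vector
    g_r = [1.0, 2.0]    # toy preservation gradient
    g_f = [0.5, -1.0]   # toy erasing gradient
    print("cos(delta, g_r):", cosine_similarity(delta, g_r))
    print("cos(delta, g_f):", cosine_similarity(delta, g_f))
```

Logging these two values at each iteration would give the kind of per-iteration curves the caption describes.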

Theorems & Definitions (4)

  • Theorem 3.1
  • Theorem 3.2: Pareto optimality
  • proof
  • proof