Unlearning the Unpromptable: Prompt-free Instance Unlearning in Diffusion Models

Kyungryeol Lee, Kyeonghyun Lee, Seongmin Hong, Byung Hyun Lee, Se Young Chun

TL;DR

This work introduces an effective surrogate-based unlearning method that leverages image editing, timestep-aware weighting, and gradient surgery to guide trained diffusion models toward forgetting specific outputs. Unlike the compared methods, it unlearns unpromptable outputs while preserving model integrity.

Abstract

Machine unlearning aims to remove specific outputs from trained models, often at the concept level, such as forgetting all occurrences of a particular celebrity or filtering content via text prompts. However, many undesired outputs, such as an individual's face or culturally or factually misinterpreted generations, often cannot be specified by text prompts. We address this underexplored setting of instance unlearning for outputs that are undesired but unpromptable, where the goal is to forget target outputs selectively while preserving the rest. To this end, we introduce an effective surrogate-based unlearning method that leverages image editing, timestep-aware weighting, and gradient surgery to guide trained diffusion models toward forgetting specific outputs. Experiments on conditional (Stable Diffusion 3) and unconditional (DDPM-CelebA) diffusion models demonstrate that, unlike prompt-based and prompt-free baselines, our prompt-free method unlearns unpromptable outputs, such as faces and culturally inaccurate depictions, while preserving model integrity. Our proposed method could serve as a practical hotfix for diffusion model providers to ensure privacy protection and ethical compliance.
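The abstract names gradient surgery but does not state the rule. As a minimal sketch, assuming a PCGrad-style projection (a swapped-in technique, not confirmed as the paper's exact rule), the unlearning gradient is stripped of its component along the retain gradient whenever the two conflict, i.e. have a negative inner product. The names `project_conflicting`, `g_forget`, and `g_retain` are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def project_conflicting(g_forget: Vector, g_retain: Vector) -> Vector:
    """PCGrad-style surgery (assumed): if the unlearning gradient conflicts with
    the retain gradient (negative inner product), drop its component along
    g_retain so that, to first order, a descent step on it leaves the retain
    loss unchanged."""
    inner = dot(g_forget, g_retain)
    norm_sq = dot(g_retain, g_retain)
    if inner >= 0 or norm_sq == 0:
        return list(g_forget)  # no conflict: keep the unlearning gradient as-is
    scale = inner / norm_sq
    return [gf - scale * gr for gf, gr in zip(g_forget, g_retain)]


g = project_conflicting([1.0, -1.0], [0.0, 1.0])
print(g)  # [1.0, 0.0]
```

Projecting only on conflict leaves non-conflicting updates untouched, so the forgetting signal is weakened only where it would directly push against the retained outputs.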

Paper Structure (21 sections, 3 theorems, 47 equations, 11 figures, 6 tables, 1 algorithm)

Key Result

Theorem 1

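Let $\theta^*\in \mathbb{R}^d$ be the parameter vector obtained by solving the ridge-regression problem

$$\theta^* = \arg\min_{\theta\in\mathbb{R}^d} \|X\theta - y\|_2^2 + \lambda\|\theta\|_2^2,$$

where $X \in \mathbb{R}^{n\times d}$, $y\in \mathbb{R}^n$, and $\lambda \ge 0$, so that $\theta^* = (X^\top X + \lambda I)^{-1}X^\top y$. Now remove the $i$-th row $\bigl(x_i, y_i\bigr)$ from $X, y$, producing $\widetilde{X}, \widetilde{y}$. The new solution, trained from scratch on $\widetilde{X}, \widetilde{y}$, is

$$\widetilde{\theta} = \bigl(\widetilde{X}^\top\widetilde{X} + \lambda I\bigr)^{-1}\widetilde{X}^\top\widetilde{y}.$$

Then, the difference …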
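The statement above breaks off before the difference $\widetilde{\theta} - \theta^*$ is given. For reference, the standard leave-one-out identity for ridge regression, obtained with the Sherman-Morrison formula and written with our own shorthand $A = X^\top X + \lambda I$ (an assumption, not necessarily the paper's notation), is

$$\widetilde{\theta} - \theta^* = -\frac{A^{-1}x_i\,\bigl(y_i - x_i^\top\theta^*\bigr)}{1 - x_i^\top A^{-1}x_i}.$$

The sketch below checks this numerically with exact rational arithmetic by comparing the rank-one update against retraining from scratch. All function names are hypothetical, and $\lambda > 0$ is assumed so that every matrix involved is invertible.

```python
from fractions import Fraction
from typing import List

Matrix = List[List[Fraction]]
Vector = List[Fraction]


def gram(X: Matrix, lam: Fraction) -> Matrix:
    """A = X^T X + lam * I (our shorthand, not necessarily the paper's)."""
    d = len(X[0])
    return [[sum(row[a] * row[b] for row in X) + (lam if a == b else 0)
             for b in range(d)] for a in range(d)]


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A z = b exactly by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(A)
    M = [list(A[r]) + [b[r]] for r in range(n)]  # augmented matrix [A | b]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first usable pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def ridge(X: Matrix, y: Vector, lam: Fraction) -> Vector:
    """Closed-form ridge solution (X^T X + lam I)^{-1} X^T y."""
    Xty = [sum(row[a] * yk for row, yk in zip(X, y)) for a in range(len(X[0]))]
    return solve(gram(X, lam), Xty)


def unlearn_row(X: Matrix, y: Vector, lam: Fraction, i: int) -> Vector:
    """Remove row i via a rank-one (Sherman-Morrison) update instead of retraining."""
    theta = ridge(X, y, lam)
    x_i = X[i]
    u = solve(gram(X, lam), x_i)                       # A^{-1} x_i
    h = sum(a * b for a, b in zip(x_i, u))             # leverage x_i^T A^{-1} x_i
    r = y[i] - sum(a * t for a, t in zip(x_i, theta))  # residual y_i - x_i^T theta*
    return [t - uk * r / (1 - h) for t, uk in zip(theta, u)]


F = Fraction
X = [[F(1), F(0)], [F(0), F(1)], [F(1), F(1)], [F(2), F(-1)]]
y = [F(1), F(2), F(2), F(0)]
lam, i = F(1, 2), 2

retrained = ridge(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
print(unlearn_row(X, y, lam, i) == retrained)  # True
```

Exact `Fraction` arithmetic is used so that the closed-form update and the retrained solution can be compared with `==` rather than a floating-point tolerance.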

Figures (11)

  • Figure 1: Challenge and our solution for instance unlearning in diffusion models.
  • Figure 2: Cultural and semantic misrepresentations highlight the need for instance unlearning in commercial generative models.
  • Figure 3: Comparison between (a) prompt-based and (b) instance unlearning.
  • Figure 4: Surrogate-based unlearning ($\theta^\dagger$) can preserve the original mapping better than exact unlearning ($\widetilde{\theta}$), i.e., its line is closer to the original ($\theta^*$).
  • Figure 5: Surrogate data construction of (left) CelebA with TediGAN, (middle) SD3 with SDEdit, and (right) manual editing.
  • ...and 6 more figures

Theorems & Definitions (6)

  • Theorem 1: Exact Unlearning, restated from golub1979generalized
  • proof
  • Theorem 2: Surrogate-based Unlearning
  • proof
  • Corollary 3: Comparison
  • proof
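Theorem 2 and Corollary 3 are listed above without their statements. The following sketch stays in the ridge setting of Theorem 1 and assumes, in line with the surrogate data construction of Figure 5, that surrogate-based unlearning keeps the input $x_i$ but replaces its target $y_i$ with an edited target $y_i^\dagger$. Because $X$ is unchanged, $\theta^\dagger = A^{-1}X^\top y^\dagger = \theta^* + A^{-1}x_i\,(y_i^\dagger - y_i)$ with $A = X^\top X + \lambda I$ (our shorthand). The code, with hypothetical names throughout, checks this identity and measures how much each unlearned model changes predictions on the retained rows, a rough proxy for the mapping preservation illustrated in Figure 4; it makes no claim about which method wins in general.

```python
from fractions import Fraction
from typing import List

Matrix = List[List[Fraction]]
Vector = List[Fraction]


def gram(X: Matrix, lam: Fraction) -> Matrix:
    """A = X^T X + lam * I."""
    d = len(X[0])
    return [[sum(row[a] * row[b] for row in X) + (lam if a == b else 0)
             for b in range(d)] for a in range(d)]


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A z = b exactly by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(A)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def ridge(X: Matrix, y: Vector, lam: Fraction) -> Vector:
    """Closed-form ridge solution (X^T X + lam I)^{-1} X^T y."""
    Xty = [sum(row[a] * yk for row, yk in zip(X, y)) for a in range(len(X[0]))]
    return solve(gram(X, lam), Xty)


def predict(row: Vector, theta: Vector) -> Fraction:
    return sum(a * t for a, t in zip(row, theta))


def retained_drift(X: Matrix, theta_a: Vector, theta_b: Vector, i: int) -> Fraction:
    """Sum of squared prediction differences over every row except row i."""
    return sum((predict(row, theta_a) - predict(row, theta_b)) ** 2
               for k, row in enumerate(X) if k != i)


F = Fraction
X = [[F(1), F(0)], [F(0), F(1)], [F(1), F(1)], [F(2), F(-1)]]
y = [F(1), F(2), F(2), F(0)]
lam, i = F(1, 2), 2
y_edit = F(1)  # hypothetical edited (surrogate) target for row i

theta_star = ridge(X, y, lam)
theta_exact = ridge(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)  # retrain without row i
y_sur = y[:i] + [y_edit] + y[i + 1:]
theta_sur = ridge(X, y_sur, lam)  # surrogate: same X, edited target for row i

# Identity: theta_sur - theta_star == A^{-1} x_i (y_edit - y_i)
shift = solve(gram(X, lam), [v * (y_edit - y[i]) for v in X[i]])
print([s - t for s, t in zip(theta_sur, theta_star)] == shift)  # True
print("exact drift:", retained_drift(X, theta_star, theta_exact, i))
print("surrogate drift:", retained_drift(X, theta_star, theta_sur, i))
```

Which drift is smaller depends on the data and on how far the edited target $y_i^\dagger$ lies from $y_i$; the sketch only makes the comparison concrete.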