Variational Diffusion Unlearning: A Variational Inference Framework for Unlearning in Diffusion Models under Data Constraints

Subhodip Panda, MS Varun, Shreyans Jain, Sarthak Kumar Maharana, Prathosh A. P

TL;DR

This paper tackles safe diffusion-model deployment under data constraints by formulating unlearning as variational divergence minimization in parameter space. It introduces Variational Diffusion Unlearning (VDU), a two-term loss combining a plasticity inducer that suppresses undesired data and a stability regularizer that preserves generation quality, balanced by $\gamma$. The theoretical development connects unlearning to a variational posterior, yielding an interpretable objective $\mathcal{L}_{VDU}=A+B$ with a likelihood term $A$ and a quadratic regularizer $B$. The method is validated on class and feature unlearning tasks across MNIST, CIFAR-10, tinyImageNet, and Stable Diffusion (with LAION-5B prompts), where it outperforms or matches strong baselines in both unlearning efficacy (PUL) and image quality (u-FID). The work demonstrates a practical, data-efficient approach for regulating diffusion outputs when access to the full training set is restricted, with clear avenues for extending the theory to function-space inference and more general posterior families.
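To make the shape of the two-term objective concrete, here is a minimal NumPy sketch, not the authors' implementation. It rests on three assumptions that the summary above does not confirm: the plasticity term $A$ is approximated by the negated DDPM noise-prediction error on the forget samples (a common surrogate for the log-likelihood), the stability term $B$ is a plain squared distance to the pre-trained parameters, and $\gamma$ mixes the two as $(1-\gamma)A + \gamma B$. The function name `vdu_style_loss` and all argument names are hypothetical.

```python
import numpy as np


def vdu_style_loss(eps_true, eps_pred, theta, theta_init, gamma):
    """Illustrative two-term unlearning loss (assumed form, not the paper's exact one).

    eps_true, eps_pred: true and predicted noise on samples to be forgotten.
    theta, theta_init:  current and pre-trained parameters, flattened.
    gamma:              trade-off weight in [0, 1] (mixing rule is an assumption).
    """
    # A (plasticity inducer, assumed surrogate): minimizing the NEGATED
    # denoising error pushes the error up, i.e. lowers the likelihood
    # the model assigns to the forget samples.
    a_term = -np.mean((eps_true - eps_pred) ** 2)

    # B (stability regularizer, assumed form): quadratic penalty that keeps
    # the parameters close to the pre-trained model.
    b_term = np.sum((theta - theta_init) ** 2)

    total = (1.0 - gamma) * a_term + gamma * b_term
    return total, a_term, b_term


# Toy usage with hand-picked numbers.
total, a_term, b_term = vdu_style_loss(
    eps_true=np.zeros(4),
    eps_pred=np.ones(4),
    theta=np.array([1.0, 2.0]),
    theta_init=np.array([1.0, 0.0]),
    gamma=0.5,
)
print(total, a_term, b_term)
```

In this sketch, a larger $\gamma$ puts more weight on staying near the pre-trained parameters, while a smaller $\gamma$ favors forgetting; how the paper actually weights the two terms should be checked against its stated objective.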

Abstract

For responsible and safe deployment of diffusion models in various domains, regulating the outputs generated by these models is desirable because such models can generate undesired, violent, and obscene outputs. To tackle this problem, recent works use machine unlearning methodology to forget training data points containing these undesired features from pre-trained generative models. However, these methods have proved ineffective in data-constrained settings where the whole training dataset is inaccessible. Thus, the principal objective of this work is to propose a machine unlearning methodology that can prevent the generation of outputs containing undesired features from a pre-trained diffusion model in such a data-constrained setting. Our proposed method, termed Variational Diffusion Unlearning (VDU), is a computationally efficient method that only requires access to a subset of training data containing undesired features. Our approach is inspired by the variational inference framework, with the objective of minimizing a loss function consisting of two terms: a plasticity inducer and a stability regularizer. The plasticity inducer reduces the log-likelihood of the undesired training data points, while the stability regularizer, essential for preventing loss of image generation quality, regularizes the model in parameter space. We validate the effectiveness of our method through comprehensive experiments for both class unlearning and feature unlearning. For class unlearning, we unlearn some user-identified classes from MNIST, CIFAR-10, and tinyImageNet datasets from a pre-trained unconditional denoising diffusion probabilistic model (DDPM). Similarly, for feature unlearning, we unlearn the generation of certain high-level features from a pre-trained Stable Diffusion model

Paper Structure

This paper contains 38 sections, 4 theorems, 19 equations, 11 figures, 7 tables, 1 algorithm.

Key Result

Lemma 3.1

The log-likelihood under the backward diffusion process kernel,

Figures (11)

  • Figure 1: (a), (c), and (e) show the original images generated by a pre-trained DDPM model on the MNIST, CIFAR-10, and tinyImageNet datasets, respectively. (b), (d) and (f) display the corresponding images generated after unlearning, using our method VDU. The same noise vectors used to generate the original images were applied in the unlearned model to generate the unlearned images. The latter images show the performance of VDU for unlearning 'Van Gogh' style from a Stable Diffusion model. It is shown that the model slowly unlearns this artistic style feature over multiple epochs. VDU delivers good-quality images after unlearning, as well. Additional results on class-unlearning and feature-unlearning can be found in Figure 3, Figure 7, and Figure 8.
  • Figure 2: Variational Diffusion Unlearning (VDU): Given user-identified samples to be unlearned ($D_f$), the proposed VDU finetunes the initial model based on a two-term loss function. The first term, the Plasticity Inducer (shown in the bottom half), minimizes the log-likelihood for the unlearned samples to eliminate their influence, while the second term, the Stability Regularizer (shown in the upper half), preserves the model’s overall performance on the remaining data.
  • Figure 2: Impact of $\gamma$ on the unlearning performance of VDU.
  • Figure 3: Generated samples from the pre-trained model and different unlearned models. (b),(d),(f),(h),(j) & (l) are generated using the same noise vectors from unlearned unconditional DDPM on MNIST, CIFAR-10, and tinyImageNet datasets. (m)--(p) show Stable Diffusion samples before and after unlearning the 'Van Gogh style' and 'Car object' using prompts from LAION-5B. More results are provided in Figure 7 and Figure 8 in the Appendix.
  • Figure 4: Results of VDU on unlearning 'Van Gogh style' and 'Car object' over multiple epochs
  • ...and 6 more figures

Theorems & Definitions (7)

  • Lemma 3.1
  • Lemma 3.2
  • Theorem 3.3
  • proof
  • Remark 3.4
  • Lemma A.1
  • proof