Model Integrity when Unlearning with T2I Diffusion Models

Andrea Schioppa, Emiel Hoogeboom, Jonathan Heek

TL;DR

This paper introduces a novel retention metric that directly assesses the perceptual difference between outputs generated by the original and the unlearned models, and proposes unlearning algorithms that preserve model integrity more effectively than existing baselines.

Abstract

The rapid advancement of text-to-image Diffusion Models has led to their widespread public accessibility. However, these models, trained on large internet datasets, can sometimes generate undesirable outputs. To mitigate this, approximate Machine Unlearning algorithms have been proposed that modify model weights to reduce the generation of specific types of images, characterized by samples from a "forget distribution", while preserving the model's ability to generate other images, characterized by samples from a "retain distribution". While these methods aim to minimize the influence of training data in the forget distribution without extensive additional computation, we point out that they can compromise the model's integrity by inadvertently affecting generation for images in the retain distribution. Recognizing the limitations of FID and CLIPScore in capturing these effects, we introduce a novel retention metric that directly assesses the perceptual difference between outputs generated by the original and the unlearned models. We then propose unlearning algorithms that preserve model integrity more effectively than existing baselines. Given their straightforward implementation, these algorithms serve as valuable benchmarks for future advancements in approximate Machine Unlearning for Diffusion Models.

Paper Structure

This paper contains 31 sections, 6 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Forgetting Van Gogh affects generation for a different painter, Vermeer. Except for ESD, all models have comparable FID (within $0.5$) and CLIPScore (within $0.2$), see Table \ref{tab:quant_comp}, so this effect is not captured by FID or CLIPScore. Our methods preserve generation on Vermeer while forgetting Van Gogh. Removing the help loss terms $L_{\text{help}}$ from OVW interferes with image generation on Vermeer. Images in each row share the same random seed. $\text{LPIPS}$ is computed with respect to the reference image generated by the base checkpoint.
  • Figure 2: Qualitative comparison across methods of whether cat is forgotten. Our methods keep generation closest to the original checkpoint on the frog. SALUN and ESD produce qualitatively very similar results. Images in each row share the same random seed. $\text{LPIPS}$ is computed with respect to the reference image generated by the base checkpoint. In the case of Saddle, $\text{LPIPS}$ is affected by the appearance of grass under the frog; compare also SA.
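
The paired comparison described in the captions (same prompt, same random seed, distance measured against the image from the base checkpoint) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `retention_score`, `toy_base` and `toy_unlearned` are hypothetical, averaging over all pairs is an assumption, and `mean_abs_distance` is a simple pixel-wise stand-in used where a perceptual distance such as LPIPS would go.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Image = List[float]  # hypothetical representation: flattened pixels in [0, 1]
Generator = Callable[[str, int], Image]  # (prompt, seed) -> image


def mean_abs_distance(a: Image, b: Image) -> float:
    """Pixel-wise stand-in for a perceptual distance; this is NOT LPIPS."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return mean(abs(x - y) for x, y in zip(a, b))


def retention_score(
    generate_base: Generator,
    generate_unlearned: Generator,
    prompts: Sequence[str],
    seeds: Sequence[int],
    distance: Callable[[Image, Image], float] = mean_abs_distance,
) -> float:
    """Average distance between outputs that share a prompt and a seed.

    Lower values mean the unlearned model stays closer to the base model
    on the given (retain) prompts. Averaging is an assumption of this sketch.
    """
    distances = []
    for prompt in prompts:
        for seed in seeds:
            reference = generate_base(prompt, seed)       # base checkpoint
            candidate = generate_unlearned(prompt, seed)  # unlearned model
            distances.append(distance(reference, candidate))
    return mean(distances)


def toy_base(prompt: str, seed: int) -> Image:
    """Toy 'model': deterministic pseudo-random pixels per (prompt, seed)."""
    rng = random.Random(f"{prompt}-{seed}")
    return [rng.random() for _ in range(16)]


def toy_unlearned(prompt: str, seed: int) -> Image:
    """Toy 'unlearned model': the base output with a small brightness drift."""
    return [min(1.0, p + 0.1) for p in toy_base(prompt, seed)]


if __name__ == "__main__":
    retain_prompts = ["a painting by Vermeer", "a photo of a frog"]
    print(retention_score(toy_base, toy_unlearned, retain_prompts, seeds=[0, 1, 2]))
```

Sharing the seed between the two models isolates the effect of the weight change from sampling noise, which is why the captions stress that each row uses one seed. To use a real perceptual metric, pass it as the `distance` argument.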