CPSample: Classifier Protected Sampling for Guarding Training Data During Diffusion

Joshua Kazdan, Hao Sun, Jiaqi Han, Felix Petersen, Stefano Ermon

TL;DR

CPSample is a method that modifies the sampling process of a diffusion model to prevent training data replication while preserving image quality. It also makes diffusion models more robust against membership inference attacks, in which an adversary attempts to discern which images were in the model's training dataset.

Abstract

Diffusion models have a tendency to exactly replicate their training data, especially when trained on small datasets. Most prior work has sought to mitigate this problem by imposing differential privacy constraints or masking parts of the training data, resulting in a substantial decrease in image quality. We present CPSample, a method that modifies the sampling process to prevent training data replication while preserving image quality. CPSample utilizes a classifier that is trained to overfit on random binary labels attached to the training data. CPSample then uses classifier guidance to steer the generation process away from the set of points that can be classified with high certainty, a set that includes the training data. CPSample achieves FID scores of 4.97 and 2.97 on CIFAR-10 and CelebA-64, respectively, without producing exact replicates of the training data. Unlike prior methods intended to guard the training images, CPSample only requires training a classifier rather than retraining a diffusion model, which is computationally cheaper. Moreover, our technique provides diffusion models with greater robustness against membership inference attacks, wherein an adversary attempts to discern which images were in the model's training dataset. We show that CPSample behaves like a built-in rejection sampler, and we demonstrate its ability to prevent mode collapse in Stable Diffusion.
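
The steering idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm or code: the 1-D data, the kernel-based stand-in for an overfit classifier (`logit_and_grad`), the constants `C` and `H`, and the step size are all hypothetical, and the correction is applied directly to the sample rather than inside a real denoising step. The threshold rule (act only when the predicted probability is below `alpha` or above `1 - alpha`) is likewise an assumption made for illustration.

```python
import math
import random

# Toy 1-D "training set" with random binary labels (the labels carry no meaning).
rng = random.Random(0)
train_x = [-2.0, 0.0, 3.0]
train_y = [rng.randint(0, 1) for _ in train_x]

C, H = 10.0, 0.5  # hypothetical logit scale and kernel width


def logit_and_grad(x):
    """Logit of a stand-in overfit classifier and its derivative w.r.t. x."""
    f, df = 0.0, 0.0
    for xi, yi in zip(train_x, train_y):
        sign = 2 * yi - 1  # map label {0, 1} to {-1, +1}
        k = math.exp(-((x - xi) ** 2) / (2 * H * H))
        f += C * sign * k
        df += C * sign * k * (-(x - xi) / (H * H))
    return f, df


def prob_label_one(x):
    f, _ = logit_and_grad(x)
    return 1.0 / (1.0 + math.exp(-f))


def guided_step(x, alpha=0.001, s=0.1):
    """Move x away from confidently classified regions; otherwise leave it."""
    f, df = logit_and_grad(x)
    p = 1.0 / (1.0 + math.exp(-f))
    if p < alpha:        # confidently label 0: ascend log p(y=1 | x)
        return x + s * (1.0 - p) * df
    if p > 1.0 - alpha:  # confidently label 1: ascend log p(y=0 | x)
        return x - s * p * df
    return x             # uncertain region: no correction


x = 0.05  # start almost on top of the training point at 0.0
p_start = prob_label_one(x)
for _ in range(50):
    x_next = guided_step(x)
    if x_next == x:  # classifier is no longer confident, so stop correcting
        break
    x = x_next
p_end = prob_label_one(x)
print(f"p(y=1) went from {p_start:.5f} to {p_end:.3f}; final x = {x:.3f}")
```

Because the stand-in classifier is only confident near memorized points, the correction is inactive almost everywhere and kicks in only when a sample drifts toward a training example, which is one way to read the abstract's description of a built-in rejection sampler.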

Paper Structure

This paper contains 29 sections, 1 theorem, 20 equations, 15 figures, 7 tables, 1 algorithm.

Key Result

Lemma 1

Under the above assumptions, choose $\epsilon>0$ and $0< \delta < \frac{\frac{1}{2}- \kappa}{L}$. Setting $\nu = \epsilon$ and $\lambda = \kappa + L\delta$, when drawing a single sample, with probability greater than $(1-\epsilon)(1-\gamma)$, CPSample generates an image that lies outside of $S= \big

Figures (15)

  • Figure 1: Generated image and most similar training image pairs for DDIM sampling (left) and CPSample with $\alpha{=}0.001$, $s{=}1000$ (right). We sample $100$ images and display the four with the highest similarity to their nearest neighbors in the training data.
  • Figure 2: Generated image and most similar training image pairs for DDIM sampling and CPSample with $\alpha = 0.001, s=1$ on CIFAR-10 (left) and $\alpha = 0.1, s = 10$ on LSUN Church (right). For each pair, the image on the left is the generated sample and the one on the right is its nearest neighbor in the training set. These are the four examples out of $21\,000$ images on CIFAR-10 and two out of $1\,700$ images on LSUN Church that have the highest similarity scores with their nearest neighbor.
  • Figure 3: Cosine similarity in feature space between generated images and their nearest neighbor in the fine-tuning data set for standard DDIM sampling (red) and CPSample (blue) on CIFAR-10 ($\alpha = 0.001, s = 1$) and CelebA-64 ($\alpha = 0.001, s = 1000$). Similarity scores were computed for $21\,000$ generated samples for CIFAR-10 and $8\,000$ images for CelebA. Note that standard DDIM exhibits many more samples with similarity scores exceeding the thresholds from Table \ref{similarity_numeric}.
  • Figure 4: The generated and real images with the highest similarity for CIFAR-10 (left) and CelebA (right) out of $50\,000$ samples used to compute FID score.
  • Figure 5: Selected examples for Stable Diffusion: original image (left), image generated from a similar caption by Stable Diffusion v1.4 (center), image generated with CPSample (right).
  • ...and 10 more figures
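
The nearest-neighbor comparison described in the captions above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the feature vectors are made-up stand-ins for the output of some feature extractor, and `THRESHOLD` is a hypothetical value rather than one of the thresholds from the paper's table.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbor(query, train_features):
    """Return (index, similarity) of the most similar training feature."""
    sims = [cosine_similarity(query, t) for t in train_features]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]


# Hypothetical feature vectors; real ones would come from a feature extractor.
train_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
generated_features = [[0.9, 0.1, 0.0], [0.0, 0.2, 1.0]]
THRESHOLD = 0.95  # hypothetical cutoff for flagging a likely replicate

results = [nearest_neighbor(g, train_features) for g in generated_features]
for i, (idx, sim) in enumerate(results):
    status = "possible replicate" if sim > THRESHOLD else "ok"
    print(f"generated {i}: nearest training index {idx}, similarity {sim:.3f} ({status})")
```

Ranking generated samples by this similarity and inspecting the top few is the kind of procedure the captions describe when they show the pairs with the highest similarity scores.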

Theorems & Definitions (3)

  • Definition 2.1: ($\epsilon$-$\delta$)-Differential privacy
  • Lemma 1
  • proof
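
The text of Definition 2.1 is not reproduced on this page. For orientation, the standard statement of this notion from the differential privacy literature, which the paper's definition presumably follows (the notation $\mathcal{M}$, $D$, $D'$, $T$ is chosen here and may differ from the paper's), is: a randomized mechanism $\mathcal{M}$ is $(\epsilon, \delta)$-differentially private if, for all datasets $D$ and $D'$ that differ in a single element and all measurable sets of outputs $T$,

$$
\Pr[\mathcal{M}(D) \in T] \;\le\; e^{\epsilon}\, \Pr[\mathcal{M}(D') \in T] + \delta .
$$

The symbols $\epsilon$ and $\delta$ in Lemma 1 are chosen separately there and need not play the same role as in this definition.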