Model Will Tell: Training Membership Inference for Diffusion Models

Xiaomeng Fu, Xi Wang, Qiao Li, Jin Liu, Jiao Dai, Jizhong Han

TL;DR

The paper explores a new perspective on the Training Membership Inference (TMI) task by leveraging the intrinsic generative priors of the diffusion model. It proposes the Degrade Restore Compare (DRC) framework, which provides comprehensible decision criteria and thereby offers evidence for potential privacy violations.

Abstract

Diffusion models pose risks of privacy breaches and copyright disputes, primarily stemming from the potential utilization of unauthorized data during the training phase. The Training Membership Inference (TMI) task aims to determine whether a specific sample has been used in the training process of a target model, representing a critical tool for privacy violation verification. However, the increased stochasticity inherent in diffusion renders traditional shadow-model-based or metric-based methods ineffective when applied to diffusion models. Moreover, existing methods only yield binary classification labels which lack necessary comprehensibility in practical applications. In this paper, we explore a novel perspective for the TMI task by leveraging the intrinsic generative priors within the diffusion model. Compared with unseen samples, training samples exhibit stronger generative priors within the diffusion model, enabling the successful reconstruction of substantially degraded training images. Consequently, we propose the Degrade Restore Compare (DRC) framework. In this framework, an image undergoes sequential degradation and restoration, and its membership is determined by comparing it with the restored counterpart. Experimental results verify that our approach not only significantly outperforms existing methods in terms of accuracy but also provides comprehensible decision criteria, offering evidence for potential privacy violations.
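
The degrade, restore, compare sequence described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the diffusion model is replaced by a hypothetical toy restorer (`make_toy_restorer`) that fills the degraded region from the stored training image that best matches the visible pixels, the degradation is assumed to be a simple square mask, and the comparison is assumed to be negative mean squared error inside the masked region. The names `degrade`, `membership_score`, and `threshold` are illustrative assumptions as well.

```python
import numpy as np

def degrade(x, mask):
    """Destroy the masked region (mask == True marks pixels to zero out)."""
    return np.where(mask, 0.0, x)

def membership_score(x, mask, restore):
    """Degrade x, restore it, then compare inside the degraded region.

    Assumption: similarity is negative MSE, so a higher score means the
    restoration is closer to the original image.
    """
    x_d = degrade(x, mask)
    x_tilde = restore(x_d, mask)
    mse = np.mean((x[mask] - x_tilde[mask]) ** 2)
    return -mse

def make_toy_restorer(train_set):
    """Hypothetical stand-in for the diffusion model's generative prior.

    It fills the masked region from the stored training image whose
    visible pixels are closest to those of the degraded input.
    """
    def restore(x_d, mask):
        visible = ~mask
        dists = [np.mean((t[visible] - x_d[visible]) ** 2) for t in train_set]
        nearest = train_set[int(np.argmin(dists))]
        return np.where(mask, nearest, x_d)
    return restore

rng = np.random.default_rng(0)
train_set = rng.random((8, 16, 16))   # toy "training images"
unseen = rng.random((16, 16))         # toy "unseen image"

mask = np.zeros((16, 16), dtype=bool)
mask[4:12, 4:12] = True               # degrade the central 8x8 patch

restore = make_toy_restorer(train_set)
score_member = membership_score(train_set[3], mask, restore)
score_unseen = membership_score(unseen, mask, restore)

threshold = -0.01                     # hypothetical decision threshold
is_member = score_member > threshold
is_unseen_member = score_unseen > threshold
```

In this toy setting the training image is restored exactly, while the unseen image receives an unrelated patch, so the two scores separate cleanly. With a real diffusion model the gap would be smaller and the threshold would have to be calibrated on held-out data.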

Paper Structure (13 sections, 9 equations, 6 figures, 7 tables)

This paper contains 13 sections, 9 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: (a) The intuition of DRC. A training sample exhibits a more pronounced peak than an unseen sample. The training sample and the unseen sample are equally degraded. The degraded training sample can be restored to its original peak, whereas the unseen sample cannot. (b) The pipeline of DRC. The original image is first intentionally degraded and then restored by the diffusion model. Finally, the original image is compared with its restored counterpart to determine whether the original image is a training member.
  • Figure 2: The framework of our proposed DRC. Given the original image record $x$, we first obtain a partially degraded image $x^D$ by degrading the face area of $x$. We then exploit the generative priors of the diffusion model $\epsilon_{\theta}$ to restore $x^D$, acquiring a restored image $\tilde{x}$. Finally, we compare the original image $x$ with its restored image $\tilde{x}$ and compute a membership score to decide whether $x$ is in the training set.
  • Figure 3: The log-scaled ROC curves on the CIFAR-10, CIFAR-100, CelebA, and FFHQ datasets. The log-scaled ROC curves provide evidence that our method performs training membership inference with high prediction confidence.
  • Figure 4: We design an aggregation scheme based on the number of restoration tasks that agree. We plot precision and recall on FFHQ for different agreement counts.
  • Figure 5: The degraded and restored images produced by our method at different mask ratios. We show two members and non-members with the mask ratio varying from 0.1 to 0.5. More results can be found in the supplementary materials.
  • ...and 1 more figures
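
The agreement-based aggregation mentioned in the Figure 4 caption can be sketched as follows. This is one plausible reading, not the authors' procedure: each restoration task is assumed to cast a binary member/non-member vote, and an image is predicted to be a member when at least `k` tasks vote "member". The vote matrix and labels below are made-up toy values, and `precision_recall` is a hypothetical helper.

```python
import numpy as np

def precision_recall(pred, labels):
    """Precision and recall for boolean predictions against boolean labels."""
    tp = int(np.sum(pred & labels))
    predicted_pos = int(np.sum(pred))
    actual_pos = int(np.sum(labels))
    # Convention (assumption): precision is 1.0 when nothing is predicted positive.
    precision = tp / predicted_pos if predicted_pos else 1.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall

# Toy votes: rows are images, columns are three hypothetical restoration tasks.
votes = np.array([
    [1, 1, 1],   # member
    [1, 1, 0],   # member
    [1, 0, 0],   # member
    [1, 1, 0],   # non-member
    [0, 0, 0],   # non-member
    [1, 0, 0],   # non-member
], dtype=bool)
labels = np.array([1, 1, 1, 0, 0, 0], dtype=bool)

agree = votes.sum(axis=1)            # number of tasks voting "member" per image

results = {}
for k in range(1, votes.shape[1] + 1):
    pred = agree >= k                # require at least k agreeing tasks
    results[k] = precision_recall(pred, labels)

for k, (p, r) in results.items():
    print(f"k={k}: precision={p:.3f}, recall={r:.3f}")
```

On these toy values, raising the required agreement count trades recall for precision, which is the kind of trade-off a precision/recall plot over agreement counts would show.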