Exploring Privacy and Fairness Risks in Sharing Diffusion Models: An Adversarial Perspective

Xinjian Luo, Yangfan Jiang, Fei Wei, Yuncheng Wu, Xiaokui Xiao, Beng Chin Ooi

TL;DR

The paper examines privacy and fairness risks in sharing pre-trained diffusion models under a dual-adversary scenario with black-box model access: the sharer mounts a fairness poisoning attack (FPA) to bias the receiver's downstream classifiers, while the receiver mounts a property inference attack (PIA) to reveal the distribution of sensitive features in the sharer's dataset. Both attacks exploit the distribution coverage of diffusion models, which propagates training-data biases into synthetic samples and makes distribution-based inference possible. For FPA, the paper provides a mutual-information optimization framework and a greedy sampling algorithm; for PIA, it provides two attacker strategies (with and without auxiliary data) together with Hoeffding-based error bounds. Empirical results across image and tabular datasets (e.g., CelebA, MNIST, AFAD, Adult) show that FPA can degrade fairness with minimal accuracy loss (often under 5%), while PIA can accurately estimate protected-property proportions with relatively small sample sizes, outperforming non-diffusion baselines. The work shows that these attacks are feasible in practice and underscores the need for auditing, defenses, and principled data-sharing protocols to protect privacy and fairness in real-world deployments.
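The mutual-information framing mentioned for FPA can be made concrete with a small sketch. This summary does not state which quantities the paper's objective uses, so the example below only assumes that the dependence between a discrete sensitive attribute and a discrete label is of interest; the function name `mutual_information` and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(s_values, y_values):
    """Empirical mutual information I(S; Y) in nats for two discrete sequences.

    Assumption: S is a sensitive attribute and Y a label, both given as
    equal-length sequences of hashable values. This is the textbook
    plug-in estimator, not the paper's optimization procedure.
    """
    if len(s_values) != len(y_values) or not s_values:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(s_values)
    joint = Counter(zip(s_values, y_values))  # counts of (s, y) pairs
    s_counts = Counter(s_values)              # marginal counts of S
    y_counts = Counter(y_values)              # marginal counts of Y
    mi = 0.0
    for (s, y), count in joint.items():
        p_sy = count / n
        p_s = s_counts[s] / n
        p_y = y_counts[y] / n
        mi += p_sy * math.log(p_sy / (p_s * p_y))
    return mi


# Toy data (hypothetical): a balanced set where S and Y are independent,
# and a skewed set where Y is fully determined by S.
balanced = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
skewed = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])
```

On the toy data, the balanced set has no dependence between the attribute and the label, while the skewed set has the maximum possible dependence for a binary pair. A training distribution with higher dependence of this kind is the sort of bias a downstream classifier could inherit from synthetic samples.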

Abstract

Diffusion models have recently gained significant attention in both academia and industry due to their impressive generative performance in terms of both sampling quality and distribution coverage. Accordingly, proposals are made for sharing pre-trained diffusion models across different organizations, as a way of improving data utilization while enhancing privacy protection by avoiding sharing private data directly. However, the potential risks associated with such an approach have not been comprehensively examined. In this paper, we take an adversarial perspective to investigate the potential privacy and fairness risks associated with the sharing of diffusion models. Specifically, we investigate the circumstances in which one party (the sharer) trains a diffusion model using private data and provides another party (the receiver) black-box access to the pre-trained model for downstream tasks. We demonstrate that the sharer can execute fairness poisoning attacks to undermine the receiver's downstream models by manipulating the training data distribution of the diffusion model. Meanwhile, the receiver can perform property inference attacks to reveal the distribution of sensitive features in the sharer's dataset. Our experiments conducted on real-world datasets demonstrate remarkable attack performance on different types of diffusion models, which highlights the critical importance of robust data auditing and privacy protection protocols in pertinent applications.

Paper Structure (23 sections, 1 theorem, 16 equations, 10 figures, 6 tables, 2 algorithms)

Key Result

Theorem 1

Let $g_{\text{d}}: \mathcal{X}\rightarrow \mathcal{S}$, used in Algorithm alg-attack-PIA, be an unbiased discriminator. (Note that an unbiased discriminator can be trained using a variety of techniques, such as meta-learning [debial-metalearning], conditional adversarial debiasing [debias-adversarial], and r…
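The theorem statement is truncated here, so the following does not reproduce it. The TL;DR describes the PIA error bounds as Hoeffding-based, and the generic two-sided Hoeffding inequality for a proportion estimated from n independent 0/1 outcomes gives a feel for how sample size trades off against error. The sketch assumes the receiver labels each synthetic sample with a binary property via a discriminator and averages the labels; the function names and the example labels are hypothetical.

```python
import math


def estimate_proportion(labels):
    """Fraction of samples flagged with the property (labels are 0 or 1)."""
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(labels) / len(labels)


def hoeffding_sample_size(epsilon, delta):
    """Smallest n such that 2 * exp(-2 * n * epsilon**2) <= delta.

    With this many independent samples, the empirical proportion is within
    epsilon of its expectation with probability at least 1 - delta.
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def hoeffding_error(n, delta):
    """Error radius epsilon guaranteed with probability >= 1 - delta for n samples."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


# Hypothetical discriminator outputs on 100 synthetic samples.
labels = [1] * 30 + [0] * 70
p_hat = estimate_proportion(labels)        # 0.3
n_needed = hoeffding_sample_size(0.05, 0.05)
radius = hoeffding_error(len(labels), 0.05)
```

The bound concerns only sampling error around the expected discriminator output. Whether that expectation matches the true proportion in the sharer's data depends on the discriminator being unbiased and on how faithfully the diffusion model covers the training distribution, which is the assumption the theorem places on $g_{\text{d}}$.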

Figures (10)

  • Figure 1: An example of sharing datasets via pre-trained diffusion models.
  • Figure 2: Overview of the proposed attacks.
  • Figure 3: Use the CLIP model as a property discriminator.
  • Figure 4: Images sampled from different models.
  • Figure 5: The (a)-(d) accuracy loss and (e)-(h) fairness loss caused by fairness poisoning attacks on different datasets.
  • ...and 5 more figures

Theorems & Definitions (3)

  • Theorem 1
  • proof
  • proof