Exploring Privacy and Fairness Risks in Sharing Diffusion Models: An Adversarial Perspective
Xinjian Luo, Yangfan Jiang, Fei Wei, Yuncheng Wu, Xiaokui Xiao, Beng Chin Ooi
TL;DR
The paper examines privacy and fairness risks in sharing pre-trained diffusion models under black-box access, considering two adversaries: a sharer who mounts a fairness poisoning attack (FPA) to bias the receiver's downstream classifiers, and a receiver who mounts a property inference attack (PIA) to reveal the distribution of sensitive features in the sharer's dataset. Both attacks exploit the distribution coverage of diffusion models, which carries training-data biases into synthetic samples and makes distribution-level inference possible. For FPA, the paper provides a mutual-information optimization framework and a greedy sampling algorithm; for PIA, it provides two attack strategies (with and without auxiliary data) together with Hoeffding-based error bounds. Empirical results across image and tabular datasets (e.g., CelebA, MNIST, AFAD, Adult) show that FPA can degrade fairness with minimal accuracy loss (often under 5%), while PIA can accurately estimate protected-property proportions from relatively small sample sizes, outperforming non-diffusion baselines. The work shows that the risks of diffusion-model sharing are practical, underscoring the need for auditing, defenses, and principled data-sharing protocols to protect privacy and fairness in real-world deployments.
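To make the Hoeffding-based reasoning concrete, the following is a minimal sketch of how a receiver could estimate a protected-property proportion from generated samples and bound the estimation error. It uses the standard Hoeffding inequality for the mean of n independent values in [0, 1], P(|p_hat - p| >= eps) <= 2 exp(-2 n eps^2), and is not the paper's algorithm or its exact bound. The function names are hypothetical, and `simulate_property_labels` is an assumed stand-in for querying the shared model and labeling each synthetic sample with a property classifier; the sketch also assumes that labeling is error-free and that the synthetic samples follow the training distribution.

```python
import math
import random


def hoeffding_sample_size(eps: float, delta: float) -> int:
    """Smallest n such that 2 * exp(-2 * n * eps**2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def hoeffding_error_bound(n: int, delta: float) -> float:
    """Error eps guaranteed with probability >= 1 - delta for n samples."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def estimate_proportion(labels) -> float:
    """Empirical fraction of samples that carry the property (label 1)."""
    labels = list(labels)
    return sum(labels) / len(labels)


def simulate_property_labels(true_ratio: float, n: int, seed: int = 0):
    """Hypothetical stand-in: in a real attack, each label would come from
    generating one sample with the shared model and classifying it."""
    rng = random.Random(seed)
    return [1 if rng.random() < true_ratio else 0 for _ in range(n)]


# Samples needed for error <= 0.05 with confidence >= 95%.
n = hoeffding_sample_size(eps=0.05, delta=0.05)
labels = simulate_property_labels(true_ratio=0.3, n=n)
p_hat = estimate_proportion(labels)
print(n, round(p_hat, 3), round(hoeffding_error_bound(n, 0.05), 4))
```

Under these assumptions the required sample size depends only on the target error and confidence, not on the size of the sharer's dataset, which is consistent with the observation that relatively few generated samples suffice.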
Abstract
Diffusion models have recently gained significant attention in both academia and industry due to their impressive generative performance in terms of both sampling quality and distribution coverage. Accordingly, proposals are made for sharing pre-trained diffusion models across different organizations, as a way of improving data utilization while enhancing privacy protection by avoiding sharing private data directly. However, the potential risks associated with such an approach have not been comprehensively examined. In this paper, we take an adversarial perspective to investigate the potential privacy and fairness risks associated with the sharing of diffusion models. Specifically, we investigate the circumstances in which one party (the sharer) trains a diffusion model using private data and provides another party (the receiver) black-box access to the pre-trained model for downstream tasks. We demonstrate that the sharer can execute fairness poisoning attacks to undermine the receiver's downstream models by manipulating the training data distribution of the diffusion model. Meanwhile, the receiver can perform property inference attacks to reveal the distribution of sensitive features in the sharer's dataset. Our experiments conducted on real-world datasets demonstrate remarkable attack performance on different types of diffusion models, which highlights the critical importance of robust data auditing and privacy protection protocols in pertinent applications.
