PriSampler: Mitigating Property Inference of Diffusion Models

Hailong Hu, Jun Pang

TL;DR

Diffusion models trained on sensitive data pose privacy risks through property inference attacks that can succeed even when adversaries only observe synthetic samples. The authors systematically evaluate such attacks on tabular and image data and introduce PriSampler, a model-agnostic, plug-in defense that steers sampling to conceal property proportions without retraining. Empirical results show diffusion models are vulnerable across samplers and data types, while PriSampler effectively mitigates leakage with minimal utility loss and often outperforms differentially private diffusion models. This work highlights practical privacy risks in synthetic data release and provides a concrete defense to enable privacy-preserving diffusion-based generation.

Abstract

Diffusion models have been remarkably successful in data synthesis. However, when these models are applied to sensitive datasets, such as banking and human face data, they might raise severe privacy concerns. This work presents the first systematic privacy study of property inference attacks against diffusion models, where adversaries aim to extract sensitive global properties of a diffusion model's training set. Specifically, we focus on the most practical attack scenario: adversaries are restricted to accessing only synthetic data. Under this realistic scenario, we conduct a comprehensive evaluation of property inference attacks on various diffusion models trained on diverse data types, including tabular and image datasets. A broad range of evaluations reveals that diffusion models and their samplers are universally vulnerable to property inference attacks. In response, we propose PriSampler, a new model-agnostic plug-in method that mitigates the risk of property inference against diffusion models. PriSampler can be directly applied to well-trained diffusion models and supports both stochastic and deterministic sampling. Extensive experiments illustrate the effectiveness of our defense: it can lead adversaries to infer property proportions that are close to the predefined values chosen by model owners. Notably, PriSampler also significantly outperforms diffusion models trained with differential privacy in both model utility and defense performance. This work aims to raise awareness of property inference attacks and encourage privacy-preserving synthetic data release.
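
To make the threat model concrete, the following is a minimal sketch of the synthetic-data-only setting, not the authors' implementation. It assumes the adversary can label each generated sample with a property classifier and then reports the fraction of samples carrying the property as the inferred training-set proportion. The names `toy_generator`, `is_male`, and `estimate_property_proportion` are hypothetical, and the toy generator merely stands in for sampling from a released diffusion model.

```python
import random


def estimate_property_proportion(samples, has_property):
    """Return the fraction of synthetic samples that carry the property."""
    if not samples:
        raise ValueError("need at least one synthetic sample")
    return sum(1 for s in samples if has_property(s)) / len(samples)


def toy_generator(n, true_proportion, seed=0):
    """Hypothetical stand-in for a generative model: it emits records whose
    property frequency mirrors the (secret) training-set proportion."""
    rng = random.Random(seed)
    return [
        {"gender": "Male" if rng.random() < true_proportion else "Female"}
        for _ in range(n)
    ]


def is_male(record):
    """Hypothetical property classifier for the property Gender=Male.
    On tabular data it can read the column directly; on images the
    adversary would need a trained classifier instead."""
    return record["gender"] == "Male"


# The adversary only ever sees the synthetic samples.
synthetic = toy_generator(n=10_000, true_proportion=0.3)
estimate = estimate_property_proportion(synthetic, is_male)
print(f"estimated proportion of Gender=Male: {estimate:.3f}")
```

Under these assumptions, the estimate tracks the hidden proportion more closely as the number of generated samples grows. A defense in the spirit of the abstract would aim to make this estimate land near a value chosen by the model owner rather than near the true proportion.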

Paper Structure (28 sections, 6 equations, 20 figures, 12 tables)

Figures (20)

  • Figure 1: The attack process of the property inference attack.
  • Figure 2: Attack performance versus the number of generated samples. The target model is TabDDPM trained on Adult.
  • Figure 3: Attack performance on different diffusion models, different samplers, and different proportions of the sensitive property. Here, the sensitive property is Gender=Male. Quantitative attack results are shown in Table \ref{tab:att_perf} in the Appendix.
  • Figure 4: Attack performance on different properties across different diffusion models and samplers in image generation.
  • Figure 5: Attack performance with regard to the number of generated samples and FID scores.
  • ...and 15 more figures