Differentially Private Data Generative Models

Qingrong Chen, Chong Xiang, Minhui Xue, Bo Li, Nikita Borisov, Dali Kaarfar, Haojin Zhu

TL;DR

The paper tackles privacy risks in learning from large private datasets by introducing two differentially private (DP) data generative models, DP-AuGM and DP-VaeGM, which synthesize data with privacy guarantees for downstream learning. DP-AuGM trains a DP autoencoder and publishes its encoder to generate data from public inputs, while DP-VaeGM trains one DP variational autoencoder (VAE) per class and samples from a Gaussian latent distribution to produce unlimited synthetic data. The authors demonstrate strong utility on multiple datasets and show that DP-AuGM resists model inversion, membership inference, and GAN-based attacks, while DP-VaeGM resists membership inference; both can be integrated with machine learning as a service (MLaaS) and federated learning to preserve privacy in practice. Overall, the work shows that differentially private data generation can enable high-utility learning while mitigating contemporary privacy threats in real-world systems.
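
The DP-AuGM recipe described above (train an autoencoder under differential privacy, then release only the encoder and apply it to public inputs) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a DP-SGD-style update (per-example gradient clipping plus Gaussian noise), which this summary does not spell out, and it uses a hypothetical tied-weight linear autoencoder. The names `dp_sgd_step`, `encode`, `clip_norm`, and `noise_multiplier` are illustrative, the data is random stand-in data, and no privacy accounting (computing the resulting $\epsilon$) is performed.

```python
import numpy as np

def dp_sgd_step(W, batch, clip_norm, noise_multiplier, lr, rng):
    """One clipped, noised gradient step (assumed DP-SGD-style update).

    Hypothetical model: tied-weight linear autoencoder x -> W.T @ (W @ x),
    with per-example loss 0.5 * ||W.T @ W @ x - x||^2.
    """
    grad_sum = np.zeros_like(W)
    for x in batch:
        z = W @ x                                  # latent code
        r = W.T @ z - x                            # reconstruction residual
        g = np.outer(z, r) + np.outer(W @ r, x)    # per-example gradient w.r.t. W
        # Clip so that no single private record contributes more than clip_norm.
        scale = min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
        grad_sum += g * scale
    # Gaussian noise with standard deviation tied to the clipping bound.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=W.shape)
    return W - lr * (grad_sum + noise) / len(batch)

def encode(W, public_inputs):
    """Apply the released encoder to public inputs (one row per example)."""
    return public_inputs @ W.T

rng = np.random.default_rng(0)
private_data = rng.normal(size=(64, 5))   # stand-in for the private set
public_data = rng.normal(size=(10, 5))    # stand-in for public inputs
W = 0.1 * rng.normal(size=(2, 5))
for _ in range(20):
    idx = rng.choice(len(private_data), size=16, replace=False)
    W = dp_sgd_step(W, private_data[idx], clip_norm=1.0,
                    noise_multiplier=1.1, lr=0.05, rng=rng)
encoded_public = encode(W, public_data)   # data handed to downstream learners
```

In this sketch only `W` and the encoded public data leave the private side; the private records are touched solely through clipped, noised gradients.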

Abstract

Deep neural networks (DNNs) have recently been widely adopted in various applications, and such success is largely due to a combination of algorithmic breakthroughs, computation resource improvements, and access to a large amount of data. However, the large-scale data collections required for deep learning often contain sensitive information, which raises many privacy concerns. Prior research has shown several successful attacks that infer sensitive information about training data, such as model inversion, membership inference, and generative adversarial network (GAN)-based leakage attacks against collaborative deep learning. In this paper, to enable learning efficiency as well as to generate data with privacy guarantees and high utility, we propose a differentially private autoencoder-based generative model (DP-AuGM) and a differentially private variational autoencoder-based generative model (DP-VaeGM). We evaluate the robustness of the two proposed models. We show that DP-AuGM can effectively defend against the model inversion, membership inference, and GAN-based attacks. We also show that DP-VaeGM is robust against the membership inference attack. We conjecture that the key to defending against the model inversion and GAN-based attacks is not differential privacy itself but the perturbation of the training data. Finally, we demonstrate that both DP-AuGM and DP-VaeGM can be easily integrated with real-world machine learning applications, such as machine learning as a service and federated learning, which are otherwise threatened by the membership inference attack and the GAN-based attack, respectively.

Paper Structure

This paper contains 28 sections, 3 theorems, 4 equations, 11 figures, and 9 tables.

Key Result

Theorem 1

Let $\mathcal{M}$ denote the differentially private generative model and $\mathcal{X}$ be the private data. Any machine learning model trained over the generated data $\mathcal{M}(\mathcal{X})$ is also differentially private with respect to the private data $\mathcal{X}$.
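
Theorem 1 is an instance of the post-processing property of differential privacy. As a sketch, assume the standard $(\epsilon, \delta)$ definition (the paper's own definitions are not reproduced in this summary, so this form is an assumption), a dataset $\tilde{\mathcal{X}}$ that differs from $\mathcal{X}$ in one record, and a deterministic training procedure $\mathcal{A}$ that sees only the generated data. The symbols $\tilde{\mathcal{X}}$, $\mathcal{A}$, $T$, and $S$ are notation introduced here for illustration. For any set $T$ of trained models, let $S = \{ y : \mathcal{A}(y) \in T \}$. Then

$$
\Pr[\mathcal{A}(\mathcal{M}(\mathcal{X})) \in T]
= \Pr[\mathcal{M}(\mathcal{X}) \in S]
\le e^{\epsilon} \Pr[\mathcal{M}(\tilde{\mathcal{X}}) \in S] + \delta
= e^{\epsilon} \Pr[\mathcal{A}(\mathcal{M}(\tilde{\mathcal{X}})) \in T] + \delta .
$$

A randomized $\mathcal{A}$ whose randomness is independent of $\mathcal{M}$ is a mixture of deterministic procedures, so the same bound holds after averaging over that randomness.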

Figures (11)

  • Figure 1: Overview of proposed differentially private data generative models. Sensitive private training data $\mathcal{X}$ is fed into the generative model $\mathcal{M}$ to generate private surrogate dataset $\mathcal{X}^{\prime}$. After publishing $\mathcal{X}^{\prime}$, different learning models can be trained on $\mathcal{X}^{\prime}$ to protect privacy of $\mathcal{X}$ while achieving high learning accuracy (data utility).
  • Figure 2: Evaluation of DP-AuGM
  • Figure 3: DP-AuGM and DP-VaeGM versus DP-DL
  • Figure 4: Accuracy of DP-AuGM by different sizes of public data
  • Figure 5: Accuracy of DP-VaeGM under various privacy budgets on MNIST dataset
  • ...and 6 more figures

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Theorem 2
  • Theorem 3