Differentially Private Data Generative Models
Qingrong Chen, Chong Xiang, Minhui Xue, Bo Li, Nikita Borisov, Dali Kaafar, Haojin Zhu
TL;DR
The paper tackles privacy risks in learning from large private data by introducing two differentially private data generative models, DP-AuGM and DP-VaeGM, capable of synthesizing data with privacy guarantees for downstream learning. DP-AuGM trains a DP autoencoder and publishes its encoder to generate data from public inputs, while DP-VaeGM trains per-class DP VAEs with latent Gaussian sampling to produce unlimited synthetic data. The authors demonstrate strong utility on multiple datasets and show that DP-AuGM resists model inversion, membership inference, and GAN-based attacks, with DP-VaeGM resistant to membership inference; both can be integrated with MLaaS and federated learning to preserve privacy in practice. Overall, the work shows that differentially private data generation can enable high-utility learning while mitigating contemporary privacy threats in real-world systems.
Abstract
Deep neural networks (DNNs) have recently been widely adopted in various applications, and this success is largely due to a combination of algorithmic breakthroughs, improved computation resources, and access to large amounts of data. However, the large-scale data collections required for deep learning often contain sensitive information, raising many privacy concerns. Prior research has demonstrated several successful attacks that infer sensitive information about training data, such as model inversion, membership inference, and generative adversarial network (GAN) based leakage attacks against collaborative deep learning. In this paper, to enable efficient learning and to generate data with both privacy guarantees and high utility, we propose a differentially private autoencoder-based generative model (DP-AuGM) and a differentially private variational autoencoder-based generative model (DP-VaeGM). We evaluate the robustness of the two proposed models. We show that DP-AuGM can effectively defend against the model inversion, membership inference, and GAN-based attacks. We also show that DP-VaeGM is robust against the membership inference attack. We conjecture that the key to defending against the model inversion and GAN-based attacks is not differential privacy itself but the perturbation of the training data. Finally, we demonstrate that both DP-AuGM and DP-VaeGM can be easily integrated with real-world machine learning applications, such as machine learning as a service and federated learning, which are otherwise threatened by the membership inference attack and the GAN-based attack, respectively.
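The per-class generation step of DP-VaeGM described in the TL;DR (one VAE per class, with samples drawn from a latent Gaussian) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the decoders are random linear maps standing in for decoders that would have been trained with differential privacy, and the names `make_toy_decoder` and `generate_synthetic` as well as the dimensions are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_toy_decoder(rng, latent_dim, data_dim):
    """Hypothetical stand-in for one class's decoder.

    In the real setting this would be the decoder of a VAE trained with
    differential privacy; here it is an untrained random linear map.
    """
    W = rng.normal(size=(latent_dim, data_dim))
    b = rng.normal(size=data_dim)
    return lambda z: sigmoid(z @ W + b)

def generate_synthetic(decoders, n_per_class, latent_dim, rng):
    """Draw z ~ N(0, I) and decode it with each class's decoder."""
    xs, ys = [], []
    for label, decode in decoders.items():
        z = rng.standard_normal((n_per_class, latent_dim))
        xs.append(decode(z))
        ys.append(np.full(n_per_class, label))
    return np.concatenate(xs), np.concatenate(ys)

rng = np.random.default_rng(0)
latent_dim, data_dim = 4, 16  # illustrative sizes only
decoders = {c: make_toy_decoder(rng, latent_dim, data_dim) for c in range(3)}
X, y = generate_synthetic(decoders, n_per_class=5, latent_dim=latent_dim, rng=rng)
print(X.shape, y.shape)
```

Because sampling only queries decoders that are already trained, `n_per_class` can be set as large as needed. Under the post-processing property of differential privacy, drawing more samples from a privately trained model does not consume additional privacy budget.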
