Differentially Private Generative Adversarial Network
Liyang Xie, Kaixiang Lin, Shu Wang, Fei Wang, Jiayu Zhou
TL;DR
This work tackles privacy risks in generative modeling by introducing DPGAN, which enforces differential privacy during GAN training by clipping gradients and adding Gaussian noise to them within a Wasserstein GAN framework. The authors derive the DP guarantee using the moments accountant and, through the post-processing property of differential privacy, show that the generator's output remains differentially private no matter how many samples are generated. Empirical evaluations on MNIST and MIMIC-III electronic health record (EHR) data show the expected privacy-utility trade-off: larger privacy budgets yield higher-quality samples and better downstream classifier performance, while tighter privacy introduces more blur and distributional distortion. The approach enables private data sharing and synthetic data generation with formal privacy guarantees, and it applies to both fully connected and CNN architectures.
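As a hedged illustration of the training mechanism summarized above, the following Python/NumPy sketch performs one private update of a Wasserstein critic. It clips each per-example gradient to an L2 bound, adds Gaussian noise scaled to that bound, takes a gradient-ascent step on the critic objective, and then applies WGAN weight clipping. It assumes a hypothetical linear critic f_w(x) = w·x so that per-example gradients have a closed form; the names `clip_by_l2_norm` and `dp_critic_step` and all parameter values are illustrative. This is DP-SGD-style per-example clipping, not a reproduction of the authors' exact algorithm.

```python
import numpy as np


def clip_by_l2_norm(g, max_norm):
    """Rescale g so its L2 norm is at most max_norm (unchanged if already smaller)."""
    norm = np.linalg.norm(g)
    return g * min(1.0, max_norm / (norm + 1e-12))


def dp_critic_step(w, real_batch, fake_batch, lr, clip_norm,
                   noise_multiplier, weight_clip, rng):
    """One noisy update of a hypothetical linear WGAN critic f_w(x) = w @ x.

    The critic maximizes mean f_w(real) - mean f_w(fake). For a linear critic,
    the per-example gradient of f_w(x_real) - f_w(x_fake) w.r.t. w is
    x_real - x_fake.
    """
    m = real_batch.shape[0]
    per_example_grads = real_batch - fake_batch            # shape (m, d)
    # Bound each example's influence by clipping its gradient in L2 norm.
    clipped = np.stack([clip_by_l2_norm(g, clip_norm) for g in per_example_grads])
    # Gaussian noise with std proportional to the clipping bound (DP-SGD style).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / m
    # Gradient ascent on the critic objective.
    w_new = w + lr * noisy_grad
    # WGAN weight clipping keeps the critic's weights in a bounded box.
    return np.clip(w_new, -weight_clip, weight_clip)


# Usage with synthetic stand-ins for private data and generator samples.
rng = np.random.default_rng(0)
d, m = 5, 64
real = rng.normal(1.0, 1.0, size=(m, d))   # stand-in for private training data
fake = rng.normal(0.0, 1.0, size=(m, d))   # stand-in for generator samples
w = np.zeros(d)
for _ in range(10):
    w = dp_critic_step(w, real, fake, lr=0.05, clip_norm=1.0,
                       noise_multiplier=1.1, weight_clip=0.01, rng=rng)
print(w)
```

In a real model the critic would be a neural network and per-example gradients would come from an autodiff framework; the linear critic only keeps the sketch self-contained. In this setup the generator would be updated using only the critic's scores on generated samples, which is why its output can inherit the critic's privacy guarantee through post-processing.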
Abstract
Generative Adversarial Networks (GANs) and their variants have recently attracted intensive research interest due to their elegant theoretical foundation and excellent empirical performance as generative models. These tools provide a promising direction for studies where data availability is limited. One common issue in GANs is that the density of the learned generative distribution can concentrate on the training data points, meaning that GANs can easily memorize training samples due to the high model complexity of deep networks. This becomes a major concern when GANs are applied to private or sensitive data such as patient medical records, where such concentration may divulge critical patient information. To address this issue, we propose a differentially private GAN (DPGAN) model, in which we achieve differential privacy in GANs by adding carefully designed noise to gradients during the learning procedure. We provide a rigorous proof of the privacy guarantee, as well as comprehensive empirical evidence supporting our analysis, demonstrating that our method can generate high-quality data points at a reasonable privacy level.
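The privacy guarantee depends on accounting for the noise injected at every training step. The moments accountant is not reproduced here; instead, as a hedged illustration, the sketch below swaps in a simplified Rényi-DP accountant (a close relative of the moments accountant) for T compositions of a Gaussian mechanism with L2 sensitivity 1 and noise standard deviation σ. It ignores the privacy amplification gained from minibatch subsampling, so its bound is looser than that of an accountant that exploits subsampling. The function name `gaussian_rdp_epsilon` and the grid of orders are assumptions made for this example.

```python
import math

import numpy as np


def gaussian_rdp_epsilon(noise_multiplier, steps, delta, orders=None):
    """(epsilon, delta)-DP bound for `steps` compositions of a Gaussian mechanism
    with L2 sensitivity 1 and noise std `noise_multiplier`.

    Simplified Renyi-DP accounting without subsampling amplification.
    Returns (epsilon, best_order).
    """
    if orders is None:
        orders = np.concatenate([np.linspace(1.1, 10.9, 99), np.arange(11, 257)])
    orders = np.asarray(orders, dtype=float)
    # One Gaussian step has RDP alpha / (2 sigma^2) at order alpha;
    # RDP composes additively over steps.
    rdp = steps * orders / (2.0 * noise_multiplier ** 2)
    # Standard RDP -> (epsilon, delta) conversion, minimized over the orders.
    eps = rdp + math.log(1.0 / delta) / (orders - 1.0)
    best = int(np.argmin(eps))
    return float(eps[best]), float(orders[best])


eps, alpha = gaussian_rdp_epsilon(noise_multiplier=10.0, steps=100, delta=1e-5)
print(f"epsilon = {eps:.3f} at order {alpha:.1f}")
```

The sketch makes the trade-off concrete: more training steps increase ε, while a larger noise multiplier decreases it at the cost of noisier gradients and, in a GAN, lower sample quality.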
