Differentially Private Generative Adversarial Network

Liyang Xie, Kaixiang Lin, Shu Wang, Fei Wang, Jiayu Zhou

TL;DR

This work tackles privacy risks in generative modeling by introducing DP-GAN (DPGAN), which imposes differential privacy during GAN training via gradient clipping and Gaussian noise within a Wasserstein GAN framework. The authors formalize DP guarantees using the moment accountant and show that the generator output remains differentially private despite unlimited data generation through post-processing. Empirical evaluations on MNIST and MIMIC-III EHR data demonstrate the expected privacy-utility trade-off: higher privacy budgets yield higher-quality samples and better downstream classifier performance, while tighter privacy introduces more blur and distributional distortion. The approach enables private data sharing and synthetic data generation with formal privacy protection, applicable to both fully connected and CNN architectures.
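For reference, the guarantee and the post-processing argument mentioned above can be written out as follows. This is the standard textbook formulation, not a quotation of the paper's definitions; the symbols $\mathcal{A}$ (the randomized training mechanism), $D$, $D'$ (adjacent datasets), $S$ (a set of outputs), and $f$ (any data-independent map, such as sampling from a trained generator) are notation assumed here.

```latex
% (epsilon, delta)-differential privacy: for all adjacent datasets D, D'
% and every measurable set of outputs S,
\[
  \Pr[\mathcal{A}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{A}(D') \in S] + \delta .
\]
% Post-processing: if \mathcal{A} is (epsilon, delta)-DP and f does not
% access the private data, then f \circ \mathcal{A} is also (epsilon, delta)-DP:
\[
  \Pr[f(\mathcal{A}(D)) \in S] \;\le\; e^{\epsilon}\,\Pr[f(\mathcal{A}(D')) \in S] + \delta .
\]
```

Under this reading, drawing any number of samples from the generator is a function $f$ of the privately trained parameters, so it consumes no additional privacy budget.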

Abstract

Generative Adversarial Network (GAN) and its variants have recently attracted intense research interest due to their elegant theoretical foundation and excellent empirical performance as generative models. These tools provide a promising direction for studies where data availability is limited. One common issue in GANs is that the density of the learned generative distribution could concentrate on the training data points, meaning that GANs can easily memorize training samples due to the high model complexity of deep networks. This becomes a major concern when GANs are applied to private or sensitive data such as patient medical records, where the concentration of the distribution may divulge critical patient information. To address this issue, in this paper we propose a differentially private GAN (DPGAN) model, in which we achieve differential privacy in GANs by adding carefully designed noise to gradients during the learning procedure. We provide a rigorous proof of the privacy guarantee, as well as comprehensive empirical evidence to support our analysis, where we demonstrate that our method can generate high-quality data points at a reasonable privacy level.
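To make "adding carefully designed noise to gradients" concrete, here is a minimal NumPy sketch of one noisy update in the general style of differentially private SGD. It is not the paper's Algorithm 1: the function names, the per-example clipping rule, and the parameters `clip_norm`, `noise_multiplier`, and `weight_bound` are assumptions chosen for illustration, and the sketch performs no privacy accounting.

```python
import numpy as np

def noisy_clipped_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient to L2 norm <= clip_norm, sum them,
    add Gaussian noise with std noise_multiplier * clip_norm, and average.

    per_example_grads: array of shape (batch_size, dim).
    """
    batch_size, dim = per_example_grads.shape
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale factor is 1 for small gradients and clip_norm / norm for large ones.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=dim)
    return (clipped.sum(axis=0) + noise) / batch_size

def noisy_critic_step(weights, per_example_grads, lr, clip_norm,
                      noise_multiplier, weight_bound, rng):
    """One hypothetical critic update: noisy gradient descent on a loss,
    followed by WGAN-style weight clipping to [-weight_bound, weight_bound]."""
    grad = noisy_clipped_gradient(per_example_grads, clip_norm,
                                  noise_multiplier, rng)
    return np.clip(weights - lr * grad, -weight_bound, weight_bound)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = np.zeros(4)
    grads = rng.normal(size=(8, 4))  # stand-in for real per-example gradients
    weights = noisy_critic_step(weights, grads, lr=0.05, clip_norm=1.0,
                                noise_multiplier=1.1, weight_bound=0.01, rng=rng)
    print(weights)
```

A larger `noise_multiplier` corresponds to a smaller $\epsilon$ (stronger privacy) for a fixed number of steps, which matches the trade-off the paper reports. Converting a noise level and step count into a concrete $(\epsilon, \delta)$ requires an accountant such as the moment accountant, which this sketch omits.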

Paper Structure

This paper contains 14 sections, 3 theorems, 8 equations, 5 figures, 1 algorithm.

Key Result

lemma 1

Under the conditions of Algorithm 1, assume that the activation function of the discriminator has a bounded range and bounded derivatives everywhere: $\sigma(\cdot) \leq B_{\sigma}$ and $\sigma'(\cdot) \leq B_{\sigma'}$, and every data point $\mathbf{x}$ satisfies $\|\mathbf{x}\| \leq B_{x}$

Figures (5)

  • Figure 1: Generated images for four different values of $\epsilon$ on the MNIST dataset are plotted in the leftmost column of each group. The three nearest neighbors of each generated image are plotted to illustrate that the generated data does not memorize the real data and that privacy is preserved. The images become more blurred as more noise is added.
  • Figure 2: Wasserstein distance for different privacy levels when applying DPGAN to MNIST. The curves converge and exhibit more fluctuations as more noise is added.
  • Figure 3: Binary classification task on the MNIST database with different training strategies. From left to right, we use training data, generated data without noise, and generated data with $\epsilon=11.5,3.2,0.96,0.72$. As less noise is added, the accuracy of the classifier built on generated data gets higher, which indicates that the generated data has better quality.
  • Figure 4: DWP evaluation on the MIMIC-III database with different $\epsilon$ values (1070 points). As more noise is added, the distribution of generated data in each dimension deviates more from the real training data.
  • Figure 5: Dimension-wise prediction evaluation on the MIMIC-III database with different $\epsilon$ values. As more noise is added, the AUC value of the classifier built from generated data gets lower and the data gets sparser.

Theorems & Definitions (8)

  • definition 1
  • definition 2
  • definition 3
  • definition 4
  • lemma 1
  • Remark 1
  • Lemma 1
  • Theorem 1