Generating Artificial Data for Private Deep Learning

Aleksei Triastcyn, Boi Faltings

TL;DR

This work tackles privacy risks in ML by releasing GAN-generated artificial data that preserve key statistical properties of the real data, instead of the real data themselves. A Differentially Private (DP) critic is added to the GAN to improve sample diversity and privacy, while an empirical privacy-estimation framework uses KL divergence and Chebyshev's inequality to bound the expected privacy loss post hoc. Experiments on MNIST, SVHN, and CelebA show that models trained on artificial data achieve accuracy competitive with both non-private baselines and DP-based model-release methods, and that they leak measurably less information under model-inversion attacks. The approach enables flexible, scalable private data publishing and data pooling, though it provides empirical rather than worst-case DP guarantees and faces typical GAN limitations. This suggests a practical path toward privacy-preserving data sharing, data markets, and reproducible research with high-utility synthetic data and interpretable privacy bounds.

Abstract

In this paper, we propose generating artificial data that retain statistical properties of real data as a means of providing privacy with respect to the original dataset. We use a generative adversarial network to draw privacy-preserving artificial data samples and derive an empirical method to assess the risk of information disclosure in a differential-privacy-like way. Our experiments show that we are able to generate artificial data of high quality and successfully train and validate machine learning models on this data while limiting potential privacy loss.
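
The TL;DR names two ingredients of the empirical privacy estimate: KL divergence and a Chebyshev bound. The sketch below only illustrates those two building blocks in isolation and is not the paper's estimator. The function names `kl_divergence` and `chebyshev_tail_bound` are hypothetical, the distributions are assumed to be discrete, and plugging a sample mean and variance into Chebyshev's inequality is an approximation made here for illustration.

```python
import math
import statistics


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def chebyshev_tail_bound(samples, threshold):
    """Upper-bound P(X >= threshold) using Chebyshev's inequality.

    Chebyshev: P(|X - mu| >= t) <= var / t**2 for t > 0, which also bounds
    the one-sided tail P(X >= mu + t). The sample mean and variance are used
    as plug-in estimates (an assumption of this sketch).
    """
    mu = statistics.fmean(samples)
    var = statistics.pvariance(samples, mu)
    t = threshold - mu
    if t <= 0:
        return 1.0  # the inequality is uninformative at or below the mean
    return min(1.0, var / (t * t))


# Hypothetical per-sample divergence estimates, for illustration only.
losses = [kl_divergence([0.6, 0.4], [0.5, 0.5]),
          kl_divergence([0.7, 0.3], [0.5, 0.5]),
          kl_divergence([0.55, 0.45], [0.5, 0.5])]
bound = chebyshev_tail_bound(losses, threshold=0.5)
print(f"P(loss >= 0.5) is at most about {bound:.4f}")
```

Read this way, a small bound at a given threshold says that large privacy-loss values are unlikely under the estimated mean and variance, which matches the "expected rather than worst-case" character of the guarantee described in the TL;DR.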

Paper Structure

This paper contains 13 sections, 5 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Architecture of our solution. Sensitive data is used to train a GAN to produce a private artificial dataset, which then can be used by any ML model.
  • Figure 2: Results of the model inversion attack. Top to bottom: real target images, reconstructions from the non-private model, our method, and the DP model.
  • Figure 3: Privacy-accuracy trade-off curve and corresponding image reconstructions from a multi-layer perceptron trained on the artificial MNIST dataset.
  • Figure 4: Generated and closest real examples for SVHN.
  • Figure 5: Generated and closest real examples for CelebA.

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4