Synthetic Data: Methods, Use Cases, and Risks

Emiliano De Cristofaro

TL;DR

This article introduces synthetic data and discusses its use cases, the privacy challenges that remain unaddressed, and its inherent limitations as an effective privacy-enhancing technology.

Abstract

Sharing data can often enable compelling applications and analytics. However, more often than not, valuable datasets contain information of a sensitive nature, and thus, sharing them can endanger the privacy of users and organizations. A possible alternative gaining momentum in both the research community and industry is to share synthetic data instead. The idea is to release artificially generated datasets that resemble the actual data -- more precisely, having similar statistical properties. In this article, we provide a gentle introduction to synthetic data and discuss its use cases, the privacy challenges that are still unaddressed, and its inherent limitations as an effective privacy-enhancing technology.

Paper Structure (9 sections, 5 figures)

Figures (5)

  • Figure 1: Discriminative Machine Learning Models. (Source: https://learning.oreilly.com/library/view/generative-deep-learning/9781492041931/, CC BY 4.0)
  • Figure 2: Generative Machine Learning Models. (Source: https://learning.oreilly.com/library/view/generative-deep-learning/9781492041931/, CC BY 4.0)
  • Figure 3: GAN-generated artificial images. (Source: nvidia)
  • Figure 4: Membership Inference Attack. (Source: brad)
  • Figure 5: Attribute Inference Attack.