Differentially Private Synthetic Data Generation Using Context-Aware GANs

Anantaa Kotal, Anupam Joshi

TL;DR

The paper tackles privacy-aware synthetic data generation by introducing ContextGAN, a context-aware, differentially private GAN that enforces domain-specific rules via a constraint matrix in the discriminator. By combining explicit/implicit rule enforcement with DP training, ContextGAN delivers high-fidelity synthetic data across healthcare, security, and finance while maintaining strong privacy protections. Empirical results show improved distributional fidelity, competitive downstream utility, and robust resistance to re-identification, attribute inference, and membership inference attacks. This approach offers a practical pathway for privacy-compliant data sharing and analysis in sensitive domains.

Abstract

The widespread use of big data across sectors has raised major privacy concerns, especially when sensitive information is shared or analyzed. Regulations such as GDPR and HIPAA impose strict controls on data handling, making it difficult to balance the need for insights with privacy requirements. Synthetic data offers a promising solution by creating artificial datasets that reflect real patterns without exposing sensitive information. However, traditional synthetic data methods often fail to capture complex, implicit rules that link different elements of the data and are essential in domains like healthcare. They may reproduce explicit patterns but overlook domain-specific constraints that are not directly stated yet crucial for realism and utility. For example, prescription guidelines that restrict certain medications for specific conditions or prevent harmful drug interactions may not appear explicitly in the original data. Synthetic data generated without these implicit rules can lead to medically inappropriate or unrealistic profiles. To address this gap, we propose ContextGAN, a Context-Aware Differentially Private Generative Adversarial Network that integrates domain-specific rules through a constraint matrix encoding both explicit and implicit knowledge. The constraint-aware discriminator evaluates synthetic data against these rules to ensure adherence to domain constraints, while differential privacy protects sensitive details from the original data. We validate ContextGAN across healthcare, security, and finance, showing that it produces high-quality synthetic data that respects domain rules and preserves privacy. Our results demonstrate that ContextGAN improves realism and utility by enforcing domain constraints, making it suitable for applications that require compliance with both explicit patterns and implicit rules under strict privacy guarantees.
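The abstract does not specify how the constraint matrix is encoded, so the following is only a minimal sketch of one possible reading: a pairwise "forbidden co-occurrence" matrix over binary features, used to count rule violations in a synthetic record. The feature names, the rules, and the `violation_count` helper are all hypothetical and are not taken from the paper.

```python
# Hypothetical binary features of a patient record (assumption, not from the paper).
FEATURES = ["pregnant", "drug_a", "drug_b"]

# Hypothetical constraint matrix: FORBIDDEN[i][j] == 1 means features i and j
# must not both be present in the same record. This encoding is an assumption.
FORBIDDEN = [
    [0, 1, 0],  # pregnant + drug_a: illustrative contraindication
    [0, 0, 1],  # drug_a + drug_b: illustrative harmful interaction
    [0, 0, 0],
]


def violation_count(record, forbidden):
    """Count the forbidden feature pairs that are both active in a binary record."""
    n = len(record)
    return sum(
        forbidden[i][j]
        for i in range(n)
        for j in range(n)
        if record[i] and record[j]
    )


records = [
    [1, 0, 1],  # pregnant, drug_b only
    [1, 1, 0],  # pregnant and drug_a
    [1, 1, 1],  # all three features present
]
counts = [violation_count(r, FORBIDDEN) for r in records]
```

In a setup like the one the abstract describes, a score of this kind could feed into the discriminator so that rule-violating samples are penalized. How ContextGAN actually combines it with the adversarial loss and with differentially private training is not stated here and is not shown in this sketch.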

Paper Structure

This paper contains 12 sections, 6 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: Comparison of NIDS accuracy for Lab Collected Data
  • Figure 2: Comparison of Re-identification Attack with 30%, 60% and 90% overlap on original data
  • Figure 3: Comparison of accuracy in Attribute Inference Attack
  • Figure 4: Comparison of Membership Inference Attack in White Box (WB) and Fully Black Box (FBB) settings