From Private to Public: Benchmarking GANs in the Context of Private Time Series Classification

Dominique Mercier; Andreas Dengel; Sheraz Ahmed

From Private to Public: Benchmarking GANs in the Context of Private Time Series Classification

Dominique Mercier, Andreas Dengel, Sheraz Ahmed

TL;DR

The paper tackles the challenge of using private time-series data for learning by benchmarking GAN-based privacy-preserving data generation methods. It compares DPWGAN and GSWGAN, showing that gradient-sanitized GSWGAN generally delivers more stable and higher-quality public data for downstream classification than differential-privacy constrained classifiers or DPWGAN, especially when using a convolutional generator and Fréchet Inception Distance-based early stopping. Across diverse time-series datasets, GSWGAN achieves distributions close to the private data without memorization, enabling effective classifier training on synthetic public data. The work highlights practical implications for privacy-sensitive domains like healthcare and finance, where sharing synthetic time-series data can unlock model reuse and benchmarking without compromising individual data privacy. It also provides concrete guidance on architecture choices, stopping criteria, and privacy budgeting to optimize synthetic data quality.

Abstract

Deep learning has proven to be successful in various domains and for different tasks. However, when it comes to private data several restrictions are making it difficult to use deep learning approaches in these application fields. Recent approaches try to generate data privately instead of applying a privacy-preserving mechanism directly, on top of the classifier. The solution is to create public data from private data in a manner that preserves the privacy of the data. In this work, two very prominent GAN-based architectures were evaluated in the context of private time series classification. In contrast to previous work, mostly limited to the image domain, the scope of this benchmark was the time series domain. The experiments show that especially GSWGAN performs well across a variety of public datasets outperforming the competitor DPWGAN. An analysis of the generated datasets further validates the superiority of GSWGAN in the context of time series generation.

From Private to Public: Benchmarking GANs in the Context of Private Time Series Classification

TL;DR

Abstract

Paper Structure (14 sections, 6 figures, 6 tables)

This paper contains 14 sections, 6 figures, 6 tables.

Introduction
Related Work
Evaluated Method
Datasets
Experiments & Results
Experiment 1: Accuracy Comparison of DP, DPWGAN, and GSWGAN
Experiment 2: Finding the Best Stopping Criteria for GSWGAN
Experiment 3: Impact of Architecture on GSWGAN
Experiment 4: Impact of Noise Multiplier on Privacy-preserving Approaches
Experiment 5: T-SNE Visualization of Generated Data
Experiment 6: Dataset Visualization – Private vs. Generated (Public) Data
Experiment 7: Dataset Analysis – Computing the Distance between Samples
Discussion
Conclusion

Figures (6)

Figure 1: T-SNE Visualization (1/2): Subfigures show datasets generated using DPWGAN xie2018differentially and GSWGAN chen2020gs. Plots within each subfigure: Left shows the difference between private and public train datasets. Middle: Class distribution of private train dataset. Right: Class distribution public train dataset.
Figure 2: T-SNE Visualization (2/2): Subfigures show the datasets generated using GSWGAN chen2020gs. Plots within each subfigure: Left shows the difference between private and public train datasets. Middle: Class distribution of private train dataset. Right: Class distribution public train dataset.
Figure 3: Dataset Visualization: Shows the private and the generated (public) data generated using GSWGAN-dense and GSWGAN-conv. The generated data of GSWGAN-dense shows a noisy behavior compared to the very realistic samples GSWGAN-conv chen2020gs.
Figure 4: Dataset Visualization: Shows datasets created using GSWGAN-conv chen2020gs. Grey lines correspond to multiple original data samples. Blue corresponds to the generated data sample.
Figure 5: Dataset Distances (1/2): Shows the distance between the private and generated datasets using GSWGAN-dense and GSWGAN-conv and the samples within each dataset. To compare two datasets, the union of the samples was built and the L2-norm is used to compute the distance.
...and 1 more figures

From Private to Public: Benchmarking GANs in the Context of Private Time Series Classification

TL;DR

Abstract

From Private to Public: Benchmarking GANs in the Context of Private Time Series Classification

Authors

TL;DR

Abstract

Table of Contents

Figures (6)