From Private to Public: Benchmarking GANs in the Context of Private Time Series Classification

Dominique Mercier, Andreas Dengel, Sheraz Ahmed

TL;DR

The paper tackles the challenge of using private time-series data for learning by benchmarking GAN-based privacy-preserving data generation methods. It compares DPWGAN and GSWGAN, showing that gradient-sanitized GSWGAN generally delivers more stable and higher-quality public data for downstream classification than differential-privacy-constrained classifiers or DPWGAN, especially when using a convolutional generator and early stopping based on the Fréchet Inception Distance. Across diverse time-series datasets, GSWGAN achieves distributions close to the private data without memorization, enabling effective classifier training on synthetic public data. The work highlights practical implications for privacy-sensitive domains like healthcare and finance, where sharing synthetic time-series data can unlock model reuse and benchmarking without compromising individual data privacy. It also provides concrete guidance on architecture choices, stopping criteria, and privacy budgeting to optimize synthetic data quality.

Abstract

Deep learning has proven successful in various domains and for different tasks. However, when it comes to private data, several restrictions make it difficult to use deep learning approaches in these application fields. Recent approaches try to generate data privately instead of applying a privacy-preserving mechanism directly on top of the classifier. The solution is to create public data from private data in a manner that preserves the privacy of the data. In this work, two very prominent GAN-based architectures were evaluated in the context of private time series classification. In contrast to previous work, which was mostly limited to the image domain, the scope of this benchmark was the time series domain. The experiments show that GSWGAN in particular performs well across a variety of public datasets, outperforming the competitor DPWGAN. An analysis of the generated datasets further validates the superiority of GSWGAN in the context of time series generation.

Paper Structure (14 sections, 6 figures, 6 tables)

Figures (6)

  • Figure 1: t-SNE Visualization (1/2): Subfigures show datasets generated using DPWGAN (Xie et al., 2018) and GSWGAN (Chen et al., 2020). Plots within each subfigure: Left: Difference between the private and public train datasets. Middle: Class distribution of the private train dataset. Right: Class distribution of the public train dataset.
  • Figure 2: t-SNE Visualization (2/2): Subfigures show the datasets generated using GSWGAN (Chen et al., 2020). Plots within each subfigure: Left: Difference between the private and public train datasets. Middle: Class distribution of the private train dataset. Right: Class distribution of the public train dataset.
  • Figure 3: Dataset Visualization: Shows the private data and the public data generated using GSWGAN-dense and GSWGAN-conv (Chen et al., 2020). The data generated by GSWGAN-dense is noisy compared to the very realistic samples of GSWGAN-conv.
  • Figure 4: Dataset Visualization: Shows datasets created using GSWGAN-conv (Chen et al., 2020). Grey lines correspond to multiple original data samples. The blue line corresponds to the generated data sample.
  • Figure 5: Dataset Distances (1/2): Shows the distances between the private datasets and the datasets generated using GSWGAN-dense and GSWGAN-conv, as well as between the samples within each dataset. To compare two datasets, the union of the samples is built and the L2 norm is used to compute the distance.
  • ...and 1 more figure
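
The distance comparison described for Figure 5 can be illustrated with a minimal sketch. It is not the authors' code: the function names `l2_distance` and `union_distance_summary` are hypothetical, the series are assumed to be univariate (or already flattened) and of equal length, and reporting the mean distance per group is an assumption, since the caption does not say how the distances are aggregated.

```python
import math
from itertools import combinations


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def union_distance_summary(private, public):
    """Mean pairwise L2 distance within and across two datasets.

    Hypothetical helper: builds the union of both sample sets, computes
    the L2 distance for every pair in the union, and groups the pairs
    by the origin of the two samples. A group with no pairs maps to None.
    """
    # Union of the samples, with each sample tagged by its origin.
    union = [("private", s) for s in private] + [("public", s) for s in public]

    totals = {
        "private-private": [0.0, 0],
        "private-public": [0.0, 0],
        "public-public": [0.0, 0],
    }
    # combinations() keeps input order, so a cross pair is always
    # (private, public) and the key below is always one of the three above.
    for (origin_a, a), (origin_b, b) in combinations(union, 2):
        entry = totals[f"{origin_a}-{origin_b}"]
        entry[0] += l2_distance(a, b)
        entry[1] += 1

    return {
        key: (total / count if count else None)
        for key, (total, count) in totals.items()
    }


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the paper.
    private = [[0.0, 0.0], [3.0, 4.0]]
    public = [[0.0, 0.0], [6.0, 8.0]]
    print(union_distance_summary(private, public))
```

If the cross-dataset mean is comparable to the within-private mean, the generated samples lie about as far from the private samples as the private samples lie from each other. A cross-dataset distance of exactly zero for some pair, as in the toy data above, would indicate a generated sample that duplicates a private one.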