From Private to Public: Benchmarking GANs in the Context of Private Time Series Classification
Dominique Mercier, Andreas Dengel, Sheraz Ahmed
TL;DR
The paper tackles the challenge of learning from private time-series data by benchmarking GAN-based privacy-preserving data generation methods. It compares DPWGAN and GSWGAN, showing that the gradient-sanitized GSWGAN generally delivers more stable and higher-quality public data for downstream classification than either DPWGAN or classifiers trained directly under differential-privacy constraints, especially when using a convolutional generator and Fréchet Inception Distance-based early stopping. Across diverse time-series datasets, GSWGAN produces distributions close to the private data without memorizing it, enabling effective classifier training on synthetic public data. The work highlights practical implications for privacy-sensitive domains like healthcare and finance, where sharing synthetic time-series data can unlock model reuse and benchmarking without compromising individual data privacy. It also provides concrete guidance on architecture choices, stopping criteria, and privacy budgeting to optimize synthetic data quality.
Abstract
Deep learning has proven successful in various domains and for different tasks. However, when it comes to private data, several restrictions make it difficult to use deep learning approaches in these application fields. Recent approaches try to generate data privately instead of applying a privacy-preserving mechanism directly on top of the classifier. The idea is to create public data from private data in a manner that preserves the privacy of the data. In this work, two prominent GAN-based architectures, DPWGAN and GSWGAN, were evaluated in the context of private time series classification. In contrast to previous work, which was mostly limited to the image domain, the scope of this benchmark was the time series domain. The experiments show that GSWGAN in particular performs well across a variety of public datasets, outperforming its competitor DPWGAN. An analysis of the generated datasets further validates the superiority of GSWGAN in the context of time series generation.
