ST-DPGAN: A Privacy-preserving Framework for Spatiotemporal Data Generation
Wei Shao, Rongyi Zhu, Cai Yang, Chandra Thapa, Muhammad Ejaz Ahmed, Seyit Camtepe, Rui Zhang, DuYong Kim, Hamid Menouar, Flora D. Salim
TL;DR
ST-DPGAN is a privacy-preserving framework for spatiotemporal data generation that integrates differential privacy with a Graph-GAN. The generator uses a transConv1d module to map 1-D Gaussian noise to a $T \times N$ spatiotemporal representation, while the discriminator applies spatial and temporal attention to a graph-embedded input, with DP-SGD enforcing the privacy guarantee. Across three real-world datasets, ST-DPGAN and its Attn variant achieve better data quality (lower MSE/MAE) than baselines such as DPGAN and WGAN under varying privacy budgets, and ablation studies confirm the critical roles of transConv1d and graph embedding. The work demonstrates that privacy-protected synthetic spatiotemporal data can retain substantial utility for downstream predictive tasks, enabling safer data sharing and analysis in sensitive domains.
Abstract
Spatiotemporal data is prevalent in a wide range of edge devices, such as those used in personal communication and financial transactions. Recent advancements have sparked a growing interest in integrating spatiotemporal analysis with large-scale language models. However, spatiotemporal data often contains sensitive information, making it unsuitable for open third-party access. To address this challenge, we propose a Graph-GAN-based model for generating privacy-protected spatiotemporal data. Our approach incorporates spatial and temporal attention blocks in the discriminator and a spatiotemporal deconvolution structure in the generator. These enhancements enable efficient training under the Gaussian noise that is added to achieve differential privacy. Extensive experiments on three real-world spatiotemporal datasets validate the efficacy of our model. Our method provides a privacy guarantee while maintaining data utility: a prediction model trained on our generated data performs competitively with the same model trained on the original data.
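To make "training under Gaussian noise" concrete, the following is a minimal sketch of one generic DP-SGD update (per-example gradient clipping followed by Gaussian noise), not the authors' implementation. The function names, the parameters `max_norm` and `noise_multiplier`, and the use of plain Python lists are all illustrative assumptions; which network receives the noisy update and how the privacy budget is accounted for are not shown here.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Rescale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One generic DP-SGD update (hypothetical helper, for illustration only).

    1. Clip each per-example gradient to L2 norm <= max_norm.
    2. Sum the clipped gradients coordinate-wise.
    3. Add Gaussian noise with std = noise_multiplier * max_norm.
    4. Average over the batch and take a gradient-descent step.
    """
    batch_size = len(per_example_grads)
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    params = [0.0, 0.0]
    grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
    print(dp_sgd_step(params, grads, lr=0.1, max_norm=1.0,
                      noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single training example can move the parameters, and the noise scale is tied to that bound. A larger `noise_multiplier` gives stronger privacy at the cost of noisier updates.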
