Embedding-Space Data Augmentation to Prevent Membership Inference Attacks in Clinical Time Series Forecasting
Marius Fracarolli, Michael Staniek, Stefan Riezler
TL;DR
This work addresses privacy concerns in clinical time-series forecasting by using embedding-space data augmentation to mitigate Membership Inference Attacks (MIA) while preserving predictive accuracy. It compares zeroth-order optimization (ZOO) in embedding space, its PCA-restricted variant (ZOO-PCA), and MixUp, showing that ZOO-PCA yields the best privacy-utility tradeoff and that MixUp enhances generalization. The study demonstrates that augmenting training data with synthetic embeddings can significantly reduce the attacker’s advantage, as measured by the TPR/FPR ratio, without compromising test performance. DP-SGD can provide strong privacy, but at a substantial utility cost. The findings suggest embedding-space augmentation as a practical defense for privacy-preserving TSF on public EHR datasets, with potential for hybrid approaches and broader applicability to deep architectures and privacy attacks.
Abstract
Balancing strong privacy guarantees with high predictive performance is critical for time series forecasting (TSF) tasks involving Electronic Health Records (EHR). In this study, we explore how data augmentation can mitigate Membership Inference Attacks (MIA) on TSF models. We show that retraining with synthetic data can substantially reduce the effectiveness of loss-based MIAs, as measured by the attacker's ratio of true-positive rate to false-positive rate (TPR/FPR). The key challenge is generating synthetic samples that resemble the original training data closely enough to confuse the attacker while introducing enough novelty to improve the model's generalization to unseen data. We examine three augmentation strategies, Zeroth-Order Optimization (ZOO), a variant of ZOO constrained by Principal Component Analysis (ZOO-PCA), and MixUp, to strengthen model resilience without sacrificing accuracy. Our experimental results show that ZOO-PCA yields the largest reductions in the TPR/FPR ratio of MIAs without sacrificing performance on test data.
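To make two of the quantities named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes that MixUp is applied to rows of an embedding matrix as convex combinations of randomly paired samples, and that the loss-based attacker predicts "member" whenever a sample's loss falls below a fixed threshold. The function names, the `alpha` parameter, and the threshold are hypothetical choices made for illustration.

```python
import numpy as np


def mixup_embeddings(emb, rng, alpha=0.4):
    """Return convex combinations of randomly paired rows of `emb`.

    Assumption: MixUp operates directly on embedding vectors, with mixing
    weights drawn from Beta(alpha, alpha). `alpha` is a hypothetical default.
    """
    n = emb.shape[0]
    lam = rng.beta(alpha, alpha, size=(n, 1))  # one weight in [0, 1] per row
    perm = rng.permutation(n)                  # random partner for each row
    return lam * emb + (1.0 - lam) * emb[perm]


def tpr_fpr_ratio(member_losses, nonmember_losses, threshold):
    """TPR/FPR of a loss-threshold attack (assumed attack rule).

    The attacker flags a sample as a training member when its loss is
    below `threshold`. Returns infinity if no non-member is flagged.
    """
    tpr = float(np.mean(np.asarray(member_losses) < threshold))
    fpr = float(np.mean(np.asarray(nonmember_losses) < threshold))
    return tpr / fpr if fpr > 0 else float("inf")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 4))              # toy stand-in for embeddings
    print(mixup_embeddings(emb, rng).shape)
    # Toy losses: members tend to have lower loss than non-members.
    print(tpr_fpr_ratio([0.1, 0.2, 0.3, 0.9], [0.2, 0.6, 0.8, 1.0], 0.5))
```

A ratio close to 1 means the attacker flags members and non-members at similar rates, which is the direction in which the augmentation strategies are intended to push the attack.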
