Generative clinical time series models trained on moderate amounts of patient data are privacy preserving

Rustam Zhumagambetov; Niklas Giesa; Sebastian D. Boie; Stefan Haufe

TL;DR

This study addresses privacy risks in generative clinical time-series models trained on public ICU data. It evaluates four state-of-the-art generative models (GHOSTS, HALO, KoVAE, Diffusion-TS) against a battery of membership inference attacks using MIMIC-IV and eICU data, showing that with training sizes above ~500 samples the synthetic outputs are largely resistant to such attacks. The work also demonstrates that differential privacy, while theoretically protective, can reduce utility and does not reliably enhance privacy in this context, and that cross-dataset attacks can exploit shared physiological patterns to undermine privacy. The authors propose a framework for ex-post privacy auditing that quantifies privacy risk via multiple attack modalities and metrics, and they advocate integrating such audits into model validation and data governance workflows to enable safer data sharing for research.

Abstract

Sharing medical data for machine learning model training purposes is often impossible due to the risk of disclosing identifying information about individual patients. Synthetic data produced by generative artificial intelligence (genAI) models trained on real data is often seen as one possible solution to comply with privacy regulations. While powerful genAI models for heterogeneous hospital time series have recently been introduced, such modeling does not guarantee privacy protection, as the generated data may still reveal identifying information about individuals in the models' training cohort. Applying established privacy mechanisms to generative time series models, however, proves challenging: post-hoc data anonymization through k-anonymization or similar techniques is limited, while model-centered privacy mechanisms that implement differential privacy (DP) may lead to unstable training, compromising the utility of the generated data. Given these known limitations, privacy audits for generative time series models are currently indispensable regardless of the concrete privacy mechanisms applied to models and/or data. In this work, we use a battery of established privacy attacks to audit state-of-the-art hospital time series models, trained on the public MIMIC-IV dataset, with respect to privacy preservation. We additionally use the eICU dataset to mount a privacy attack against the synthetic data generator trained on the MIMIC-IV dataset. Our results show that established privacy attacks are ineffective against generated multivariate clinical time series when synthetic data generators are trained on sufficiently large training datasets. Furthermore, we discuss how the use of existing DP mechanisms for these synthetic data generators would not bring the desired improvement in privacy, but would only decrease utility for machine learning prediction tasks.

Paper Structure (21 sections, 18 equations, 6 figures, 1 table, 1 algorithm)

Introduction
Methods
Real data
MIMIC-IV
eICU
Generative models for multivariate time series data
Membership inference attacks
Density-based MIA
Model-based MIA
Privacy assessment
Dimensionality reduction
MIA performance metrics
Code availability
Results
Discussion
...and 6 more sections

Figures (6)

Figure 1: Performance of membership inference attacks against synthetic hospital time series generated by Diffusion-TS, GHOSTS without postprocessing (GHOSTS), GHOSTS with postprocessing (GHOSTS-POST), HALO, and KoVAE. Training set size refers to the training size of the synthetic data generator. Axes labels $\mathcal{A}_{MC-\theta}$, $\mathcal{A}_{\mathrm{DOMIAS-eq1}}$, $\mathcal{A}_{\mathrm{GAN-Leak-Breugel}}$, $\mathcal{A}_{\mathrm{GAN-Leak-DTW}}$, $\mathcal{A}_{\mathrm{GAN-Leak-\|\cdot\|}}$, $\mathcal{A}_{\mathrm{GAN-Leak-cal}}$, $\mathcal{A}_{\mathrm{DOMIAS-KDE}}$, $\mathcal{A}_{\mathrm{DOMIAS-BNAF}}$, $\mathcal{A}_{\mathrm{Logan-pb}}$ denote different privacy attacks. The left column refers to attacks that only use synthetic data, while the right column refers to attacks that use auxiliary (non-training real) data. Color-coded heat maps depict the mean of the area under the ROC curve (AUROC) estimated using $K=100$ bootstrap samples. The chance level for all attacks is AUROC = 0.5.
Figure 2: Performance of membership inference attacks against synthetic hospital attributes generated by GHOSTS without postprocessing (GHOSTS) and HALO. Training set size refers to the training size of the synthetic data generators. Axes labels $\mathcal{A}_{MC-\theta}$, $\mathcal{A}_{\mathrm{DOMIAS-eq1}}$, $\mathcal{A}_{\mathrm{GAN-Leak-Breugel}}$, $\mathcal{A}_{\mathrm{GAN-Leak-\|\cdot\|}}$, $\mathcal{A}_{\mathrm{GAN-Leak-cal}}$, $\mathcal{A}_{\mathrm{DOMIAS-KDE}}$, $\mathcal{A}_{\mathrm{DOMIAS-BNAF}}$, $\mathcal{A}_{\mathrm{Logan-pb}}$ denote different privacy attacks. The left column refers to attacks that only use synthetic data, while the right column refers to attacks that also use auxiliary data. Color-coded heat maps depict the mean of the area under the ROC curve (AUROC) estimated using $K=100$ bootstrap samples. The chance level for all attacks is AUROC = 0.5.
Figure 3: Estimation of overfitting using the normalized root mean square error (NRMSE) across training set sizes. Training set size refers to the training dataset size of the synthetic data generators. The values are $NRMSE_{\operatorname{min}}(\mathcal{D}_\text{syn})$, as defined in Eq. \ref{eq:nrmse_min}, where $\mathcal{D}_\text{syn}$ is one of the datasets generated by the synthetic data generators; they are estimated using bootstrapping with $K=100$.
Figure C.1: Performance of membership inference attacks, when eICU data was used to mount an attack, against synthetic hospital time series generated by Diffusion-TS, GHOSTS without postprocessing (GHOSTS), GHOSTS with postprocessing (GHOSTS-POST), HALO, and KoVAE. Training set size refers to the training size of the synthetic data generator. Axes labels $\mathcal{A}_{\mathrm{GAN-Leak-cal}}$, $\mathcal{A}_{\mathrm{DOMIAS-KDE}}$, $\mathcal{A}_{\mathrm{DOMIAS-BNAF}}$, $\mathcal{A}_{\mathrm{Logan-pb}}$ denote different privacy attacks. The left column refers to attacks that use part of MIMIC-IV as auxiliary information. The right column refers to attacks that use eICU as auxiliary information. Color-coded heat maps depict the mean of the area under the ROC curve (AUROC) estimated using $K=100$ bootstrap samples. The chance level for all attacks is AUROC = 0.5.
Figure D.2: Performance of privacy attacks against time series ground truth data. Training set size refers to the size of the real data used to train the attack. $\mathcal{A}_{MC-\theta}$, $\mathcal{A}_{\mathrm{DOMIAS-eq1}}$, $\mathcal{A}_{\mathrm{GAN-Leak-Breugel}}$, $\mathcal{A}_{\mathrm{GAN-Leak-\|\cdot\|}}$, $\mathcal{A}_{\mathrm{GAN-Leak-cal}}$, $\mathcal{A}_{\mathrm{DOMIAS-KDE}}$, $\mathcal{A}_{\mathrm{DOMIAS-BNAF}}$, $\mathcal{A}_{\mathrm{Logan-pb}}$ are the privacy attacks. The upper panel shows the scenario in which 100% of the test data labeled as training set members was used to train the attacks. The lower panel shows the scenario in which only 80% of the test data labeled as training set members was used to train the attacks. The values are the mean of the area under the ROC curve (AUROC) estimated using bootstrapping with $K=100$. The chance level for all attacks is AUROC = 0.5.
...and 1 more figures
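
The captions above report the mean AUROC over $K=100$ bootstrap samples, with AUROC = 0.5 as the chance level. As a rough illustration of how such an estimate could be computed, the sketch below resamples (label, score) pairs with replacement and averages the resulting AUROC values. It is not the authors' code: the function names `auroc` and `bootstrap_auroc`, the pairwise resampling scheme, and the toy scores are all assumptions made for this example, and the attack scores themselves are taken as given.

```python
import random
from statistics import mean


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = 0.5 * (i + j) + 1.0  # average 1-based rank
        i = j + 1
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    u = sum(r for r, y in zip(ranks, labels) if y) - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def bootstrap_auroc(labels, scores, k=100, seed=0):
    """Mean AUROC over k bootstrap resamples of (label, score) pairs.

    Assumption for this sketch: resamples containing only one class are
    redrawn, because AUROC is undefined for them.
    """
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < k:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if all(y) or not any(y):
            continue
        values.append(auroc(y, [scores[i] for i in idx]))
    return mean(values)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = member of the generator's training set,
    # 0 = non-member. Scores are random, i.e. an uninformative attack.
    rng = random.Random(42)
    labels = [1] * 200 + [0] * 200
    scores = [rng.gauss(0.0, 1.0) for _ in labels]
    print(bootstrap_auroc(labels, scores, k=100))
```

With uninformative scores like these, the printed mean should land close to the chance level of 0.5; an attack that reliably separates members from non-members would push it toward 1.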
