Generating Synthetic Health Sensor Data for Privacy-Preserving Wearable Stress Detection

Lucas Lange, Nils Wenzlitschke, Erhard Rahm

TL;DR

Privacy concerns in smartwatch health data motivate the use of synthetic data to enable research without exposing individuals. The authors compare non-private GANs and private DP-GANs (CGAN, DoppelGANger, DP-CGAN) to generate multimodal time-series data (ACC, EDA, TEMP, BVP) and evaluate their impact on stress detection on the WESAD dataset using LOSO with TSTR and AUGM strategies. They demonstrate that differentially private synthetic data can improve utility-privacy trade-offs, with private DP training yielding substantial F1-score gains of about $11.90$ to $15.48$ percentage points, while non-private training yields a modest gain of around $0.45$ percentage points. The results show that CGAN generally provides higher-fidelity synthetic data, while DP-CGAN enables privacy-preserving augmentation that improves stress detection performance across subjects, suggesting practical applicability to health monitoring tasks beyond this study.

Abstract

Smartwatch health sensor data are increasingly utilized in smart health applications and patient monitoring, including stress detection. However, such medical data often comprise sensitive personal information and are resource-intensive to acquire for research purposes. In response to this challenge, we introduce the privacy-aware synthetization of multi-sensor smartwatch health readings related to moments of stress, employing Generative Adversarial Networks (GANs) and Differential Privacy (DP) safeguards. Our method not only protects patient information but also enhances data availability for research. To ensure its usefulness, we test synthetic data from multiple GANs and employ different data enhancement strategies on an actual stress detection task. Our GAN-based augmentation methods demonstrate significant improvements in model performance, with private DP training scenarios observing an 11.90-15.48% increase in F1-score, while non-private training scenarios still see a 0.45% boost. These results underline the potential of differentially private synthetic data in optimizing utility-privacy trade-offs, especially with the limited availability of real training samples. Through rigorous quality assessments, we confirm the integrity and plausibility of our synthetic data, which, however, are significantly impacted when increasing privacy requirements.

Paper Structure (41 sections, 1 equation, 10 figures, 6 tables)

Figures (10)

  • Figure S1: A brief description of the basic GAN architecture: The generator, denoted as $G$, creates an artificial sample $x'$ using a random noise input $z$. These artificial samples $x'$ and the real samples $x$ are fed into the discriminator $D$, which categorizes each sample as either real or artificial. The classification results are used to compute the loss, which is then used to update both the generator and the discriminator through backpropagation.
  • Figure S2: Our experimental methods are illustrated by the given workflow. In the first step, we load and pre-process the WESAD dataset. We then train different GAN models for our data augmentation purposes. Each resulting model generates synthetic data, which are evaluated on data quality and, finally, compared on their ability to improve our stress detection models.
  • Figure S3: The individual signal modalities plotted for Subject ID4 after resampling, relabeling, and normalizing the data. The orange line shows the label, which equals 0 for non-stress and 1 for stress.
  • Figure S4: The spectrum plots from the FFT calculations of all subwindows in a 60-s window (a), and the plot of the averaged spectrum representation over these subwindows (b).
  • Figure S5: Visualization of synthetic data from our GANs using PCA and t-SNE to cluster data points against original WESAD data. Generated data are more realistic when they fit the original data points.
  • ...and 5 more figures
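The caption of Figure S4 describes computing FFT spectra for all subwindows of a 60-s window and then averaging them. A minimal sketch of that idea follows, assuming non-overlapping subwindows and a magnitude spectrum. The function name `averaged_spectrum`, the 4 Hz sampling rate, the 10-s subwindow length, and the test sine wave are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def averaged_spectrum(window, fs, sub_len_s):
    """Average the FFT magnitude spectrum over non-overlapping subwindows.

    window    : 1-D signal covering one analysis window (e.g. 60 s)
    fs        : sampling rate in Hz (assumed value, not from the paper)
    sub_len_s : subwindow length in seconds (assumed value, not from the paper)
    """
    sub_len = int(round(sub_len_s * fs))
    n_sub = len(window) // sub_len
    if n_sub == 0:
        raise ValueError("window is shorter than one subwindow")

    # Drop any trailing samples and stack subwindows as rows.
    subs = np.asarray(window[: n_sub * sub_len], dtype=float)
    subs = subs.reshape(n_sub, sub_len)

    # One-sided magnitude spectrum per subwindow, scaled by subwindow length.
    spectra = np.abs(np.fft.rfft(subs, axis=1)) / sub_len
    freqs = np.fft.rfftfreq(sub_len, d=1.0 / fs)

    # Averaged representation over all subwindows.
    return freqs, spectra, spectra.mean(axis=0)


# Hypothetical example: a 0.5 Hz sine sampled at 4 Hz for 60 s.
fs = 4.0
t = np.arange(0, 60, 1 / fs)
signal = np.sin(2 * np.pi * 0.5 * t)

freqs, spectra, mean_spec = averaged_spectrum(signal, fs, sub_len_s=10.0)
peak_freq = freqs[np.argmax(mean_spec)]
```

Here `spectra` holds one spectrum per subwindow, in the spirit of panel (a), and `mean_spec` is their average, in the spirit of panel (b). Windowing functions, overlap, and the actual subwindow length may differ in the authors' pipeline.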