Large-scale Training of Foundation Models for Wearable Biosignals

Salar Abbaspourazad, Oussama Elachqar, Andrew C. Miller, Saba Emrani, Udhyakumar Nallasamy, Ian Shapiro

TL;DR

This work tackles the scarcity of labeled wearable biosignal data by training foundation models on large-scale, unlabeled PPG and ECG data from the Apple Heart and Movement Study. It introduces a self-supervised framework that uses participant-level positive pairs, a diverse augmentation cascade, regularized InfoNCE loss with KoLeo regularization, and momentum training to learn robust embeddings. The resulting PPG/ECG representations encode meaningful information about demographics and health conditions, outperforming baseline demographic features in several tasks and revealing modality-specific differences. The study demonstrates the potential of wearable-derived, self-supervised pre-training to reduce labeling requirements and enhance health monitoring in everyday life.

Abstract

Tracking biosignals is crucial for monitoring wellness and preempting the development of severe medical conditions. Today, wearable devices can conveniently record various biosignals, creating the opportunity to monitor health status without disruption to one's daily routine. Despite widespread use of wearable devices and existing digital biomarkers, the absence of curated data with annotated medical labels hinders the development of new biomarkers to measure common health conditions. In fact, medical datasets are usually small in comparison to other domains, which is an obstacle for developing neural network models for biosignals. To address this challenge, we have employed self-supervised learning using the unlabeled sensor data collected under informed consent from the large longitudinal Apple Heart and Movement Study (AHMS) to train foundation models for two common biosignals: photoplethysmography (PPG) and electrocardiogram (ECG) recorded on Apple Watch. We curated PPG and ECG datasets from AHMS that include data from ~141K participants spanning ~3 years. Our self-supervised learning framework includes participant-level positive pair selection, a stochastic augmentation module, and a regularized contrastive loss optimized with momentum training, and it generalizes well to both PPG and ECG modalities. We show that the pre-trained foundation models readily encode information regarding participants' demographics and health conditions. To the best of our knowledge, this is the first study that builds foundation models using large-scale PPG and ECG data collected via wearable consumer devices; prior works have commonly used smaller datasets collected in clinical and experimental settings. We believe PPG and ECG foundation models can enhance future wearable devices by reducing the reliance on labeled data and hold the potential to help users improve their health.
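The abstract names three ingredients of the pre-training objective: participant-level positive pairs, a regularized contrastive loss, and momentum training. The NumPy sketch below illustrates how such pieces could fit together. It is not the authors' implementation: the function names, the temperature, the KoLeo weight, and the momentum value are all illustrative assumptions, and the KoLeo term follows the commonly used nearest-neighbor log-distance form rather than anything stated in this text.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE over a batch. Row i of `anchors` and row i of `positives` are
    embeddings of two different segments from the same participant; every
    other row in `positives` acts as a negative for anchor i."""
    a, p = l2_normalize(anchors), l2_normalize(positives)
    logits = a @ p.T / temperature                     # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def koleo(embeddings, eps=1e-8):
    """KoLeo-style regularizer (assumed form): penalizes small nearest-neighbor
    distances so embeddings spread out instead of collapsing."""
    z = l2_normalize(embeddings)
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                     # ignore self-distance
    return float(-np.mean(np.log(dist.min(axis=1) + eps)))

def regularized_loss(anchors, positives, temperature=0.1, koleo_weight=0.1):
    """Contrastive loss plus a weighted KoLeo term (weight is hypothetical)."""
    return info_nce(anchors, positives, temperature) + koleo_weight * koleo(anchors)

def momentum_update(target_params, online_params, momentum=0.99):
    """Exponential moving average of the online parameters into the target
    (momentum) encoder; parameters are plain dicts of arrays here."""
    return {k: momentum * target_params[k] + (1.0 - momentum) * online_params[k]
            for k in target_params}
```

In a setup like this, the anchors would come from the online encoder and the positives from the momentum encoder, with the momentum encoder refreshed by `momentum_update` after each optimizer step.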

Paper Structure (24 sections, 3 equations, 7 figures, 12 tables, 1 algorithm)

This paper contains 24 sections, 3 equations, 7 figures, 12 tables, 1 algorithm.

Figures (7)

  • Figure 1: PPG and ECG foundation models encode participants' health information. Linear probing performance for targets from AHMS survey questions is compared against baseline features, using (a) PPG embeddings and (b) ECG embeddings. Each marker represents one target from the AHMS questionnaire; the y-axis shows the ROC AUC of binary classification using the embeddings, and the x-axis shows that of the baseline features. Marker colors and shapes are selected randomly and are described in the legend.
  • Figure 2: Distinctions in PPG and ECG pre-training and embeddings. a. t-SNE representations for 20 random embeddings drawn from 10 random participants for PPG and ECG. b. InfoNCE validation loss for PPG vs. ECG, where a training iteration represents a global gradient descent update across all GPUs. c. Probability density function (PDF) of the dispersion ratio, calculated for each feature of the 256-dimensional PPG and ECG embeddings across the population. The dispersion ratio quantifies within-participant variability relative to across-participant variability.
  • Figure 3: High-level visualization of our pre-training framework, shown for a mini-batch containing 2 participants and 4 segments. Augmented views of recorded biosignals are passed through an encoder, followed by an MLP projection head, to obtain the representations. The representations are used to calculate the contrastive loss, which attracts the positive pairs while repelling the negative pairs. Positive pairs are formed from different segments of the same participant. The momentum updates and KoLeo regularization are not shown in the figure.
  • Figure 4: Our EfficientNet-style encoder architecture, adapted from Tan and Le (2020) for time-series input. a. Encoder architecture with convolutional blocks shown as Conv1D, batch normalization as BatchNorm, Swish activation as Swish, mobile inverted bottleneck block as MBConv1D, average pooling as 1DAvgPool. b. The internal architecture of MBConv1D, where Sigmoid activation is shown as Sigmoid and the asterisk represents element-wise multiplication.
  • Figure 5: The distribution of demographic variables in AHMS shown for self-reported a. age, b. BMI, c. biological sex and ethnicity.
  • ...and 2 more figures
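
The Figure 2 caption describes the dispersion ratio only as within-participant variability relative to across-participant variability, computed per embedding feature. One plausible reading is sketched below in NumPy; the exact definition is not given in this text, so the choice of standard deviation as the variability measure and the name `dispersion_ratio` are assumptions made for illustration.

```python
import numpy as np

def dispersion_ratio(embeddings, participant_ids, eps=1e-12):
    """Per-feature ratio of within-participant to across-participant spread.

    Assumed definition (not confirmed by the source):
      numerator   = mean over participants of the per-participant std
      denominator = std of the per-participant means
    Returns one ratio per embedding dimension.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    participant_ids = np.asarray(participant_ids)
    unique_ids = np.unique(participant_ids)

    # Spread of each feature inside a single participant's embeddings.
    within = np.stack([embeddings[participant_ids == pid].std(axis=0)
                       for pid in unique_ids])
    # Per-participant mean of each feature, used for the across-participant spread.
    means = np.stack([embeddings[participant_ids == pid].mean(axis=0)
                      for pid in unique_ids])

    return within.mean(axis=0) / (means.std(axis=0) + eps)
```

Under this reading, a ratio well below 1 means a feature varies little across a participant's segments but differs between participants, while a ratio above 1 means segment-to-segment variation dominates.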