Wearable Accelerometer Foundation Models for Health via Knowledge Distillation

Salar Abbaspourazad, Anshuman Mishra, Joseph Futoma, Andrew C. Miller, Ian Shapiro

TL;DR

This work introduces accelerometry foundation models for health by distilling representations from a high-fidelity PPG teacher to a low-fidelity accelerometer encoder using a fully unsupervised, two-stage framework trained on the Apple Heart and Movement Study data. The PPG teacher is pre-trained with masked autoencoding (or, in an alternative configuration, contrastive learning), and its embeddings are transferred to accelerometry via cross-modal contrastive learning on paired signals, with augmentations proving crucial. The distilled accelerometry encoders exhibit strong cross-modal alignment (approximately 99.2% top-1 retrieval accuracy) and outperform baseline accelerometry encoders across heart rate, heart rate variability, demographics, and 46 health targets, while also enabling model compression to smaller architectures. This generalist foundation-model behavior suggests accelerometry-based digital biomarkers can be broadly deployed across wearables, expanding accessible health monitoring while highlighting considerations for privacy, equity, and interpretability.
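
As a rough illustration of the cross-modal contrastive step, the sketch below computes an InfoNCE-style loss over paired accelerometry and PPG embeddings. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the one-directional (accelerometry to PPG) form of the loss, and the `temperature` value are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_modal_info_nce(accel_emb, ppg_emb, temperature=0.1):
    """Mean InfoNCE loss; row i of each input is assumed to be a time-aligned pair.

    For each accelerometry embedding, the paired PPG embedding is the positive
    and every other PPG embedding in the batch is a negative.
    The temperature is an illustrative value, not taken from the paper.
    """
    a = [l2_normalize(v) for v in accel_emb]
    p = [l2_normalize(v) for v in ppg_emb]
    n = len(a)
    total = 0.0
    for i in range(n):
        # Cosine similarity to every PPG embedding, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(a[i], p[j])) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the matching (positive) pair.
        total += log_denom - logits[i]
    return total / n


if __name__ == "__main__":
    accel = [[1.0, 0.0], [0.0, 1.0]]
    ppg_aligned = [[1.0, 0.0], [0.0, 1.0]]
    ppg_shuffled = [[0.0, 1.0], [1.0, 0.0]]
    print(cross_modal_info_nce(accel, ppg_aligned))
    print(cross_modal_info_nce(accel, ppg_shuffled))
```

In this toy setup the loss is small when paired embeddings point in the same direction and large when the pairing is shuffled, which is the pressure that pulls the accelerometry encoder toward the teacher's embedding space.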

Abstract

Modern wearable devices can conveniently record various biosignals in the many different environments of daily living, enabling a rich view of individual health. However, not all biosignals are the same: high-fidelity biosignals, such as photoplethysmogram (PPG), contain more physiological information, but require optical sensors with a high power footprint. Alternatively, a lower-fidelity biosignal such as accelerometry has a significantly smaller power footprint and is available in almost any wearable device. While accelerometry is widely used for activity recognition and fitness, it is less explored for health biomarkers and diagnosis. Here, we show that an accelerometry foundation model can predict a wide variety of health targets. To achieve improved performance, we distill representational knowledge from PPG encoders to accelerometry encoders using 20 million minutes of unlabeled data, collected from ~172K participants in the Apple Heart and Movement Study under informed consent. We observe strong cross-modal alignment on unseen data, e.g., 99.2% top-1 accuracy for retrieving PPG embeddings from accelerometry embeddings. We show that distilled accelerometry encoders have significantly more informative representations compared to self-supervised or supervised encoders trained directly on accelerometry data, observed by at least 23%-49% improved performance for predicting heart rate and heart rate variability. We also show that distilled accelerometry encoders are readily predictive of a wide array of downstream health targets, i.e., they are generalist foundation models. We believe accelerometry foundation models for health may unlock new opportunities for developing digital biomarkers from any wearable device.
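
To make the retrieval metric concrete, the following sketch shows one way top-1 retrieval accuracy could be computed: each accelerometry embedding queries a pool of PPG embeddings, and a hit is counted when the most similar candidate is its own paired segment. The use of cosine similarity and the function names are assumptions for illustration; the abstract does not specify the similarity measure or the size of the candidate pool.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def top1_retrieval_accuracy(query_emb, candidate_emb):
    """Fraction of queries whose nearest candidate is the paired one.

    query_emb[i] and candidate_emb[i] are assumed to come from the same
    segment (e.g., accelerometry and PPG recorded at the same time).
    """
    hits = 0
    for i, query in enumerate(query_emb):
        sims = [cosine(query, cand) for cand in candidate_emb]
        best = max(range(len(sims)), key=sims.__getitem__)
        hits += int(best == i)
    return hits / len(query_emb)


if __name__ == "__main__":
    # Toy embeddings: each accelerometry vector lies close to its PPG pair.
    accel = [[1.0, 0.1], [0.1, 1.0], [-1.0, 0.0]]
    ppg = [[0.9, 0.0], [0.0, 0.8], [-0.7, 0.1]]
    print(top1_retrieval_accuracy(accel, ppg))
```

Retrieval in the opposite direction (PPG queries against accelerometry candidates) follows by swapping the two arguments.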

Paper Structure

This paper contains 20 sections, 1 equation, 4 figures, 23 tables.

Figures (4)

  • Figure 1: Overview of our dataset and methods. We use the multi-modal PPG-accelerometry data collected under informed consent from Apple Watch in the Apple Heart and Movement Study. We first pre-train the PPG teacher encoder with masked autoencoding, and then distill its embeddings to an accelerometry encoder via cross-modal knowledge distillation. See Sections \ref{['section: teacher_pre_training']}, \ref{['subsection:methods_kd']}, \ref{['section: datasets']} and \ref{['subsubsection: evaluation_metrics']} for more details.
  • Figure 2: Cross-modal representational knowledge distillation improves the quality of accelerometry embeddings. We compare the representational quality of accelerometry encoders via their downstream prediction of heart rate, SDNN and RMSSD. We sweep the number of training segments/labels for supervised training and linear probing from 0.1% to 100% on the x-axis. Distilled accelerometry encoders, "Accel-KD (via PPG MAE)" and "Accel-KD (via PPG-CL)", are better than their baseline uni-modal accelerometry encoders, "Accel-MAE" and "Accel-CL", and better than a supervised encoder ("Accel-supervised"). The bound of improvement shown by the dotted arrows is the difference between the best uni-modal and the best distilled accelerometry encoder (Appendix Tables \ref{['table: linear_prob_numbers_hr']}, \ref{['table: linear_prob_numbers_sdnn']} and \ref{['table: linear_prob_numbers_rmssd']}). Compared to the supervised encoder, all pre-trained accelerometry encoders, including distilled ones, are robust to the number of available training labels.
  • Figure 3: 2D t-SNE projections of embeddings for the pre-trained PPG teacher encoder ("PPG-MAE") and 2 accelerometry encoders: 1) uni-modal encoder pre-trained with masked autoencoding ("Accel-MAE", left), 2) distilled encoder from the PPG teacher encoder ("Accel-MAE via PPG-KD", right). The right panel shows marked alignment after distillation. Each marker represents an individual segment, where markers are colored based on participants and segments are identical across panels. See retrieval analysis numbers in Table \ref{['table: results_retrieval_source_accel']}.
  • Figure 4: Cross-modal representational knowledge distillation can be used for model compression. We show the downstream prediction of heart rate, SDNN and RMSSD while compressing the distilled accelerometry encoder. We observe that even small accelerometry encoders retain information and still outperform the baseline accelerometry encoders, while being $\sim5\times$ smaller.