PaPaGei: Open Foundation Models for Optical Physiological Signals

Arvind Pillai, Dimitris Spathis, Fahim Kawsar, Mohammad Malekzadeh

TL;DR

This paper introduces PaPaGei, the first open foundation model for photoplethysmography (PPG) signals, trained on 57k hours from public datasets. It presents two SSL objectives: PaPaGei-P (patient-aware) and PaPaGei-S (morphology-aware) with a morphology augmentation module leveraging sVRI, IPA, and SQI, plus mixture-of-experts heads. Evaluated across 20 tasks from 10 diverse datasets, PaPaGei yields consistent improvements in classification and regression and demonstrates data efficiency compared to larger baselines. The work emphasizes open reproducibility, skin-tone robustness analyses, and the potential of PaPaGei as a backbone for multimodal health monitoring, while acknowledging limitations and outlining future directions for broader data, domain adaptation, and responsible deployment.

Abstract

Photoplethysmography (PPG) is the leading non-invasive technique for monitoring biosignals and cardiovascular health, with widespread adoption in both clinical settings and consumer wearable devices. While machine learning models trained on PPG signals have shown promise, they tend to be task-specific and struggle with generalization. Current research is limited by the use of single-device datasets, insufficient exploration of out-of-domain generalization, and a lack of publicly available models, which hampers reproducibility. To address these limitations, we present PaPaGei, the first open foundation model for PPG signals. The model is pre-trained on over 57,000 hours of data, comprising 20 million unlabeled PPG segments from publicly available datasets. We introduce a novel representation learning approach that leverages domain knowledge of PPG signal morphology across individuals, enabling the capture of richer representations compared to traditional contrastive learning methods. We evaluate PaPaGei against state-of-the-art time-series foundation models and self-supervised learning benchmarks across 20 tasks from 10 diverse datasets, spanning cardiovascular health, sleep disorders, pregnancy monitoring, and wellbeing assessment. Our model demonstrates superior performance, improving classification and regression metrics by 6.3% and 2.9% respectively in at least 14 tasks. Notably, PaPaGei achieves these results while being more data- and parameter-efficient, outperforming models that are 70x larger. Beyond accuracy, we examine model robustness across different skin tones, establishing a benchmark for bias evaluation in future models. PaPaGei can serve as both a feature extractor and an encoder for multimodal models, opening up new opportunities for multimodal health monitoring.

Paper Structure

This paper contains 29 sections, 2 equations, 26 figures, 17 tables.

Figures (26)

  • Figure 1: PaPaGei Overview. We curate public datasets of diverse PPG signals, and train a foundation model leveraging a novel morphology-aware contrastive learning approach. To evaluate its effectiveness, we apply the embeddings generated by PaPaGei to 20 tasks from 10 different datasets.
  • Figure 2: Overview of PaPaGei-S. The process begins by computing three morphology metrics (IPA, sVRI, and SQI) for each PPG segment. The raw PPG signals are then processed through an encoder ($E$) to generate embeddings ($H$). These same embeddings feed into three specialized heads: a projection head ($P$) that contrasts PPG signals based on sVRI values, and two mixture-of-experts heads ($M_1$ and $M_2$) that refine the embeddings by predicting IPA and SQI values.
  • Figure 3: Radar charts of downstream tasks. (Top) Classification performance in AUROC (larger area is better). (Bottom) Regression performance in MAE (smaller area is better). Pre-trained models in purple: REGLE, Chronos, & Moment. Statistical feature baseline in gray. SSL methods in green: SimCLR, BYOL, & TF-C. PaPaGei (ours), in pink. Details are in Tables \ref{tab:binary_classification_fm} & \ref{tab:binary_classification_ssl}.
  • Figure 4: Downstream data-efficiency analysis. Results are averaged over all binary classification (left) and regression tasks (right). PaPaGei-S performs better with increased label availability.
  • Figure 5: Ablation on pre-training data. Average performance across tasks for models trained on: V (VitalDB), M (MESA), and M-III (MIMIC-III). The mean value is displayed above the plots. The Wilcoxon signed rank test is applied to evaluate significance between the All dataset and the rest ($^{**}: p < 0.05$ and $^{*}: 0.05 \leq p < 0.10$).
  • ...and 21 more figures
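
The data flow described in the Figure 2 caption (encoder $E$ producing embeddings $H$, which feed a projection head $P$ and two mixture-of-experts heads $M_1$ and $M_2$) can be sketched roughly as follows. This is a toy forward pass only, not the authors' implementation: the class names `PaPaGeiSSketch` and `MoEHead`, the single dense layer standing in for the encoder, the linear experts, and all dimensions (`seg_len`, `embed_dim`, `proj_dim`, `n_experts`) are assumptions made for illustration. The training losses (contrasting on sVRI, regressing IPA and SQI) are not shown.

```python
import math
import random

random.seed(0)  # reproducible toy weights


def dense(in_dim, out_dim):
    """Randomly initialised dense layer, stored as (weight rows, biases)."""
    weights = [[random.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def apply(layer, x):
    """Affine map: one output value per weight row."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class MoEHead:
    """Hypothetical mixture-of-experts regressor: a gate mixes scalar expert outputs."""

    def __init__(self, embed_dim, n_experts):
        self.gate = dense(embed_dim, n_experts)
        self.experts = dense(embed_dim, n_experts)  # each row acts as one linear expert

    def __call__(self, h):
        gates = softmax(apply(self.gate, h))   # mixing weights, sum to 1
        outputs = apply(self.experts, h)       # one scalar prediction per expert
        return sum(g * o for g, o in zip(gates, outputs))


class PaPaGeiSSketch:
    """Toy stand-in for the Figure 2 layout; all sizes are assumed, not from the paper."""

    def __init__(self, seg_len=250, embed_dim=16, proj_dim=8, n_experts=3):
        self.encoder = dense(seg_len, embed_dim)       # stand-in for E
        self.projection = dense(embed_dim, proj_dim)   # stand-in for P
        self.m1 = MoEHead(embed_dim, n_experts)        # M_1: predicts IPA
        self.m2 = MoEHead(embed_dim, n_experts)        # M_2: predicts SQI

    def forward(self, segment):
        h = [math.tanh(v) for v in apply(self.encoder, segment)]  # H = E(x)
        z = apply(self.projection, h)  # projection used for the sVRI-based contrast
        return h, z, self.m1(h), self.m2(h)


model = PaPaGeiSSketch()
# Synthetic periodic wave as a placeholder for a real PPG segment.
segment = [math.sin(2 * math.pi * i / 50) for i in range(250)]
h, z, ipa_hat, sqi_hat = model.forward(segment)
print(len(h), len(z), type(ipa_hat).__name__, type(sqi_hat).__name__)
```

The point of the sketch is that all three heads read the same embedding $H$, so the two auxiliary predictions shape the representation that is later reused for downstream tasks. After pre-training, only the encoder output `h` would be kept as the feature vector.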