SleepPPG-Net2: Deep learning generalization for sleep staging from photoplethysmography

Shirel Attia, Revital Shani Hershkovich, Alissa Tabakhov, Angeleene Ang, Sharon Haimov, Riva Tauman, Joachim A. Behar

TL;DR

SleepPPG-Net2 tackles the generalization gap in PPG-based sleep staging by leveraging multisource-domain training with a domain-shifts-uncertainty mechanism. It outperforms prior methods across six diverse datasets (2,574 recordings), achieving up to a 19% gain in per-patient Cohen's kappa and better sleep-measure estimates. The approach demonstrates practical potential for wearable sleep monitoring while identifying demographic and clinical factors that influence performance. By integrating raw PPG signals with robust cross-domain learning, the work broadens the applicability of PPG-based sleep staging in real-world settings.

Abstract

Background: Sleep staging is a fundamental component in the diagnosis of sleep disorders and the management of sleep health. Traditionally, this analysis is conducted in clinical settings and involves a time-consuming scoring procedure. Recent data-driven algorithms for sleep staging, using the photoplethysmogram (PPG) time series, have shown high performance on local test sets but lower performance on external datasets due to data drift. Methods: This study aimed to develop a generalizable deep learning model for the task of four-class (wake, light, deep, and rapid eye movement (REM)) sleep staging from raw PPG physiological time series. Six sleep datasets, totaling 2,574 patient recordings, were used. To create a more generalizable representation, we developed and evaluated a deep learning model called SleepPPG-Net2, which employs a multi-source domain training approach. SleepPPG-Net2 was benchmarked against two state-of-the-art models. Results: SleepPPG-Net2 showed consistently higher performance than the benchmark approaches, with generalization performance (Cohen's kappa) improving by up to 19%. Performance disparities were observed in relation to age, sex, and sleep apnea severity. Conclusion: SleepPPG-Net2 sets a new standard for staging sleep from raw PPG time series.
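
The abstract reports generalization performance as Cohen's kappa, and the TL;DR specifies that it is computed per patient. As a minimal sketch of how such a score could be obtained (not the authors' evaluation code), the snippet below computes kappa for each recording from epoch-level labels and then takes the median across recordings. The function names, the string labels, and the choice to return NaN when chance agreement equals 1 are assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def cohen_kappa(reference, predicted):
    """Cohen's kappa between two equal-length sequences of stage labels."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(reference)
    # Observed agreement: fraction of epochs where both labels match.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[s] * pred_counts[s] for s in ref_counts) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both sequences contain one identical label.
        # Returning NaN here is an illustrative convention, not from the paper.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


def median_per_patient_kappa(recordings):
    """Median kappa over (reference, predicted) pairs, one pair per recording."""
    return median(cohen_kappa(ref, pred) for ref, pred in recordings)


# Toy labels, invented for illustration only.
patient_a = (["wake", "wake", "light", "light"], ["wake", "light", "light", "light"])
patient_b = (["wake", "rem", "deep", "light"], ["wake", "rem", "deep", "light"])
kappa_a = cohen_kappa(*patient_a)
kappa_b = cohen_kappa(*patient_b)
median_kappa = median_per_patient_kappa([patient_a, patient_b])
```

Scoring each recording separately and summarizing with the median keeps long recordings from dominating the result, which is one plausible reason to report a per-patient statistic.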

Paper Structure (25 sections, 3 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Research overview. Panel (a) presents a classic strategy that involves the training of a sleep staging classification model using a single dataset (source domain), which typically exhibits decreased performance on external datasets (target domains). Panel (b) introduces our multi-source training strategy, which utilizes multiple datasets during training to develop a more generalizable representation, thereby enhancing performance when the model is evaluated on an external dataset (target domain).
  • Figure 2: Data distribution presented in violin plots for (a) age, (b) apnea hypopnea index (AHI) and (c) body mass index (BMI) and bar plot for (d) ethnicity. The AHI, BMI and ethnicity variables were not available for the CAP dataset and ethnicity was not available for the SLEEPAI dataset.
  • Figure 3: Median Kappa performance for (a) four-class (Wake, Deep, Light, REM), (b) three-class (Wake, NREM, REM) and (c) two-class (Wake, Sleep) sleep stage classification. The median performance is plotted for each test set. The confidence interval for Kappa is provided as interquartiles (Q1-Q3).
  • Figure 4: Confusion matrix for SleepPPG-Net2 (four-class). Source domain test set (a) and target domains (b-f).
  • Figure 5: Scatter and Bland-Altman plots of the sleep measures. (a–e): Scatter plots of the ground truth and predicted sleep measures for all external datasets combined. The black line represents the equation y = x. (f–j): Bland-Altman plots comparing ground truth vs. estimated sleep measures for all external datasets combined. The error lines in red are positioned at ±1.96 standard deviations. From left to right the sleep measures are: (a, f) TST in minutes, (b, g) SE in percentage, (c, h) FRLight in percentage, (d, i) FRDeep in percentage, (e, j) FRREM in percentage.
  • ...and 1 more figure
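
The Bland-Altman panels in Figure 5 place their error lines at ±1.96 standard deviations of the differences between ground truth and estimated sleep measures. A minimal sketch of that calculation follows, under stated assumptions: the function name is hypothetical, differences are taken as estimated minus ground truth, the sample standard deviation is used, and the TST values are invented for illustration rather than taken from the paper.

```python
from statistics import mean, stdev


def bland_altman_limits(reference, estimated, z=1.96):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Assumptions for this sketch: difference = estimated - reference,
    and the sample (n - 1) standard deviation is used.
    """
    if len(reference) != len(estimated) or len(reference) < 2:
        raise ValueError("need at least two paired measurements of equal length")
    diffs = [e - r for r, e in zip(reference, estimated)]
    bias = mean(diffs)            # systematic over- or under-estimation
    spread = z * stdev(diffs)     # half-width of the limits of agreement
    return bias, bias - spread, bias + spread


# Invented total sleep time values in minutes, one pair per recording.
tst_reference = [400, 420, 380, 450]
tst_estimated = [410, 410, 390, 440]
bias, lower, upper = bland_altman_limits(tst_reference, tst_estimated)
print(f"bias={bias:.1f} min, limits of agreement=({lower:.1f}, {upper:.1f}) min")
```

In a Bland-Altman plot, each recording is drawn at the mean of its two values on the x-axis and their difference on the y-axis, with horizontal lines at the bias and the two limits.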