SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals

Rahul Thapa; Bryan He; Magnus Ruud Kjaer; Hyatt Moore; Gauri Ganjoo; Emmanuel Mignot; James Zou

TL;DR

SleepFM learns a holistic multi-modal representation of sleep from brain, cardiac, and respiratory signals using contrastive learning on a large polysomnography dataset. It introduces a novel leave-one-out contrastive learning objective and demonstrates superior downstream performance on sleep staging and sleep-disordered breathing tasks, with robust cross-modality retrieval. External validation on PhysioNet CinC 2018 shows good generalization to unseen data despite modality differences. The work highlights the promise of multi-modal foundation models in sleep medicine and provides an open-source codebase for community adoption.

Abstract

Sleep is a complex physiological process evaluated through various modalities recording electrical brain, cardiac, and respiratory activities. We curate a large polysomnography dataset from over 14,000 participants comprising over 100,000 hours of multi-modal sleep recordings. Leveraging this extensive dataset, we developed SleepFM, the first multi-modal foundation model for sleep analysis. We show that a novel leave-one-out approach for contrastive learning significantly improves downstream task performance compared to representations from standard pairwise contrastive learning. A logistic regression model trained on SleepFM's learned embeddings outperforms an end-to-end trained convolutional neural network (CNN) on sleep stage classification (macro AUROC 0.88 vs 0.72 and macro AUPRC 0.72 vs 0.48) and sleep disordered breathing detection (AUROC 0.85 vs 0.69 and AUPRC 0.77 vs 0.61). Notably, the learned embeddings achieve 48% top-1 average accuracy in retrieving the corresponding recording clips of other modalities from 90,000 candidates. This work demonstrates the value of holistic multi-modal sleep modeling to fully capture the richness of sleep recordings. SleepFM is open source and available at https://github.com/rthapa84/sleepfm-codebase.
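
To make the retrieval metric in the abstract concrete, here is a minimal sketch of how top-1 cross-modal retrieval accuracy could be computed. It assumes cosine similarity as the scoring function and uses hypothetical names (`cosine`, `top1_retrieval_accuracy`) and toy 2-dimensional embeddings; none of this is taken from the SleepFM codebase.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed scoring function)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top1_retrieval_accuracy(queries, candidates):
    """Fraction of queries whose most similar candidate is the matching clip.

    queries[i] and candidates[i] are assumed to be embeddings of the same
    recording clip taken from two different modalities.
    """
    hits = 0
    for i, query in enumerate(queries):
        best = max(range(len(candidates)), key=lambda j: cosine(query, candidates[j]))
        hits += int(best == i)
    return hits / len(queries)

# Toy example: three clips, hypothetical ECG queries against BAS candidates.
ecg = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bas = [[0.9, 0.1], [0.2, 0.8], [-1.0, 0.5]]
accuracy = top1_retrieval_accuracy(ecg, bas)
print(f"top-1 accuracy: {accuracy:.2f}")
```

In the setting the abstract describes, the candidate pool would hold 90,000 clips rather than three, and the reported figure is an average over modality pairs.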

Paper Structure (22 sections, 2 equations, 5 figures, 29 tables)

This paper contains 22 sections, 2 equations, 5 figures, 29 tables.

Introduction
Related Work
Machine Learning for Analyzing Sleep Data
Contrastive Learning
Method
Dataset and Preprocessing
Embedding Model
Multi-modal Contrastive Learning
Model Training
Experiments and Results
Demographic Attributes Classification
Retrieval Analysis
Downstream Classification Tasks
Few-Shot Evaluation
Benefit of Multi-Modal Pretraining
...and 7 more sections

Figures (5)

Figure 1: Overview of SleepFM pre-training with contrastive learning (CL). We experiment with two types of pre-training: standard pairwise CL, where we contrast embeddings from each pair of modalities separately, and our novel leave-one-out CL, where we contrast the embedding of each modality against the average embedding of all other modalities. Brain Activity Signals (BAS) measure brain activity and eye and muscle movement, Electrocardiogram (ECG) measures heart activity, and Respiratory channels measure chest and abdomen movements, pulse, and nasal and oral flow.
Figure 2: Few-shot evaluation. The x-axis represents the number of patients the model was trained on, and the y-axis represents the evaluation metrics AUROC and AUPRC. For pairwise and leave-one-out, we select embeddings from $k$ patients to train a logistic regression model. The largest number of patients used (1265) is the total size of our training dataset. For the supervised CNN, we train the model end-to-end on $k$ patients to classify either sleep stages or SDB. Testing is done on the entire test set. For each shot, we average the performance across 3 replicates.
Figure 3: Ablation few-shot plot. The x-axis represents the number of patients the model was trained on, and the y-axis represents the performance metrics AUROC and AUPRC. We select embeddings from $k$ patients to train a logistic regression model. The last shot (1265) is the total size of our training dataset. The other models (Resp-ECG, Resp-BAS, BAS-ECG) represent models pretrained using only 2 modalities. Finally, BAS and RESP represent models pretrained with only 1 modality. For each shot, we average the performance across 3 replicates.
Figure 4: 30-second clip of raw patient data. The x-axis is time, and the y-axis shows the different channels across all three modalities: BAS, ECG, and Respiratory.
Figure 5: Distribution of events across a patient's entire night of sleep. The x-axis represents approximately 8 hours in seconds, and the y-axis shows the distribution of different sleep events over the entire duration of sleep. N1, N2, and N3 refer to Sleep Stage 1, 2, and 3, respectively. Obs Hypopnea and Obs SDB are types of SDB.
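
The leave-one-out objective described in the Figure 1 caption can be illustrated with a small sketch. The version below assumes an InfoNCE-style loss with cosine similarity and a temperature of 0.1; the function name `leave_one_out_loss`, the temperature value, and the toy embeddings are illustrative assumptions, not the authors' implementation.

```python
import math

def _normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def leave_one_out_loss(embeddings, temperature=0.1):
    """Mean InfoNCE-style loss, contrasting each modality with the mean of the others.

    embeddings: dict mapping modality name -> list of per-clip vectors,
    with clips in the same order for every modality.
    The temperature value is an assumption for this sketch.
    """
    names = list(embeddings)
    n_clips = len(embeddings[names[0]])
    total, count = 0.0, 0
    for held_out in names:
        others = [m for m in names if m != held_out]
        # Average the embeddings of the remaining modalities for each clip.
        targets = []
        for k in range(n_clips):
            dim = len(embeddings[others[0]][k])
            mean = [sum(embeddings[m][k][d] for m in others) / len(others)
                    for d in range(dim)]
            targets.append(_normalize(mean))
        for k in range(n_clips):
            anchor = _normalize(embeddings[held_out][k])
            logits = [_dot(anchor, t) / temperature for t in targets]
            # Numerically stable log-sum-exp over all clips in the batch.
            top = max(logits)
            log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
            # Negative log-probability of the matching clip.
            total += log_denom - logits[k]
            count += 1
    return total / count

# Toy batch of two clips; in `misaligned` the ECG clips are swapped.
aligned = {"BAS": [[1.0, 0.0], [0.0, 1.0]],
           "ECG": [[1.0, 0.0], [0.0, 1.0]],
           "Resp": [[1.0, 0.0], [0.0, 1.0]]}
misaligned = dict(aligned, ECG=[[0.0, 1.0], [1.0, 0.0]])
print(leave_one_out_loss(aligned), leave_one_out_loss(misaligned))
```

Under these assumptions the loss is close to zero when all modalities agree on each clip and grows when one modality is mismatched. Standard pairwise CL would instead loop over pairs of modalities and contrast them directly, without averaging.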
