
OSF: On Pre-training and Scaling of Sleep Foundation Models

Zitao Shuai, Zongzhe Xu, David Yang, Wei Wang, Yuzhe Yang

TL;DR

This work introduces OSF, a family of sleep foundation models (FMs) that achieves state-of-the-art performance across nine datasets on diverse sleep and disease prediction tasks. Analysis of OSF also reveals intriguing properties in sample efficiency, hierarchical aggregation, and cross-dataset scaling.

Abstract

Polysomnography (PSG) provides the gold standard for sleep assessment but suffers from substantial heterogeneity across recording devices and cohorts. There have been growing efforts to build general-purpose foundation models (FMs) for sleep physiology, but the field still lacks an in-depth understanding of the pre-training process and the scaling patterns that lead to more generalizable sleep FMs. To fill this gap, we curate a massive corpus of 166,500 hours of sleep recordings from nine public sources and establish SleepBench, a comprehensive, fully open-source benchmark. Leveraging SleepBench, we systematically evaluate four families of self-supervised pre-training objectives and uncover three critical findings: (1) existing FMs fail to generalize to missing channels at inference; (2) channel-invariant feature learning is essential for pre-training; and (3) scaling sample size, model capacity, and multi-source data mixture consistently improves downstream performance. With an enhanced pre-training and scaling recipe, we introduce OSF, a family of sleep FMs that achieves state-of-the-art performance across nine datasets on diverse sleep and disease prediction tasks. Further analysis of OSF also reveals intriguing properties in sample efficiency, hierarchical aggregation, and cross-dataset scaling.
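To make the missing-channel setting and the masking augmentations more concrete, the following is a minimal sketch of channel masking and time-wise masking on a single PSG window. It is an illustration under stated assumptions, not the authors' implementation: the function names `mask_channels` and `mask_time`, the zero-fill convention, the drop probability, the span length, and the 6-channel, 100 Hz, 30-second window are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def mask_channels(x, drop_prob, rng):
    """Zero out whole channels of a (channels, time) array.

    Assumption: a masked channel is represented by zeros, and at least
    one channel is always kept so the window is never empty.
    """
    n_channels = x.shape[0]
    keep = rng.random(n_channels) >= drop_prob
    if not keep.any():
        # Force one randomly chosen channel to survive.
        keep[rng.integers(n_channels)] = True
    out = x.copy()
    out[~keep] = 0.0
    return out, keep


def mask_time(x, span, rng):
    """Zero out one contiguous span of `span` samples across all channels."""
    n_time = x.shape[1]
    span = min(span, n_time)
    start = int(rng.integers(0, n_time - span + 1))
    out = x.copy()
    out[:, start:start + span] = 0.0
    return out, (start, start + span)


rng = np.random.default_rng(0)
# Hypothetical window: 6 channels, 30 s sampled at 100 Hz.
psg = rng.standard_normal((6, 3000))

channel_view, kept = mask_channels(psg, drop_prob=0.5, rng=rng)
time_view, (t0, t1) = mask_time(psg, span=500, rng=rng)
```

Applying `mask_channels` only at evaluation time is one simple way to simulate channels that are missing at inference, while applying either function during pre-training yields two differently corrupted views of the same window.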

Paper Structure (30 sections, 13 figures, 32 tables)

Figures (13)

  • Figure 1: Performance comparison across downstream tasks. OSF consistently achieves state-of-the-art performance on downstream tasks.
  • Figure 2: Distribution of our established SleepBench.
  • Figure 3: Inference with full versus missing channels. Existing sleep FMs fail on samples with channels missing at inference time.
  • Figure 4: Illustration of considered augmentations. We consider time-wise masking and channel masking strategies.
  • Figure 5: Comparison of scaling behavior. Current sleep FMs are less scalable. In contrast, OSF makes better use of additional training data.
  • ...and 8 more figures