
SleepMaMi: A Universal Sleep Foundation Model for Integrating Macro- and Micro-structures

Keondo Park, Younghoon Na, Yourim Choi, Hyunwoo Ryu, Hyun-Woo Shin, Hyung-Sin Kim

TL;DR

SleepMaMi unifies full-night PSG analysis by integrating micro- and macro-structure through a dual-encoder architecture. The Micro-Encoder uses MAE and multi-modal contrastive objectives to learn fine-grained biosignal morphologies, while the Macro-Encoder leverages bi-directional Mamba layers and Demographic-Guided Contrastive Learning to model global sleep architecture conditioned on demographics. Pretrained on a large, multi-dataset PSG corpus, SleepMaMi achieves state-of-the-art or competitive results across sleep staging, sleep-disordered breathing (SDB) segmentation, and disease prediction, with strong label-efficient performance in few-shot settings. The framework demonstrates robust cross-task generalization and clinically meaningful representations, highlighting its potential as a universal sleep foundation model for clinical and research use.
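The TL;DR describes the Micro-Encoder's hybrid objective only at a high level. As a minimal sketch under stated assumptions, the Python below combines an MAE-style reconstruction loss computed on masked patches only with a symmetric InfoNCE loss between embeddings of two modalities from the same epochs. The names `masked_mse`, `info_nce`, and `hybrid_loss`, the weight `lam`, and the temperature `tau` are hypothetical; the paper's exact loss terms, weighting, and contrastive formulation may differ.

```python
import math


def masked_mse(pred, target, mask):
    """MAE-style reconstruction loss: mean squared error over masked patches only.

    pred, target: lists of patch vectors (lists of floats); mask[i] is True
    where patch i was hidden from the encoder.
    """
    per_patch = [
        sum((p - t) ** 2 for p, t in zip(pv, tv)) / len(tv)
        for pv, tv, m in zip(pred, target, mask)
        if m
    ]
    return sum(per_patch) / len(per_patch) if per_patch else 0.0


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def log_softmax_at(row, idx):
    """Numerically stable log(softmax(row)[idx])."""
    m = max(row)
    return row[idx] - (m + math.log(sum(math.exp(r - m) for r in row)))


def info_nce(za, zb, tau=0.1):
    """Symmetric InfoNCE: za[i] and zb[i] (same epoch, two modalities) are positives."""
    n = len(za)
    logits = [[cosine(za[i], zb[j]) / tau for j in range(n)] for i in range(n)]
    a_to_b = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    b_to_a = -sum(log_softmax_at([logits[j][i] for j in range(n)], i) for i in range(n)) / n
    return 0.5 * (a_to_b + b_to_a)


def hybrid_loss(pred, target, mask, za, zb, lam=1.0, tau=0.1):
    """Hypothetical weighted sum of the reconstruction and contrastive terms."""
    return masked_mse(pred, target, mask) + lam * info_nce(za, zb, tau)


# Toy usage: one masked patch with squared error 1.0 and aligned modality embeddings.
pred = [[1.0, 1.0], [0.0, 0.0]]
target = [[0.0, 0.0], [0.0, 0.0]]
mask = [True, False]
za = [[1.0, 0.0], [0.0, 1.0]]
print(hybrid_loss(pred, target, mask, za, za))
```

Restricting the reconstruction error to masked patches follows the standard MAE recipe, while the contrastive term rewards matched cross-modal pairs for being more similar than mismatched pairs within the batch.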

Abstract

While the shift toward unified foundation models has revolutionized many deep learning domains, sleep medicine remains largely restricted to task-specific models that focus on localized micro-structure features. These approaches often neglect the rich, multi-modal context of polysomnography (PSG) and fail to capture the global macro-structure of a full night's sleep. To address this, we introduce SleepMaMi, a sleep foundation model designed to capture both hours-long sleep architecture and fine-grained signal morphology. Our framework uses a hierarchical dual-encoder design: a Macro-Encoder that models full-night temporal dependencies and a Micro-Encoder that captures short-term characteristics of biosignals. The Macro-Encoder is trained via Demographic-Guided Contrastive Learning, which aligns overnight sleep patterns with objective subject metadata, such as age, sex, and BMI, to refine global representations. The Micro-Encoder is optimized with a hybrid Masked Autoencoder (MAE) and multi-modal contrastive objective. Pre-trained on a large corpus of more than 20,000 PSG recordings (158K hours), SleepMaMi outperforms existing foundation models across a diverse suite of downstream tasks, demonstrating superior generalizability and label-efficient adaptation for clinical sleep analysis.
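One way to read "aligning overnight sleep patterns with objective subject metadata" is a soft-target contrastive loss in which subjects with similar demographics should also have similar full-night embeddings. The sketch below implements that reading only; it is an assumption, not the paper's stated formulation. The distance scales for (age, sex, BMI), the temperatures `tau` and `sigma`, and all function names are hypothetical.

```python
import math


def demo_distance(a, b, scales=(10.0, 1.0, 5.0)):
    """Scaled Euclidean distance between (age, sex, BMI) tuples.

    The per-field scales (10 years, sex coded 0/1, 5 BMI units) are hypothetical.
    """
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def log_softmax(xs):
    """Numerically stable log-softmax of a list of scores."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def demographic_guided_loss(embeddings, demographics, tau=0.1, sigma=1.0):
    """Soft-target contrastive loss over a batch of subjects (hypothetical form).

    For each anchor subject i, the target distribution over the other subjects
    puts more mass on demographically similar subjects; the predicted
    distribution comes from cosine similarity of full-night embeddings.
    The loss is the cross-entropy between the two, averaged over anchors.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        target_scores = [-demo_distance(demographics[i], demographics[j]) / sigma for j in others]
        target = [math.exp(v) for v in log_softmax(target_scores)]
        log_pred = log_softmax([cosine(embeddings[i], embeddings[j]) / tau for j in others])
        total += -sum(t * lp for t, lp in zip(target, log_pred))
    return total / n


# Two younger and two older subjects, given as (age, sex, BMI).
demo = [(30, 0, 22), (32, 0, 23), (70, 1, 30), (68, 1, 31)]
consistent = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
inconsistent = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(demographic_guided_loss(consistent, demo), demographic_guided_loss(inconsistent, demo))
```

Embeddings that group demographically similar subjects together yield a lower loss than embeddings that pair each subject with a dissimilar one, which is the behavior the abstract describes at a high level.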

Paper Structure

This paper contains 34 sections, 7 equations, 11 figures, and 12 tables.

Figures (11)

  • Figure 1: Overview of SleepMaMi. Full-night PSG recordings are processed through a hierarchical dual-encoder architecture. The Micro-Encoder captures short-term physiological patterns such as K-complexes, sleep spindles, and respiratory events, while the Macro-Encoder models global sleep architecture including sleep cycles and stage distributions across the entire recording. This design supports diverse downstream tasks ranging from fine-grained event segmentation to subject-level clinical outcomes.
  • Figure 2: Micro-Encoder design and pretraining method. The Micro-Encoder adopts a private-shared encoder architecture and uses patch merging in the shared encoder to improve computational efficiency. The model is trained with a hybrid objective that combines masked autoencoding (reconstruction) and multi-modal contrastive learning to capture sleep micro-structure. For clarity and due to space constraints, only three representative modalities are shown.
  • Figure 3: Macro-Encoder design and pretraining method. We use bi-directional Mamba layers for efficient long-sequence modeling. Demographic-Guided Contrastive Learning aligns sleep macro-structure across subjects according to their objective metadata.
  • Figure 4: Sleep macro-structure variations across demographic groups. Sleep stage distributions over full-night recordings by sex, age (Younger: < 60 yrs; Older: ≥ 60 yrs), and BMI. The N3 proportion in the early sleep period and the REM proportion in the later period vary significantly across groups. These demographic-dependent patterns motivate our Demographic-Guided Contrastive Learning objective.
  • Figure 5: Few-shot evaluation. Accuracy and Macro-F1 are measured with a varying number of fine-tuning samples. SleepMaMi shows label-efficient adaptation to an unseen dataset thanks to well-generalized embeddings learned from large-scale pretraining data.
  • ...and 6 more figures
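Mamba layers use an input-dependent (selective) state space scan, which is beyond a short example. The sketch below illustrates only the bi-directional part of the Macro-Encoder design in Figure 3: a fixed, non-selective diagonal linear recurrence is run forward and backward over a sequence of per-epoch feature vectors (an assumption about the Macro-Encoder's input), and the two outputs are summed, so each epoch's output depends on both earlier and later epochs of the night. The coefficients `a` and `b`, the summation merge, and the function names are illustrative assumptions; this is not Mamba's selective scan.

```python
def linear_scan(xs, a=0.9, b=0.1):
    """Causal diagonal linear recurrence h_t = a * h_{t-1} + b * x_t, per channel.

    xs is a list of per-epoch feature vectors; returns the hidden state at each step.
    """
    h = [0.0] * len(xs[0])
    out = []
    for x in xs:
        h = [a * hi + b * xi for hi, xi in zip(h, x)]
        out.append(h)
    return out


def bidirectional_scan(xs, a=0.9, b=0.1):
    """Run the scan forward and on the reversed sequence, then sum per step."""
    fwd = linear_scan(xs, a, b)
    bwd = linear_scan(xs[::-1], a, b)[::-1]
    return [[f + g for f, g in zip(fv, gv)] for fv, gv in zip(fwd, bwd)]


# An impulse at epoch 2 of a 5-epoch toy "night" with one feature channel.
night = [[0.0], [0.0], [1.0], [0.0], [0.0]]
print(linear_scan(night))         # causal: epochs 0 and 1 stay at 0.0
print(bidirectional_scan(night))  # every epoch receives a nonzero response
```

A purely causal scan cannot let early-night epochs use information from later in the recording; running a second pass in reverse is a simple way to give every position full-night context while keeping cost linear in sequence length.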