SleepMaMi: A Universal Sleep Foundation Model for Integrating Macro- and Micro-structures
Keondo Park, Younghoon Na, Yourim Choi, Hyunwoo Ryu, Hyun-Woo Shin, Hyung-Sin Kim
TL;DR
SleepMaMi unifies full-night PSG analysis by integrating micro- and macro-structure through a dual-encoder architecture. The Micro-Encoder uses MAE and multi-modal contrastive objectives to learn fine-grained biosignal morphologies, while the Macro-Encoder leverages bi-directional Mamba layers and Demographic-Guided Contrastive Learning to model global sleep architecture conditioned on demographics. Pretrained on a large, multi-dataset PSG corpus, SleepMaMi achieves state-of-the-art or competitive results across sleep staging, SDB segmentation, and disease prediction, with strong label-efficient performance in few-shot settings. The framework demonstrates robust cross-task generalization and clinically meaningful representations, highlighting its potential as a universal sleep foundation model for clinical and research use.
Abstract
While the shift toward unified foundation models has revolutionized many deep learning domains, sleep medicine remains largely restricted to task-specific models that focus on localized micro-structure features. These approaches often neglect the rich, multi-modal context of Polysomnography (PSG) and fail to capture the global macro-structure of a full night's sleep. To address this, we introduce SleepMaMi, a Sleep Foundation Model engineered to master both hour-long sleep architectures and fine-grained signal morphologies. Our framework uses a hierarchical dual-encoder design: a Macro-Encoder to model full-night temporal dependencies and a Micro-Encoder to capture short-term characteristics from biosignals. The Macro-Encoder is trained via Demographic-Guided Contrastive Learning, which aligns overnight sleep patterns with objective subject metadata, such as age, sex, and BMI, to refine global representations. The Micro-Encoder is optimized via a hybrid Masked Autoencoder (MAE) and multi-modal contrastive objective. Pre-trained on a massive corpus of more than 20,000 PSG recordings (158K hours), SleepMaMi outperforms existing foundation models across a diverse suite of downstream tasks, demonstrating superior generalizability and label-efficient adaptation for clinical sleep analysis.
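To make the idea of aligning overnight sleep patterns with subject metadata concrete, the following is a minimal sketch of a symmetric InfoNCE-style contrastive loss between per-night sleep embeddings and demographic embeddings. The abstract does not specify the exact form of the Demographic-Guided Contrastive Learning objective, so the loss form, the function names (`l2_normalize`, `cross_entropy_diag`, `symmetric_info_nce`), and the `temperature` value are all illustrative assumptions, not the authors' implementation.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes a non-zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    # Mean cross-entropy where the correct "class" for row i is column i,
    # i.e. each recording should match its own subject's metadata.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_info_nce(sleep_emb, demo_emb, temperature=0.1):
    # Hypothetical objective: cosine-similarity logits between every
    # sleep embedding and every demographic embedding in the batch,
    # scored in both directions (sleep -> demo and demo -> sleep).
    s = [l2_normalize(v) for v in sleep_emb]
    d = [l2_normalize(v) for v in demo_emb]
    logits = [
        [sum(a * b for a, b in zip(si, dj)) / temperature for dj in d]
        for si in s
    ]
    logits_t = [list(col) for col in zip(*logits)]  # transpose
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

# Toy batch of two subjects with 2-dimensional embeddings.
sleep = [[1.0, 0.0], [0.0, 1.0]]
demo_aligned = [[1.0, 0.0], [0.0, 1.0]]   # each night matches its own metadata
demo_swapped = [[0.0, 1.0], [1.0, 0.0]]   # metadata assigned to the wrong night

loss_aligned = symmetric_info_nce(sleep, demo_aligned)
loss_swapped = symmetric_info_nce(sleep, demo_swapped)
```

In this toy batch the loss is close to zero when each night's embedding points in the same direction as its own metadata embedding, and large when the pairing is swapped. That is the behavior a contrastive alignment objective is meant to reward. In practice both embeddings would come from learned encoders rather than fixed vectors.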
