MC2SleepNet: Multi-modal Cross-masking with Contrastive Learning for Sleep Stage Classification
Younghoon Na, Hyun Keun Ahn, Hyun-Kyung Lee, Yoongeol Lee, Seung Hun Oh, Hongkwon Kim, Jeong-Gun Lee
TL;DR
This paper introduces MC2SleepNet, a multi-modal sleep stage classifier that jointly processes raw EEG and spectrogram inputs using a CNN and a Transformer backbone, respectively. It advances multi-modal learning through epoch-level InfoNCE contrastive alignment and a novel sequence-level Cross-Masking that enables cross-attention between modalities, followed by a fine-tuning stage with frozen backbones. The model achieves state-of-the-art accuracy on SleepEDF-78 (84.6%) and SHHS (88.6%), demonstrating strong generalization across dataset sizes and improved performance on challenging sleep stages. The approach offers a scalable framework for robust sleep staging with cross-modal supervision and self-supervised learning, potentially improving automated PSG analysis in clinical settings.
Abstract
Sleep profoundly affects our health, and sleep deficiency or disorders can cause physical and mental problems. Despite significant findings from previous studies, challenges persist in optimizing deep learning models, especially in multi-modal learning for high-accuracy sleep stage classification. Our research introduces MC2SleepNet (Multi-modal Cross-masking with Contrastive learning for Sleep stage classification Network). It aims to facilitate effective collaboration between Convolutional Neural Networks (CNNs) and Transformer architectures in multi-modal training by means of contrastive learning and cross-masking. Raw single-channel EEG signals and their corresponding spectrograms serve as two modalities with different characteristics for multi-modal learning. MC2SleepNet achieves state-of-the-art performance, with accuracies of 84.6% on SleepEDF-78 and 88.6% on the Sleep Heart Health Study (SHHS). These results demonstrate that the proposed network generalizes effectively across both small and large datasets.
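To make the epoch-level InfoNCE alignment mentioned in the TL;DR more concrete, the following is a minimal NumPy sketch of a symmetric InfoNCE loss between paired EEG and spectrogram embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function name `info_nce`, the temperature of 0.1, the symmetric averaging of both directions, and the use of the other epochs in a batch as negatives are all hypothetical choices that this summary does not specify.

```python
import numpy as np

def info_nce(z_eeg, z_spec, temperature=0.1):
    """Symmetric InfoNCE over a batch of paired epoch embeddings.

    z_eeg, z_spec: arrays of shape (batch, dim). Row i of each array is
    assumed to come from the same sleep epoch (the positive pair); all
    other rows in the batch act as negatives. The temperature value is
    an illustrative assumption.
    """
    # L2-normalise so that the dot product is a cosine similarity.
    z_eeg = z_eeg / np.linalg.norm(z_eeg, axis=1, keepdims=True)
    z_spec = z_spec / np.linalg.norm(z_spec, axis=1, keepdims=True)

    # Pairwise similarities; positives lie on the diagonal.
    logits = z_eeg @ z_spec.T / temperature

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        # Negative log-likelihood of the matching (diagonal) entry.
        return -np.mean(np.diag(log_prob))

    # Average the EEG-to-spectrogram and spectrogram-to-EEG directions.
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(logits.T))
```

In this sketch the loss is small when each EEG embedding is most similar to the spectrogram embedding of the same epoch, and it grows when the pairing is broken, which is the behaviour a contrastive alignment objective is meant to reward.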
