SleepGMUformer: A gated multimodal temporal neural network for sleep staging
Chenjun Zhao, Xuesen Niu, Xinglin Yu, Long Chen, Na Lv, Huiyu Zhou, Aite Zhao
TL;DR
SleepGMUformer tackles the challenge of heterogeneous multimodal sleep staging by integrating EEG/EOG-based time-frequency representations with wearable sensor time series through a transformer-based per-channel feature extractor and a gated multimodal fusion (GMU) module. The model preprocesses data with EEG de-trending, wearable alignment, and normalization, then learns temporal features per channel before dynamically weighting modalities at the instance level to improve classification. It achieves strong results on SleepEDF-78 and WristHR-Motion-Sleep (approximately 85% and 94.5% accuracy, respectively) and outperforms several baselines, while providing interpretable modality contributions and confidence estimates. The approach demonstrates the feasibility and benefits of combining polysomnography with wearable data for sleep staging, with practical implications for scalable, low-resource sleep monitoring and clinical deployment. The authors note that the N1 stage remains difficult to classify and that incorporating more channels and temporal context is an opportunity for future work.
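As a rough illustration of the preprocessing steps named above (missing value handling, de-trending, normalization), here is a minimal NumPy sketch. The specific choices are assumptions made for illustration, not the paper's stated method: linear interpolation for gaps, removal of a least-squares straight-line trend, and z-score normalization. The function names and the synthetic signals are hypothetical.

```python
import numpy as np

def fill_missing(values):
    """Fill NaN gaps by linear interpolation (assumed strategy, not from the paper)."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(values.size)
    valid = ~np.isnan(values)
    return np.interp(idx, idx[valid], values[valid])

def detrend_linear(signal):
    """Subtract the least-squares straight line from a 1-D signal (assumed trend model)."""
    signal = np.asarray(signal, dtype=float)
    t = np.arange(signal.size, dtype=float)
    slope, intercept = np.polyfit(t, signal, deg=1)
    return signal - (slope * t + intercept)

def zscore(signal, eps=1e-8):
    """Scale a signal to zero mean and unit variance."""
    signal = np.asarray(signal, dtype=float)
    return (signal - signal.mean()) / (signal.std() + eps)

# Hypothetical EEG-like epoch: an oscillation riding on a slow drift and an offset.
t = np.arange(3000, dtype=float)
eeg = np.sin(2 * np.pi * t / 100) + 0.001 * t + 2.0
eeg_clean = zscore(detrend_linear(eeg))

# Hypothetical wearable heart-rate series with dropped samples.
heart_rate = fill_missing([60.0, np.nan, 64.0, np.nan, np.nan, 70.0])
```

De-trending before any time-frequency transform keeps slow drift from leaking into the low-frequency bins, which is the kind of frequency-domain interference the summary refers to.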
Abstract
Sleep staging is a key method for assessing sleep quality and diagnosing sleep disorders. However, current deep learning methods face two challenges: 1) post-fusion techniques ignore the varying contributions of different modalities; 2) unprocessed sleep data can interfere with frequency-domain information. To tackle these issues, this paper proposes a gated multimodal temporal neural network for multidomain sleep data, including heart rate, motion, steps, EEG (Fpz-Cz, Pz-Oz), and EOG from WristHR-Motion-Sleep and SleepEDF-78. The model integrates: 1) a pre-processing module for feature alignment, missing value handling, and EEG de-trending; 2) a feature extraction module for complex sleep features in the time dimension; and 3) a dynamic fusion module for real-time modality weighting. Experiments show classification accuracies of 85.03% on the SleepEDF-78 dataset and 94.54% on the WristHR-Motion-Sleep dataset. The model handles heterogeneous datasets and outperforms state-of-the-art models by 1.00%-4.00%.
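To make the idea of dynamic modality weighting concrete, the sketch below implements the commonly described two-modality form of a gated multimodal unit: each modality is projected through a tanh layer, and a sigmoid gate computed from the concatenated inputs decides, per sample and per hidden unit, how much each modality contributes. This is an illustrative assumption; the abstract does not specify the exact gating equations, the number of modalities fused at once, or the layer sizes, and the weights here are random rather than trained. All names and dimensions are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gmu_fuse(x_a, x_b, W_a, W_b, W_z):
    """Two-modality gated fusion (bias terms omitted for brevity).

    x_a: (batch, d_a) features from modality A (e.g. an EEG branch)
    x_b: (batch, d_b) features from modality B (e.g. a wearable branch)
    Returns the fused features and the gate, both of shape (batch, d_h).
    """
    h_a = np.tanh(x_a @ W_a)  # modality A hidden representation
    h_b = np.tanh(x_b @ W_b)  # modality B hidden representation
    # The gate sees both modalities and is computed separately for every sample.
    z = sigmoid(np.concatenate([x_a, x_b], axis=-1) @ W_z)
    fused = z * h_a + (1.0 - z) * h_b  # convex combination of the two branches
    return fused, z

# Hypothetical sizes and randomly initialised weights, for illustration only.
rng = np.random.default_rng(0)
batch, d_a, d_b, d_h = 4, 8, 6, 5
x_a = rng.normal(size=(batch, d_a))
x_b = rng.normal(size=(batch, d_b))
W_a = rng.normal(scale=0.1, size=(d_a, d_h))
W_b = rng.normal(scale=0.1, size=(d_b, d_h))
W_z = rng.normal(scale=0.1, size=(d_a + d_b, d_h))

fused, gate = gmu_fuse(x_a, x_b, W_a, W_b, W_z)
```

Because the gate values lie strictly between 0 and 1, averaging `gate` over hidden units gives a per-sample score for how much modality A was relied on, which is one way such a module can expose modality contributions.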
