SleepGMUformer: A gated multimodal temporal neural network for sleep staging

Chenjun Zhao, Xuesen Niu, Xinglin Yu, Long Chen, Na Lv, Huiyu Zhou, Aite Zhao

TL;DR

SleepGMUformer tackles the challenge of heterogeneous multimodal sleep staging by integrating EEG/EOG-based time-frequency representations with wearable sensor time series through a transformer-based per-channel feature extractor and a gated multimodal fusion (GMU) module. The model preprocesses data with EEG de-trending, wearable alignment, and normalization, then learns temporal features per channel before dynamically weighting modalities at the instance level to improve classification. It achieves strong results on SleepEDF-78 and WristHR-Motion-Sleep (approximately 85% and 94.5% accuracy, respectively) and outperforms several baselines, while providing interpretable modality contributions and confidence estimates. The approach demonstrates the feasibility and benefits of combining polysomnography with wearable data for sleep staging, with practical implications for scalable, low-resource sleep monitoring and clinical deployment, though it notes challenges in the N1 stage and opportunities to incorporate more channels and temporal context.

Abstract

Sleep staging is a key method for assessing sleep quality and diagnosing sleep disorders. However, current deep learning methods face challenges: 1) postfusion techniques ignore the varying contributions of different modalities; 2) unprocessed sleep data can interfere with frequency-domain information. To tackle these issues, this paper proposes a gated multimodal temporal neural network for multidomain sleep data, including heart rate, motion, steps, EEG (Fpz-Cz, Pz-Oz), and EOG from WristHR-Motion-Sleep and SleepEDF-78. The model integrates: 1) a pre-processing module for feature alignment, missing value handling, and EEG de-trending; 2) a feature extraction module for complex sleep features in the time dimension; and 3) a dynamic fusion module for real-time modality weighting. Experiments show classification accuracies of 85.03% on SleepEDF-78 and 94.54% on WristHR-Motion-Sleep datasets. The model handles heterogeneous datasets and outperforms state-of-the-art models by 1.00%-4.00%.
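
To make the idea of instance-level modality weighting concrete, here is a minimal NumPy sketch of a commonly used two-input gated multimodal unit. It is an illustration under stated assumptions, not the authors' implementation: the function name `gmu_fuse`, the feature sizes, the random weights, and the restriction to two modalities are all hypothetical, and the paper's exact gating formulation may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gmu_fuse(x_a, x_b, W_a, W_b, W_z):
    """Fuse two feature batches with a per-instance, per-unit gate.

    Assumed formulation (not confirmed by the source):
        h_a = tanh(x_a W_a), h_b = tanh(x_b W_b)
        z   = sigmoid([x_a, x_b] W_z)
        h   = z * h_a + (1 - z) * h_b
    """
    h_a = np.tanh(x_a @ W_a)                                # (batch, d_hidden)
    h_b = np.tanh(x_b @ W_b)                                # (batch, d_hidden)
    z = sigmoid(np.concatenate([x_a, x_b], axis=1) @ W_z)   # gate in (0, 1)
    fused = z * h_a + (1.0 - z) * h_b                       # convex combination
    return fused, z

# Hypothetical sizes: 4 epochs, 16 features from one modality, 8 from another.
rng = np.random.default_rng(0)
batch, d_a, d_b, d_hidden = 4, 16, 8, 12
x_a = rng.normal(size=(batch, d_a))
x_b = rng.normal(size=(batch, d_b))
W_a = rng.normal(scale=0.1, size=(d_a, d_hidden))
W_b = rng.normal(scale=0.1, size=(d_b, d_hidden))
W_z = rng.normal(scale=0.1, size=(d_a + d_b, d_hidden))

fused, gate = gmu_fuse(x_a, x_b, W_a, W_b, W_z)
print(fused.shape)          # one fused vector per epoch
print(gate.mean(axis=1))    # average weight given to modality a, per epoch
```

Because the gate is computed from the inputs of each epoch, its values can be read as a rough per-instance indication of how much each modality contributes, which is the kind of interpretability the TL;DR mentions.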

Paper Structure

This paper contains 18 sections, 10 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Flowchart of sleep staging with SleepGMUformer. Urban and rural sleep data are collected with different devices and analyzed through data-compatible processing in SleepGMUformer; the effects of physical therapy are assessed and analyzed in a dynamic cycle through the interaction of drug therapy, lifestyle adjustment, and sleep staging assessment; finally, a vision of future applications spans the medical, personal care, and commercial fields.
  • Figure 2: SleepGMUformer structure. The raw data are first preprocessed (de-trending or alignment, missing value processing, etc.) and then input to the single-channel feature extraction module: (a) the Transformer Block module performs feature extraction, in which (b) the Multi-head Attention module captures features from different angles through different "heads". The features of each channel are then input to (c) the Gated Multimodal Units (GMU) for dynamic multi-channel feature fusion. Finally, an MLP completes the classification.
  • Figure 3: Preprocessing procedure on the datasets SleepEDF-78 and WristHR-Motion-Sleep. Preprocessing is performed on each 30-second unit of data. (a): de-trending process for EEG, (b): The wearable device records time-series data within the same 30 seconds, i.e., within the same sleep label, for alignment and missing value processing.
  • Figure 4: Confusion matrix on the dataset SleepEDF-78 and WristHR-Motion-Sleep, with the confusion matrix of SleepEDF-78 on the left and that of WristHR-Motion-Sleep on the right.
  • Figure 5: Hypnogram showing classification accuracy for subject A from the SleepEDF-78 dataset. Top row: sleep stages predicted by the model; bottom row: ground-truth sleep stages.
  • ...and 5 more figures
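
The per-epoch preprocessing described in the Figure 3 caption (EEG de-trending, and missing value handling for wearable series within one 30-second label) can be sketched as follows. This is only an assumed illustration: the source does not state which de-trending or imputation method is used, so a linear least-squares trend and linear interpolation are swapped in here, and the helper names, the 100 Hz sampling rate, and the toy signals are hypothetical.

```python
import numpy as np

def detrend_linear(epoch):
    """Remove a least-squares straight-line trend from one epoch (assumed method)."""
    epoch = np.asarray(epoch, dtype=float)
    t = np.arange(epoch.size)
    slope, intercept = np.polyfit(t, epoch, deg=1)  # highest power first
    return epoch - (slope * t + intercept)

def fill_missing(series):
    """Fill NaN gaps by linear interpolation between valid samples (assumed method)."""
    series = np.asarray(series, dtype=float)
    idx = np.arange(series.size)
    valid = ~np.isnan(series)
    return np.interp(idx, idx[valid], series[valid])

# Toy 30-second EEG epoch: a 10 Hz oscillation plus a slow drift and an offset.
fs = 100                              # assumed sampling rate in Hz
t = np.arange(30 * fs) / fs
eeg = np.sin(2 * np.pi * 10 * t) + 0.05 * t + 2.0
eeg_clean = detrend_linear(eeg)

# Toy heart-rate samples within the same epoch, with dropped readings as NaN.
hr = np.array([60.0, np.nan, 64.0, np.nan, np.nan, 70.0])
hr_filled = fill_missing(hr)

print(round(float(eeg_clean.mean()), 6))
print(hr_filled)
```

Removing the slow drift before computing time-frequency representations keeps low-frequency trend energy from leaking into the spectrum, which is the motivation the abstract gives for processing the raw signal first.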