Multimodal Sleep Stage and Sleep Apnea Classification Using Vision Transformer: A Multitask Explainable Learning Approach

Kianoosh Kazemi, Iman Azimi, Michelle Khine, Rami N. Khayat, Amir M. Rahmani, Pasi Liljeberg

TL;DR

The paper tackles the challenge of simultaneous sleep stage and sleep apnea classification using multimodal wearable signals. It introduces a 1D-Vision Transformer within a multitask learning framework to exploit shared patterns across sleep stages and disorders while providing explainability through attention analysis. Using a four-channel dataset from 123 subjects, the model achieves 78% accuracy for five-stage sleep classification and 74% accuracy for sleep apnea classification, with interpretability results showing attention aligned to physiologically meaningful respiratory events. The approach holds promise for noninvasive, home-based sleep monitoring by delivering accurate, interpretable multitask predictions without relying solely on PSG laboratory testing.

Abstract

Sleep is an essential component of human physiology, contributing significantly to overall health and quality of life. Accurate sleep staging and disorder detection are crucial for assessing sleep quality. Studies in the literature have proposed PSG-based approaches and machine-learning methods utilizing single-modality signals. However, existing methods often lack multimodal, multilabel frameworks and address the classification of sleep stages and sleep disorders separately. In this paper, we propose a 1D-Vision Transformer for simultaneous classification of sleep stages and sleep disorders. Our method exploits the correlation between sleep disorders and specific sleep stage patterns and identifies a sleep stage and a sleep disorder simultaneously. The model is trained and tested using multimodal, multilabel sensor data (including photoplethysmogram, respiratory flow, and respiratory effort signals). The proposed method shows an overall accuracy (Cohen's kappa) of 78% (0.66) for five-stage sleep classification and 74% (0.58) for sleep apnea classification. Moreover, we analyzed the encoder attention weights to clarify our model's predictions and investigate the influence different features have on the model's outputs. The results show that identified patterns, such as respiratory troughs and peaks, make a higher contribution to the final classification process.
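
The abstract reports each result as an accuracy paired with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The sketch below shows how the two numbers relate, using the standard definition of kappa. The function name `accuracy_and_kappa` and the toy labels are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def accuracy_and_kappa(y_true, y_pred, n_classes):
    """Return (accuracy, Cohen's kappa) for integer class labels.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1

    n = cm.sum()
    # Observed agreement is the plain accuracy.
    p_o = np.trace(cm) / n
    # Chance agreement from the row and column marginals.
    p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2
    # Undefined when p_e == 1 (all labels in a single class).
    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa


# Toy example with two classes (made-up labels, not data from the paper).
acc, kappa = accuracy_and_kappa([0, 0, 1, 1], [0, 1, 0, 1], n_classes=2)
```

In the toy example half of the predictions are correct, but that is exactly what chance would give for balanced classes, so kappa is zero even though accuracy is 50%. This is why kappa is a useful companion to accuracy when class frequencies are uneven, as is typical for sleep stages.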

Paper Structure

This paper contains 15 sections, 8 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Data preparation pipeline. The sleep data comprise four channels: PPG, respiratory flow, and chest and abdominal respiratory effort signals. The signals were segmented into 30-s epochs, followed by Z-score normalization of all channels within each subject. Then, an inter-patient test split was carried out. Next, all signals were down-sampled from 512 Hz to 64 Hz. Finally, the four channels were concatenated (stacked) together.
  • Figure 2: Illustration of the model architecture. The input signals are divided into fixed-size patches; each patch is linearly embedded, position embeddings are added, and the resulting sequence of vectors is fed into a standard Transformer encoder. For classification, an MLP layer is applied to the Transformer encoder output, performing additional transformations to prepare the representation for the final classification step. Finally, two dense-layer branches perform task-specific classification (i.e., sleep staging and sleep disorder).
  • Figure 3: Confusion Matrix for sleep stage classification.
  • Figure 4: Confusion Matrix for sleep disorder classification.
  • Figure 5: Attention maps plotted on top of the respiratory signal for different sleep disorder occurrences: (a) Obstructive Hypopnea, (b) Obstructive Hypopnea, (c) Obstructive Hypopnea, (d) Central Apnea, and (e) Obstructive Sleep Apnea. Higher attention scores are colored in darker red, and the actual respiration signal is shown as a solid blue line. The sleep disturbance is shaded in blue.
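
The data preparation steps described in the Figure 1 caption can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `prepare_subject` and `split_subjects` are hypothetical, the down-sampling uses simple block averaging (the caption does not say which method was used), and the test fraction and random seed are placeholders. Only the 30-s epochs, the per-subject Z-score normalization, the 512 Hz to 64 Hz rates, and the four-channel stacking come from the caption.

```python
import numpy as np

FS_RAW = 512     # original sampling rate in Hz (from the Figure 1 caption)
FS_TARGET = 64   # target sampling rate in Hz (from the Figure 1 caption)
EPOCH_SEC = 30   # epoch length in seconds (from the Figure 1 caption)


def prepare_subject(recording: np.ndarray) -> np.ndarray:
    """Turn one subject's (n_channels, n_samples) recording at 512 Hz
    into an array of shape (n_epochs, 1920, n_channels) at 64 Hz."""
    n_channels, n_samples = recording.shape
    samples_per_epoch = FS_RAW * EPOCH_SEC
    n_epochs = n_samples // samples_per_epoch

    # 1. Segment into 30-s epochs, dropping any trailing partial epoch.
    trimmed = recording[:, : n_epochs * samples_per_epoch]
    epochs = trimmed.reshape(n_channels, n_epochs, samples_per_epoch)

    # 2. Z-score each channel using this subject's own statistics.
    mean = epochs.mean(axis=(1, 2), keepdims=True)
    std = epochs.std(axis=(1, 2), keepdims=True)
    epochs = (epochs - mean) / (std + 1e-8)

    # 3. Down-sample by a factor of 8 via block averaging.
    #    ASSUMPTION: the paper's resampling method is not specified here.
    factor = FS_RAW // FS_TARGET
    epochs = epochs.reshape(
        n_channels, n_epochs, samples_per_epoch // factor, factor
    ).mean(axis=3)

    # 4. Stack channels last: (n_epochs, samples, n_channels).
    return np.transpose(epochs, (1, 2, 0))


def split_subjects(subject_ids, test_fraction=0.2, seed=0):
    """Inter-patient split: each subject is entirely in train or in test.

    ASSUMPTION: test_fraction and seed are placeholder values.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(subject_ids))
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:].tolist(), shuffled[:n_test].tolist()


# Synthetic recording: four channels, a little over three epochs long.
demo = np.random.default_rng(0).normal(size=(4, FS_RAW * EPOCH_SEC * 3 + 100))
demo_epochs = prepare_subject(demo)
train_ids, test_ids = split_subjects(list(range(10)))
```

Splitting by subject rather than by epoch keeps all of one person's data on a single side of the split, so the test score reflects performance on unseen patients. In this sketch each prepared epoch holds 30 s × 64 Hz = 1920 samples per channel, which is the sequence a patch-based model would then divide into patches.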