Transparency in Sleep Staging: Deep Learning Method for EEG Sleep Stage Classification with Model Interpretability
Shivam Sharma, Suvadeep Maiti, S. Mythirayee, Srijithesh Rajendran, Raju Surampudi Bapi
TL;DR
This work tackles automatic sleep-stage classification from single-channel EEG by introducing an end-to-end deep learning framework that combines SE-ResNet-based feature extraction with stacked Bi-LSTM temporal context modeling. It advances interpretability through 1D-GradCAM visualizations that align with sleep experts' insights, and demonstrates an 8x training-time improvement via a stride-based efficiency strategy. Evaluations on SleepEDF-20, SleepEDF-78, and SHHS show superior macro-F1 and accuracy compared with state-of-the-art baselines, establishing a practical, clinically relevant approach. The combination of robust performance and explainability supports potential deployment in clinical sleep assessment and monitoring, with future work focusing on improving N1 detection.
Abstract
Automated sleep stage classification from raw single-channel EEG is a critical tool for sleep quality assessment and disorder diagnosis. However, modelling the complexity and variability inherent in this signal is challenging, which limits the practicality and effectiveness of existing approaches in clinical settings. To address these challenges, this study presents an end-to-end deep learning (DL) model that integrates squeeze-and-excitation blocks within a residual network to extract features and uses a stacked Bi-LSTM to capture complex temporal dependencies. A distinctive aspect of this study is the adaptation of GradCAM for sleep staging, marking the first instance of an explainable DL model in this domain whose decision-making aligns with sleep experts' insights. We evaluated our model on the publicly available datasets SleepEDF-20, SleepEDF-78, and SHHS, achieving Macro-F1 scores of 82.5, 78.9, and 81.9, respectively. Additionally, we implemented a novel training-efficiency strategy based on increasing the stride size, which yields 8x faster training with minimal impact on performance. Comparative analyses show that our model outperforms all existing baselines, indicating its potential for clinical use.
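To make the squeeze-and-excitation idea concrete, the following is a minimal NumPy sketch of a generic 1D squeeze-and-excitation step applied to a feature map of shape (channels, time). It is not the authors' implementation: the function name `se_block_1d`, the reduction ratio `r = 4`, the channel count, the random weights, and the 3000-sample epoch length are all illustrative assumptions.

```python
import numpy as np

def se_block_1d(x, w1, b1, w2, b2):
    """Generic squeeze-and-excitation on a 1D feature map x of shape (C, T).

    Hypothetical helper for illustration only; weights are supplied by the caller.
    """
    z = x.mean(axis=1)                          # squeeze: average over time -> (C,)
    h = np.maximum(0.0, w1 @ z + b1)            # bottleneck FC + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))    # FC + sigmoid gates in (0, 1) -> (C,)
    return x * s[:, None], s                    # rescale each channel by its gate

# Assumed sizes: 8 channels, 3000 time samples, reduction ratio 4.
rng = np.random.default_rng(0)
C, T, r = 8, 3000, 4
x = rng.standard_normal((C, T))
w1 = rng.standard_normal((C // r, C)) * 0.1     # random stand-ins for learned weights
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

y, gates = se_block_1d(x, w1, b1, w2, b2)
print(y.shape)          # same shape as the input feature map
print(gates.round(3))   # one gate per channel
```

In a residual network, a step of this kind would typically be applied to the output of the convolutional branch before it is added to the skip connection, so that each channel is reweighted according to its global activity over the epoch.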
