Toward Interpretable Sleep Stage Classification Using Cross-Modal Transformers
Jathurshan Pradeepkumar, Mithunjha Anandakumar, Vinith Kugathasan, Dhinesh Suntharalingham, Simon L. Kappel, Anjula C. De Silva, Chamira U. S. Edussooriya
TL;DR
The paper tackles sleep stage classification with the goal of interpretability and efficiency. It introduces cross-modal transformers that integrate EEG and EOG through intra- and cross-modal attention, using CLS tokens to produce compact, interpretable representations. Two architectures are proposed: an Epoch Cross-Modal Transformer for one-to-one classification and a Sequence Cross-Modal Transformer for many-to-many classification, both backed by a multi-scale 1D-CNN for feature learning. Across SleepEDF-expanded and SHHS datasets, the Sequence variant matches state-of-the-art accuracy while reducing parameters and training time, and attention-based interpretability provides actionable insights into decision-making. This work advances clinically relevant sleep staging by delivering transparent, efficient models that leverage cross-modal information.
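To make the intra- and cross-modal attention idea concrete, the following is a minimal, dependency-free sketch and not the authors' implementation. It assumes single-head scaled dot-product attention with no learned projections, layer normalization, or feed-forward blocks, and it uses random toy vectors in place of CNN features. The names `intra_modal`, `cross_modal`, `DIM`, and `N_PATCHES` are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of vectors.

    Returns (outputs, weights); weights[i][j] is how much query i attends to key j.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[i] for wj, v in zip(w, values))
                        for i in range(len(values[0]))])
    return outputs, weights


def intra_modal(patches, cls_token):
    """Self-attention over [CLS] + patches of ONE modality.

    Returns the updated CLS vector and its attention weights over the sequence.
    """
    seq = [cls_token] + patches
    out, w = attention(seq, seq, seq)
    return out[0], w[0]


def cross_modal(cls_eeg, cls_eog):
    """Self-attention over the two modality CLS tokens (EEG, EOG)."""
    tokens = [cls_eeg, cls_eog]
    return attention(tokens, tokens, tokens)


random.seed(0)
DIM, N_PATCHES = 8, 5  # assumed toy sizes, not taken from the paper


def toy_patches():
    """Random stand-ins for per-patch features of one 30 s epoch."""
    return [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N_PATCHES)]


cls_init = [0.1] * DIM  # placeholder; a real model would learn this token

cls_eeg, eeg_weights = intra_modal(toy_patches(), cls_init)
cls_eog, eog_weights = intra_modal(toy_patches(), cls_init)
fused, modality_weights = cross_modal(cls_eeg, cls_eog)

print("EEG CLS attention over patches:", [round(w, 3) for w in eeg_weights[1:]])
print("Cross-modal weights (rows: EEG, EOG):", modality_weights)
```

In this sketch, `eeg_weights[1:]` indicates which EEG patches the CLS token attended to, and `modality_weights` indicates how strongly each modality's summary draws on the other. These are the kinds of attention scores that can be inspected for interpretability.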
Abstract
Accurate sleep stage classification is important for sleep health assessment. In recent years, several machine-learning-based sleep staging algorithms have been developed, and in particular, deep-learning-based algorithms have achieved performance on par with human annotation. Despite improved performance, a limitation of most deep-learning-based algorithms is their black-box behavior, which has limited their use in clinical settings. Here, we propose a cross-modal transformer, which is a transformer-based method for sleep stage classification. The proposed cross-modal transformer consists of a novel cross-modal transformer encoder architecture along with a multi-scale one-dimensional convolutional neural network for automatic representation learning. Our method outperforms the state-of-the-art methods and eliminates the black-box behavior of deep-learning models by utilizing the interpretability aspect of the attention modules. Furthermore, our method provides considerable reductions in the number of parameters and training time compared to the state-of-the-art methods. Our code is available at https://github.com/Jathurshan0330/Cross-Modal-Transformer. A demo of our work can be found at https://bit.ly/Cross_modal_transformer_demo.
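As an illustration of what a multi-scale one-dimensional convolutional front end does, the sketch below filters one toy signal with kernels of several widths and stacks the results into one feature vector per time step. It is a simplified assumption-laden example, not the paper's network: the kernel sizes (3, 7, 15), the stride, the fixed averaging kernels, and the function names are all hypothetical, whereas a real model would learn its filters.

```python
import math


def conv1d_same(signal, kernel, stride):
    """Zero-padded 1D cross-correlation for an odd-length kernel.

    With stride 1 the output has the same length as the input;
    a larger stride subsamples the output.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal), stride)]


def relu(xs):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in xs]


def multi_scale_features(signal, kernels, stride):
    """Apply each kernel as a separate branch and stack branches per time step.

    Returns a list of time steps, each holding one value per scale.
    """
    branches = [relu(conv1d_same(signal, kernel, stride)) for kernel in kernels]
    return [list(step) for step in zip(*branches)]


# Toy signal: a slow and a fast sinusoid, standing in for a raw EEG or EOG trace.
N = 100
signal = [math.sin(2 * math.pi * 2 * t / N) + 0.5 * math.sin(2 * math.pi * 12 * t / N)
          for t in range(N)]

# Assumed kernel widths; moving averages stand in for learned filters.
kernels = [[1.0 / k] * k for k in (3, 7, 15)]

features = multi_scale_features(signal, kernels, stride=5)
print(len(features), "time steps x", len(features[0]), "scales")
```

Short kernels preserve fast activity while long kernels smooth it away, so each time step carries information at several temporal resolutions. Feature sequences of this kind are what a transformer encoder would then consume.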
