Toward Interpretable Sleep Stage Classification Using Cross-Modal Transformers

Jathurshan Pradeepkumar; Mithunjha Anandakumar; Vinith Kugathasan; Dhinesh Suntharalingham; Simon L. Kappel; Anjula C. De Silva; Chamira U. S. Edussooriya

TL;DR

The paper tackles sleep stage classification with the goal of interpretability and efficiency. It introduces cross-modal transformers that integrate EEG and EOG through intra- and cross-modal attention, using CLS tokens to produce compact, interpretable representations. Two architectures are proposed: an Epoch Cross-Modal Transformer for one-to-one classification and a Sequence Cross-Modal Transformer for many-to-many classification, both backed by a multi-scale 1D-CNN for feature learning. Across SleepEDF-expanded and SHHS datasets, the Sequence variant matches state-of-the-art accuracy while reducing parameters and training time, and attention-based interpretability provides actionable insights into decision-making. This work advances clinically relevant sleep staging by delivering transparent, efficient models that leverage cross-modal information.
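The CLS-token idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes single-head scaled dot-product attention with no learned projections, layer normalization, or feed-forward layers, and it assumes the cross-modal block attends over the sequence [CLS_cross, CLS_EEG, CLS_EOG]. The helper names (`cls_attention`, `random_vectors`) and the feature sizes are hypothetical, and the random vectors stand in for CNN features.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cls_attention(cls_vec, tokens):
    """Let a CLS query attend over [CLS] + tokens.

    Returns the pooled CLS vector and the attention weights.
    Assumption: identity query/key/value projections, single head.
    """
    sequence = [cls_vec] + tokens
    scale = math.sqrt(len(cls_vec))
    scores = [sum(q * k for q, k in zip(cls_vec, tok)) / scale for tok in sequence]
    weights = softmax(scores)
    dim = len(cls_vec)
    pooled = [sum(w * tok[i] for w, tok in zip(weights, sequence)) for i in range(dim)]
    return pooled, weights


def random_vectors(rng, count, dim):
    """Hypothetical stand-in for features produced by a CNN front end."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
dim = 8
eeg_features = random_vectors(rng, 5, dim)  # 5 time steps of EEG features
eog_features = random_vectors(rng, 5, dim)  # 5 time steps of EOG features
cls_eeg_init, cls_eog_init, cls_cross_init = random_vectors(rng, 3, dim)

# Intra-modal attention: each CLS vector summarizes one modality.
cls_eeg, eeg_weights = cls_attention(cls_eeg_init, eeg_features)
cls_eog, eog_weights = cls_attention(cls_eog_init, eog_features)

# Cross-modal attention: a third CLS vector attends over both summaries.
cls_cross, cross_weights = cls_attention(cls_cross_init, [cls_eeg, cls_eog])

print("weights over [CLS_cross, EEG, EOG]:", [round(w, 3) for w in cross_weights])
```

In this sketch, `eeg_weights` and `eog_weights` indicate which time steps each modality summary drew on, and `cross_weights` indicates how much each modality contributed, which is the kind of signal an attention-based interpretation would inspect.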

Abstract

Accurate sleep stage classification is significant for sleep health assessment. In recent years, several machine-learning-based sleep staging algorithms have been developed, and in particular, deep-learning-based algorithms have achieved performance on par with human annotation. Despite improved performance, a limitation of most deep-learning-based algorithms is their black-box behavior, which has limited their use in clinical settings. Here, we propose a cross-modal transformer, which is a transformer-based method for sleep stage classification. The proposed cross-modal transformer consists of a novel cross-modal transformer encoder architecture along with a multi-scale one-dimensional convolutional neural network for automatic representation learning. Our method outperforms the state-of-the-art methods and eliminates the black-box behavior of deep-learning models by utilizing the interpretability aspect of the attention modules. Furthermore, our method provides considerable reductions in the number of parameters and training time compared to the state-of-the-art methods. Our code is available at https://github.com/Jathurshan0330/Cross-Modal-Transformer. A demo of our work can be found at https://bit.ly/Cross_modal_transformer_demo.

Paper Structure (30 sections, 5 equations, 6 figures, 9 tables)

Introduction
Related Work
Transformers
Deep Learning Based Sleep Stage Classification
Methodology
Problem Definition
Epoch Cross-Modal Transformer
Multi-Scale 1D-CNN for Representation Learning
Cross-Modal Transformer Encoder and Classification
Sequence Cross-Modal Transformer
Interpretability
Experiments
Dataset
SleepEDF expanded dataset:
SHHS dataset:
...and 15 more sections

Figures (6)

Figure 1: Performance of our cross-modal transformers (red squares) and previously reported works (blue circles) on the Sleep-EDF-expanded 2018 dataset. Our sequence cross-modal transformer achieves on-par performance with the state-of-the-art, with a fourfold reduction in parameters. Here, Seq and CMT denote Sequence and Cross-Modal Transformer, respectively.
Figure 2: The two classification schemes [phan2019seqsleepnet] used in the domain of sleep staging and in our experiments. In one-to-one classification, the sleep stage of an individual PSG epoch is predicted, whereas in many-to-many classification the sleep stages of multiple epochs are predicted simultaneously.
Figure 3: The architecture of the epoch cross-modal transformer, consisting of multi-scale 1D-CNN blocks, intra-modal attention blocks, a cross-modal attention block, and feed-forward networks. (a) shows the overall architecture with two signals as input. (b) visualizes the multi-scale 1D-CNN blocks, which consist of three pathways to learn both local and global features. (c) and (d) show the architectures of the attention blocks and feed-forward networks. Here, $CLS_{EEG}$, $CLS_{EOG}$, and $CLS_{Cross}$ are the $CLS$ vectors initialized to learn the aggregated representations of the intra-modal relationships of the EEG and EOG modalities and of the cross-modal relationship between EEG and EOG.
Figure 4: The architecture of the sequence cross-modal transformer, which is an extension of the epoch cross-modal transformer. The sequence cross-modal transformer consists of multiple epoch-level blocks that learn the epoch-level representations and an additional block that learns inter-epoch relationships.
Figure 5: The variants of the epoch level model used to conduct the ablation study. (a) shows the architecture with single channel EEG as the input, (b) shows the extended version of (a) with both EEG and EOG as inputs, and (c) shows a version of the epoch cross-modal transformer without cross-modal attention.
...and 1 more figures
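
The two classification schemes described in the Figure 2 caption differ only in how inputs and labels are paired. The sketch below shows that pairing on toy data. The function names, the sequence length of 3, and the stride of 1 are hypothetical choices for illustration and are not taken from the paper.

```python
def one_to_one_pairs(epochs, labels):
    """One-to-one: each sample is a single epoch with its own stage label."""
    return list(zip(epochs, labels))


def many_to_many_pairs(epochs, labels, seq_len, stride=1):
    """Many-to-many: each sample is seq_len consecutive epochs,
    paired with the seq_len labels of those same epochs."""
    samples = []
    for start in range(0, len(epochs) - seq_len + 1, stride):
        stop = start + seq_len
        samples.append((tuple(epochs[start:stop]), tuple(labels[start:stop])))
    return samples


# Toy recording: five 30-second epochs represented by placeholder names.
epochs = ["e0", "e1", "e2", "e3", "e4"]
labels = ["W", "N1", "N2", "N2", "REM"]

single = one_to_one_pairs(epochs, labels)
windows = many_to_many_pairs(epochs, labels, seq_len=3)

print(single[0])   # one epoch, one label
print(windows[0])  # three epochs, three labels
```

An epoch-level model would consume samples like `single`, while a sequence-level model would consume samples like `windows` and emit one prediction per epoch in each window.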
