sDREAMER: Self-distilled Mixture-of-Modality-Experts Transformer for Automatic Sleep Staging

Jingyuan Chen; Yuan Yao; Mie Anderson; Natalie Hauglund; Celia Kjaerby; Verena Untiet; Maiken Nedergaard; Jiebo Luo

TL;DR

sDREAMER is a self-distilled Mixture-of-Modality-Experts (MoME) transformer for sleep staging that operates on EEG, EMG, or both. It uses three pathways with partially shared weights (EEG, EMG, and a cross-modal mix pathway) together with a self-distillation scheme, and it achieves strong performance in both single-channel and multi-channel inference on a mouse dataset. The architecture combines patch-based input, epoch- and sequence-level MoME transformers, and a KL-divergence-based distillation objective that strengthens cross-modal interaction. In experiments it outperforms baseline transformer-based and traditional methods. Ablations confirm the value of joint encoding and self-distillation, and visualizations indicate that the learned latent representations are meaningful and aligned across modalities.
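The KL-divergence-based self-distillation objective can be illustrated with a small sketch. The summary does not say which pathway teaches which, so this example assumes that the mix pathway's softened predictions act as the teacher for the EEG-only and EMG-only pathways. It also assumes three mouse sleep stages (e.g. wake, NREM, REM), a distillation temperature `temperature`, and the common Hinton-style T² scaling. The function names are hypothetical, and the supervised cross-entropy terms that a full training loss would include are omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of class logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(mix_logits, eeg_logits, emg_logits, temperature=2.0):
    """Hypothetical objective (assumption): the mix pathway is the teacher and
    the single-modality pathways are students. The teacher probabilities are
    treated as fixed targets; in an autodiff framework they would be detached."""
    teacher = softmax(mix_logits, temperature)
    loss = 0.0
    for student_logits in (eeg_logits, emg_logits):
        student = softmax(student_logits, temperature)
        loss += kl_divergence(teacher, student)
    # T^2 scaling (a common distillation convention) keeps gradient magnitudes
    # comparable across temperatures; average over the two student pathways.
    return (temperature ** 2) * loss / 2


# Usage with made-up logits for one epoch over three assumed stages.
mix = [2.0, 0.5, -1.0]
print(self_distillation_loss(mix, mix, mix))  # identical predictions give zero loss
print(self_distillation_loss(mix, [0.0, 0.0, 0.0], [-1.0, 0.5, 2.0]))
```

Using KL divergence rather than matching hard labels lets the single-modality pathways learn from the full class distribution of the cross-modal pathway, including how uncertain it is between neighboring stages.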

Abstract

Automatic sleep staging based on electroencephalography (EEG) and electromyography (EMG) signals is an important aspect of sleep-related research. Current sleep staging methods suffer from two major drawbacks. First, existing methods allow only limited information interaction between modalities. Second, they do not provide a unified model that can handle different sources of input. To address these issues, we propose sDREAMER, a novel sleep stage scoring model that emphasizes cross-modality interaction and per-channel performance. Specifically, we develop a mixture-of-modality-experts (MoME) model with three pathways, for EEG, EMG, and mixed signals, that partially share weights. We further propose a self-distillation training scheme to strengthen information interaction across modalities. Our model is trained with multi-channel inputs and can classify either single-channel or multi-channel inputs. Experiments demonstrate that our model outperforms existing transformer-based sleep scoring methods for multi-channel inference. For single-channel inference, it also outperforms transformer-based models trained on single-channel signals.
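A three-pathway MoME block with partially shared weights can be sketched as follows. The abstract does not specify which weights are shared. This example assumes the arrangement used in other mixture-of-modality-experts transformers: self-attention weights are shared by all pathways, while each pathway ("eeg", "emg", "mix") owns its own feed-forward expert. It is deliberately simplified, with single-head attention, no layer normalization, and pure-Python lists. The class `MoMEBlock`, its methods, and the choice to concatenate EEG and EMG tokens for the mix pathway are all hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


class MoMEBlock:
    """Hypothetical MoME block: attention is shared across pathways,
    and each pathway routes tokens through its own feed-forward expert."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Shared by all three pathways (the assumed "partially shared" weights).
        self.wq = rand_matrix(rng, dim, dim)
        self.wk = rand_matrix(rng, dim, dim)
        self.wv = rand_matrix(rng, dim, dim)
        # One modality expert per pathway.
        self.experts = {name: rand_matrix(rng, dim, dim) for name in ("eeg", "emg", "mix")}

    def attention(self, tokens):
        """Single-head scaled dot-product self-attention over the token list."""
        q = [matvec(self.wq, t) for t in tokens]
        k = [matvec(self.wk, t) for t in tokens]
        v = [matvec(self.wv, t) for t in tokens]
        scale = math.sqrt(self.dim)
        out = []
        for qi in q:
            weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
            out.append([sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(self.dim)])
        return out

    def forward(self, tokens, pathway):
        """Residual shared attention, then a residual ReLU expert chosen by pathway."""
        attended = [[a + b for a, b in zip(t, h)] for t, h in zip(tokens, self.attention(tokens))]
        expert = self.experts[pathway]
        return [[a + max(0.0, b) for a, b in zip(t, matvec(expert, t))] for t in attended]


# Usage: the same block serves single-channel and multi-channel inputs.
block = MoMEBlock(dim=4)
eeg_tokens = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.2, 0.1]]
emg_tokens = [[0.5, 0.0, -0.2, 0.1]]
eeg_out = block.forward(eeg_tokens, "eeg")               # EEG-only inference
mix_out = block.forward(eeg_tokens + emg_tokens, "mix")  # EEG + EMG inference
print(len(eeg_out), len(mix_out))
```

Sharing attention while keeping separate experts is one way a single set of trained weights can accept EEG alone, EMG alone, or both: only the expert lookup changes with the input source.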

Figures (8)