S4Sleep: Elucidating the design space of deep-learning-based sleep stage classification models
Tiezhi Wang, Nils Strodthoff
TL;DR
The paper tackles automatic sleep staging from polysomnography by systematically exploring encoder-predictor design choices, with an emphasis on structured state space models (S4) for capturing long-range dependencies. It identifies two robust architectures, S4Sleep(ts) and S4Sleep(spec), that perform well across raw time-series and spectrogram inputs, single- and multi-epoch configurations, and on the large-scale SHHS1 dataset as well as smaller public datasets. The authors report statistically significant improvements over state-of-the-art methods on Sleep EDF, MASS-SS3, and SHHS1, provide explicit uncertainty estimates, and show robust generalization without hyperparameter tuning. The work offers a blueprint for architecture search in long time-series annotation tasks and suggests S4-based designs as strong candidates for clinical and cross-domain time-series analysis. Code and data splits are publicly available.
Abstract
Scoring sleep stages in polysomnography recordings is a time-consuming task plagued by significant inter-rater variability. Therefore, it stands to benefit from the application of machine learning algorithms. While many algorithms have been proposed for this purpose, certain critical architectural decisions have not received systematic exploration. In this study, we meticulously investigate these design choices within the broad category of encoder-predictor architectures. We identify robust architectures applicable to both time series and spectrogram input representations. These architectures incorporate structured state space models as integral components and achieve statistically significant performance improvements compared to state-of-the-art approaches on the extensive Sleep Heart Health Study dataset. We anticipate that the architectural insights gained from this study along with the refined methodology for architecture search demonstrated herein will not only prove valuable for future research in sleep staging but also hold relevance for other time series annotation tasks.
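The abstract refers to structured state space models without spelling out what such a layer computes. As a rough illustration only, and not the authors' S4 implementation, the sketch below shows the underlying idea in a simplified form: a discrete linear state space model with a diagonal state matrix can be evaluated either step by step as a recurrence or as a single causal convolution with a precomputed kernel, which is one reason this model family is attractive for long sequences. The function names and all parameter values are hypothetical and chosen purely for demonstration; the actual S4 layer uses a more elaborate parameterization and kernel computation.

```python
from typing import List


def ssm_recurrent(a: List[float], b: List[float], c: List[float],
                  u: List[float]) -> List[float]:
    """Evaluate a diagonal discrete-time state space model as a recurrence.

    x[k] = a * x[k-1] + b * u[k]   (elementwise, since the state matrix is diagonal)
    y[k] = sum_i c[i] * x[k][i]
    The state starts at zero.
    """
    state = [0.0] * len(a)
    outputs = []
    for u_k in u:
        state = [a_i * x_i + b_i * u_k for a_i, x_i, b_i in zip(a, state, b)]
        outputs.append(sum(c_i * x_i for c_i, x_i in zip(c, state)))
    return outputs


def ssm_kernel(a: List[float], b: List[float], c: List[float],
               length: int) -> List[float]:
    """Convolution kernel of the same model: K[j] = sum_i c[i] * a[i]**j * b[i]."""
    return [
        sum(c_i * (a_i ** j) * b_i for a_i, b_i, c_i in zip(a, b, c))
        for j in range(length)
    ]


def ssm_convolutional(a: List[float], b: List[float], c: List[float],
                      u: List[float]) -> List[float]:
    """Evaluate the same model as a causal convolution y[k] = sum_j K[j] * u[k-j]."""
    kernel = ssm_kernel(a, b, c, len(u))
    return [
        sum(kernel[j] * u[k - j] for j in range(k + 1))
        for k in range(len(u))
    ]


# Hypothetical toy parameters (state size 2) and a short input signal.
A_DIAG = [0.9, 0.5]
B_VEC = [1.0, 1.0]
C_VEC = [0.5, -0.25]
SIGNAL = [1.0, 0.0, 2.0, -1.0]

y_rec = ssm_recurrent(A_DIAG, B_VEC, C_VEC, SIGNAL)
y_conv = ssm_convolutional(A_DIAG, B_VEC, C_VEC, SIGNAL)
```

Both views produce the same output sequence up to floating-point rounding. The recurrent form processes one sample at a time with constant memory, whereas the convolutional form exposes the whole sequence at once and can be parallelized during training.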
