wav2sleep: A Unified Multi-Modal Approach to Sleep Stage Classification from Physiological Signals

Jonathan F. Carter, Lionel Tarassenko

TL;DR

wav2sleep is a unified model designed to operate on variable sets of input signals during training and inference. It outperforms existing sleep stage classification models across test-time input combinations including ECG, PPG, and respiratory signals.

Abstract

Accurate classification of sleep stages from less obtrusive sensor measurements such as the electrocardiogram (ECG) or photoplethysmogram (PPG) could enable important applications in sleep medicine. Existing approaches to this problem have typically used deep learning models designed and trained to operate on one or more specific input signals. However, the datasets used to develop these models often do not contain the same sets of input signals. Some signals, particularly PPG, are much less prevalent than others, and this has previously been addressed with techniques such as transfer learning. Additionally, training only on one or more fixed modalities precludes cross-modal information transfer from other sources, which has proved valuable in other problem domains. To address this, we introduce wav2sleep, a unified model designed to operate on variable sets of input signals during training and inference. After jointly training on over 10,000 overnight recordings from six publicly available polysomnography datasets, including SHHS and MESA, wav2sleep outperforms existing sleep stage classification models across test-time input combinations including ECG, PPG, and respiratory signals.
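The abstract's central idea, one model that accepts whatever subset of signals a recording happens to contain, rests on the set aggregation and stochastic masking described in the Figure 2 and Figure 3 captions. The following is a minimal NumPy sketch under stated assumptions: a single attention head with random weights stands in for the paper's transformer encoder (the epoch mixer), and the modality names, the helpers `sample_modality_subset` and `cls_aggregate`, and all shapes are hypothetical. It illustrates that once unavailable modalities are padded and masked, their padded values cannot affect the per-time-step aggregate.

```python
import numpy as np

# Hypothetical modality ordering; the paper's exact signal list may differ.
MODALITIES = ["ECG", "PPG", "ABD", "THX"]


def sample_modality_subset(available, rng):
    """Sample a random non-empty subset of one night's available modalities."""
    options = sorted(available)  # sort so results do not depend on set ordering
    size = rng.integers(1, len(options) + 1)
    return set(rng.choice(options, size=size, replace=False).tolist())


def cls_aggregate(z, present, cls, w_q, w_k, w_v):
    """Pool a set of modality features into one vector per time step.

    Simplified single-head, single-layer stand-in for the transformer encoder:
    z       [M, T, D] per-modality features (missing modalities padded)
    present [M] boolean mask, True for modalities in the sampled subset
    cls     [D] CLS embedding whose output is the aggregate
    returns [T, D]
    """
    M, T, D = z.shape
    # Tokens at each time step: CLS followed by the M modality features.
    tokens = np.concatenate(
        [np.broadcast_to(cls, (T, 1, D)), z.transpose(1, 0, 2)], axis=1
    )  # [T, M+1, D]
    q = cls @ w_q                       # CLS query, [D]
    k = tokens @ w_k                    # keys, [T, M+1, D]
    v = tokens @ w_v                    # values, [T, M+1, D]
    scores = (k @ q) / np.sqrt(D)       # [T, M+1]
    keep = np.concatenate([[True], present])  # CLS is never masked
    scores = np.where(keep, scores, -np.inf)  # masked tokens get zero weight
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("tm,tmd->td", weights, v)


rng = np.random.default_rng(0)
M, T, D = len(MODALITIES), 5, 8
available = {"ECG", "ABD", "THX"}  # e.g. a night recorded without PPG
subset = sample_modality_subset(available, rng)
present = np.array([m in subset for m in MODALITIES])

z = rng.normal(size=(M, T, D))
z[~present] = 0.0  # pad modalities outside the sampled subset
cls = rng.normal(size=D)
w_q, w_k, w_v = (rng.normal(size=(D, D)) for _ in range(3))
out = cls_aggregate(z, present, cls, w_q, w_k, w_v)

# Changing the padded values leaves the aggregate unchanged.
z_alt = z.copy()
z_alt[~present] = 123.0
assert np.allclose(out, cls_aggregate(z_alt, present, cls, w_q, w_k, w_v))
print(sorted(subset), out.shape)
```

Setting masked scores to -inf before the softmax gives padded modalities exactly zero attention weight, which is what lets a fixed batch shape hold nights with different signal sets.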

Paper Structure

This paper contains 27 sections, 1 equation, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Overview of wav2sleep. The model operates on sets of time-series signals $\bm{X}_{1:T}$ to classify sleep stage sequences $y_{1:T}$. This allows it to be trained jointly on heterogeneous datasets that provide different sets of signals, a situation especially common in the healthcare domain. At inference time, the same model can be applied to any subset of the signals seen during training.
  • Figure 2: wav2sleep architecture for sets of signals. (a) Each input signal $x^{i}_{1:kT}$ from modality $i\in\mathcal{S}$ is passed to a CNN to form a sequence of feature vectors $\bm{z}^i_{1:T}$. (b) For each time-step $t$, a transformer encoder turns the set of features into a single aggregate feature vector $\bm{z}_t$ using a CLS token (Devlin et al., 2019). (c) A dilated CNN mixes sequential information to classify sleep stage output sequences $y_{1:T}$.
  • Figure 3: Stochastic masking. During training, we sample a random subset of the available modalities for each night of data. To retain a fixed batch shape, we pad unavailable modalities and apply a mask to the self-attention matrices of the epoch mixer.
  • Figure 4: Sleep stage confusion matrices for varying $\mathcal{S}_{\text{Test}}$ on the Census test set.
  • Figure 5: Performance ($\kappa_{T}$) of wav2sleep against age for varying $\mathcal{S}_{\text{Test}}$ on the Census dataset.
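Figures 4 and 5 report confusion matrices and $\kappa_{T}$. Assuming $\kappa_{T}$ denotes Cohen's κ, the usual agreement statistic in sleep staging (the meaning of the subscript, and whether κ is computed per night or pooled over the test set, are not stated in this summary), the following minimal pure-Python sketch shows how κ follows from a confusion matrix. `confusion_matrix` and `cohen_kappa` are hypothetical helper names, and the labels are toy values, not data from the paper.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Counts with rows = reference stage, columns = predicted stage."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def cohen_kappa(cm):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement).

    Undefined (division by zero) when chance agreement equals 1,
    e.g. when every epoch belongs to a single class.
    """
    k = len(cm)
    n = sum(map(sum, cm))
    observed = sum(cm[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return (observed - expected) / (1 - expected)


# Toy epoch labels (integers stand in for sleep stages).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)                         # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(round(cohen_kappa(cm), 3))  # 0.5
```

Here observed agreement is 4/6 and chance agreement is 12/36, so κ = (2/3 - 1/3) / (2/3) = 0.5; unlike raw accuracy, κ discounts agreement expected from the class frequencies alone.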