PSG-MAE: Robust Multitask Sleep Event Monitoring using Multichannel PSG Reconstruction and Inter-channel Contrastive Learning

Yifei Wang, Qi Liu, Fuli Min, Honghao Wang

TL;DR

PSG-MAE addresses data scarcity and cross-dataset generalization in automated sleep monitoring by pre-training a robust PSG encoder via self-supervised learning. It introduces complementary masking across channels, a channel-level reconstruction loss combining cosine similarity with MSE, and inter-channel contrastive learning to capture temporal and inter-channel relationships. When fine-tuned with downstream feature decomposers, the pre-trained encoder achieves strong sleep staging and obstructive sleep apnea detection performance, demonstrating robustness across datasets. This framework enables multitask sleep assessment from multichannel PSG data using unlabeled records, offering a scalable approach for comprehensive sleep analysis.
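The summary above does not state how the cosine-similarity and MSE terms are combined, so the following is only a minimal NumPy sketch under assumptions: each channel contributes its MSE plus a weighted `1 - cosine` term, and the result is averaged over channels. The function name `channel_reconstruction_loss` and the weight `alpha` are hypothetical and do not come from the paper.

```python
import numpy as np

def channel_reconstruction_loss(original, reconstructed, alpha=1.0, eps=1e-8):
    """Illustrative channel-level loss; inputs have shape (channels, time).

    ASSUMPTION: the combination `mse + alpha * (1 - cosine)` and the weight
    `alpha` are chosen for illustration, not taken from the paper.
    """
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    # Per-channel mean squared error: penalizes amplitude mismatch.
    mse = np.mean((original - reconstructed) ** 2, axis=1)
    # Per-channel cosine similarity: rewards matching waveform shape.
    dot = np.sum(original * reconstructed, axis=1)
    norms = np.linalg.norm(original, axis=1) * np.linalg.norm(reconstructed, axis=1)
    cosine = dot / (norms + eps)
    # Average the combined per-channel terms.
    return float(np.mean(mse + alpha * (1.0 - cosine)))
```

Computing both terms per channel before averaging keeps a high-amplitude channel from hiding a poor reconstruction of a low-amplitude one in the cosine term, which is one plausible motivation for a channel-level loss.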

Abstract

Polysomnography (PSG) signals are essential for studying sleep processes and diagnosing sleep disorders. Analyzing PSG data through deep neural networks (DNNs) for automated sleep monitoring has become increasingly feasible. However, the limited availability of datasets for certain sleep events often leads to DNNs focusing on a single task with a single-source training dataset. As a result, these models struggle to transfer to new sleep events and lack robustness when applied to new datasets. To address these challenges, we propose PSG-MAE, a masked autoencoder (MAE) based pre-training framework. By performing self-supervised learning on a large volume of unlabeled PSG data, PSG-MAE develops a robust feature extraction network that can be broadly applied to various sleep event monitoring tasks. Unlike conventional MAEs, PSG-MAE generates complementary masks across PSG channels, integrates a multichannel signal reconstruction method, and employs a self-supervised inter-channel contrastive learning (ICCL) strategy. This approach enables the encoder to capture temporal features from each channel while simultaneously learning latent relationships between channels, thereby enhancing the utilization of multichannel information. Experimental results show that PSG-MAE effectively captures both temporal details and inter-channel information from PSG signals. When the encoder pre-trained through PSG-MAE is fine-tuned with downstream feature decomposition networks, it achieves an accuracy of 83.7% for sleep staging and 90.45% for detecting obstructive sleep apnea, which highlights the framework's robustness and broad applicability.
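To make the idea of complementary masks concrete, here is a minimal sketch, assuming the signal is split into equal-length subsegments and that the second mask is the exact complement of a random first mask, so every subsegment of every channel is visible in exactly one of the two views. The function names, the 0.5 masking ratio, and zero-filling of masked samples are illustrative assumptions, not details confirmed by the abstract.

```python
import numpy as np

def complementary_masks(n_channels, n_segments, mask_ratio=0.5, seed=0):
    """Return two boolean masks of shape (channels, segments); True = masked.

    ASSUMPTION: masks are drawn independently per channel and per subsegment,
    and `mask_ratio` is a hypothetical parameter.
    """
    rng = np.random.default_rng(seed)
    mask_a = rng.random((n_channels, n_segments)) < mask_ratio
    mask_b = ~mask_a  # complement: masked in B exactly where visible in A
    return mask_a, mask_b

def apply_mask(signal, mask, segment_len):
    """Zero out masked subsegments of a (channels, time) signal.

    Requires time == mask.shape[1] * segment_len.
    """
    signal = np.asarray(signal, dtype=float)
    # Expand the per-subsegment mask to per-sample resolution.
    expanded = np.repeat(mask, segment_len, axis=1)
    return np.where(expanded, 0.0, signal)
```

Because the two masks are complementary, the two masked views sum back to the original signal, so no subsegment is hidden from the encoder in both views.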

Paper Structure

This paper contains 16 sections, 20 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Polysomnography (PSG) of one sleep epoch (30s), during which arousal occurs, marked in the dashed box. A sleep event during sleep often induces abrupt fluctuations in multiple channels of the PSG signals. Integrating the variations across different channels can help improve the accuracy of sleep event monitoring.
  • Figure 2: The framework of PSG-MAE: The original PSG signal is divided into subsegments along the time dimension, followed by the application of complementary masks across the channel dimension. After passing through the encoder-decoder network, the unmasked portions of the signal are reconstructed, with the channel-level reconstruction loss facilitating the learning of temporal features in the original signal. In the pair of reconstructed PSG signals, one subsegment is treated as an anchor, its corresponding subsegment in the other signal is treated as a positive sample, and the remaining subsegments are negative samples. ICCL is then applied to learn the intrinsic relationships between different channels by minimizing the distance between positive pairs and maximizing the distance between negative pairs.
  • Figure 3: Basic structure of the downstream sleep event monitoring network.
  • Figure 4: The signal reconstruction results of PSG-MAE pre-training show that the channel-level signal reconstruction loss and ICCL enable the framework to learn fine-grained temporal information within PSG channels as well as interaction information between channels. In contrast, without ICCL, it becomes difficult to disentangle individual channel information from the fused multichannel features.
  • Figure 5: UMAP visualization of PSG-MAE extracted features before and after downstream task training: (a) and (b) for sleep staging, (c) and (d) for OSA detection.
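The anchor/positive/negative setup described in the Figure 2 caption can be illustrated with a standard InfoNCE-style loss. This is a minimal sketch under assumptions rather than the paper's ICCL formulation: row `i` of each view is taken to be the embedding of subsegment `i`, matching rows form positive pairs, all other rows of the other view act as negatives, and the `temperature` value is hypothetical.

```python
import numpy as np

def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss for two (n_segments, dim) embedding matrices.

    ASSUMPTION: this generic contrastive loss stands in for ICCL; the exact
    loss and the `temperature` value are not given in the text above.
    """
    a = np.asarray(view_a, dtype=float)
    b = np.asarray(view_b, dtype=float)
    # L2-normalize rows so dot products are cosine similarities.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Diagonal entries are the anchor/positive pairs.
    return float(-np.mean(np.diag(log_prob)))
```

The loss falls as each anchor becomes more similar to its positive than to the negatives, which corresponds to pulling positive pairs together and pushing negative pairs apart in embedding space.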