Learning the relative composition of EEG signals using pairwise relative shift pretraining

Christopher Sandino, Sayeri Lala, Geeling Chau, Melika Ayoughi, Behrooz Mahasseni, Ellen Zippi, Ali Moin, Erdrin Azemi, Hanlin Goh

TL;DR

The paper addresses EEG self-supervised representation learning by introducing PARS, a pretraining objective that predicts pairwise relative temporal shifts between randomly sampled EEG patches. By using a cross-attention decoder to infer a relative shift matrix $\theta$, PARS emphasizes temporal composition and long-range dependencies rather than local reconstruction alone. Across label-efficient and transfer-learning evaluations, PARS consistently outperforms reconstruction- and position-prediction-based baselines, particularly in low-label regimes, and generalizes well across sleep staging, abnormality detection, seizure detection, and motor imagery tasks. The work establishes a new paradigm for EEG SSL and motivates future hybrid approaches that combine local and global temporal cues for robust EEG decoding.

Abstract

Self-supervised learning (SSL) offers a promising approach for learning electroencephalography (EEG) representations from unlabeled data, reducing the need for expensive annotations for clinical applications like sleep staging and seizure detection. While current EEG SSL methods predominantly use masked reconstruction strategies like masked autoencoders (MAE) that capture local temporal patterns, position prediction pretraining remains underexplored despite its potential to learn long-range dependencies in neural signals. We introduce PAirwise Relative Shift (PARS) pretraining, a novel pretext task that predicts relative temporal shifts between randomly sampled EEG window pairs. Unlike reconstruction-based methods that focus on local pattern recovery, PARS encourages encoders to capture relative temporal composition and long-range dependencies inherent in neural signals. Through comprehensive evaluation on various EEG decoding tasks, we demonstrate that PARS-pretrained transformers consistently outperform existing pretraining strategies in label-efficient and transfer learning settings, establishing a new paradigm for self-supervised EEG representation learning.
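
To make the pretext task concrete, the following minimal sketch builds a pairwise relative shift target for randomly sampled windows. It is illustrative only and is not the authors' implementation: the function names, the signed convention (start of window j minus start of window i), the normalization by sequence length, and the toy sizes are all assumptions made here.

```python
import random

def sample_patch_starts(seq_len, patch_len, num_patches, rng):
    """Draw start indices for randomly placed patches (overlap is allowed here)."""
    max_start = seq_len - patch_len
    return [rng.randint(0, max_start) for _ in range(num_patches)]

def relative_shift_matrix(starts, seq_len):
    """theta[i][j] = signed shift from patch i to patch j, scaled by seq_len.

    The sign convention and the scaling are assumptions for this sketch.
    """
    return [[(sj - si) / seq_len for sj in starts] for si in starts]

rng = random.Random(0)  # fixed seed so the example is repeatable
seq_len, patch_len, num_patches = 3000, 100, 5  # hypothetical sizes
starts = sample_patch_starts(seq_len, patch_len, num_patches, rng)
theta = relative_shift_matrix(starts, seq_len)

for row in theta:
    print(["{:+.3f}".format(v) for v in row])
```

Under these assumptions the target has a zero diagonal and is antisymmetric, so it depends only on how the windows are arranged relative to one another and not on their absolute positions.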

Paper Structure

This paper contains 30 sections, 4 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: PARS Pretraining Overview. In PARS pretraining, $N$ patches are randomly sampled from a single-channel EEG sequence ($N=5$ is shown here for simplicity). Each patch is tokenized by a linear layer, and then positional embeddings are added to a subset of tokens. A learnable positional mask token (denoted by $M$) is added to the rest of the tokens. Tokens are embedded using a transformer encoder and then decoded by a cross-attention layer to estimate the distance matrix $\theta$ containing the temporal shifts between pairs of patches.
  • Figure 2: Multi-channel Fine-tuning. To adapt the PARS pretrained model for multi-channel EEG data, each EEG channel is embedded using the tokenizer and single-channel transformer encoder learned during pretraining. The per-channel embeddings are average pooled along the temporal dimension to generate a series of spatial tokens. The spatial tokens are collapsed into a final token by applying a cross-attention layer between them and a learnable query token. That final token is then converted into predictions by a linear layer.
  • Figure 3: Label Efficiency Experiment. Clinical sleep staging (YSYW) results are shown for no pretraining (random init) and for pretraining with MP3, DropPos, MAE, and PARS. Each pretrained model is fine-tuned on a subsampled set of subjects, with the number of subjects shown on the x-axis. Cohen's Kappa, F1-score, and balanced accuracy are reported on a test set of held-out patients. The standard deviation across five random seeds is shown behind the line plot for each approach.
  • Figure 4: Transformer encoder architecture. This figure depicts the transformer encoder model architecture used for all pretraining experiments in the paper.
  • Figure 5: A comparison of pretraining strategies.
  • ...and 3 more figures
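
The pooling steps described in the Figure 2 caption can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the paper's model: the helper names are hypothetical, the key/value projections and the final linear classifier are omitted, and a fixed zero vector stands in for the learnable query token.

```python
import math

def mean_pool_time(channel_tokens):
    """Average one channel's (T x D) token embeddings over time into a D-vector."""
    num_tokens = len(channel_tokens)
    dim = len(channel_tokens[0])
    return [sum(tok[d] for tok in channel_tokens) / num_tokens for d in range(dim)]

def attention_pool(spatial_tokens, query):
    """Single-query scaled dot-product attention over the spatial tokens."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, tok)) / math.sqrt(dim)
              for tok in spatial_tokens]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    pooled = [sum(w * tok[d] for w, tok in zip(weights, spatial_tokens))
              for d in range(dim)]
    return pooled, weights

# Toy encoder outputs: 3 channels, 4 time tokens each, embedding size 2.
embeddings = [
    [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 3.0], [0.0, 1.0], [0.0, 3.0]],
    [[2.0, 2.0], [0.0, 0.0], [2.0, 2.0], [0.0, 0.0]],
]
spatial_tokens = [mean_pool_time(ch) for ch in embeddings]  # one token per channel
query = [0.0, 0.0]  # placeholder for the learnable query (assumption)
final_token, weights = attention_pool(spatial_tokens, query)
print(spatial_tokens, weights, final_token)
```

With a zero query the attention weights are uniform, so the final token is simply the mean of the spatial tokens; a trained query would instead weight channels unevenly.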