Designing Pre-training Datasets from Unlabeled Data for EEG Classification with Transformers

Tim Bary, Benoit Macq

TL;DR

The paper tackles the scarcity of labeled EEG data hindering transformer-based EEG classification. It introduces self-supervised pre-training datasets designed from unlabeled EEG signals using three alteration-based tasks, evaluated on an eyes-open/eyes-closed (EO/EC) benchmark with a Multi-channel Vision Transformer and on a seizure-forecasting task on TUSZ. Results show that pre-training speeds up fine-tuning and improves accuracy and AUC, with channel shuffling emerging as the most effective pre-training method. This work demonstrates that unlabeled EEG data can be leveraged to train more capable and data-efficient Transformer models, with practical impact for clinical and research EEG applications. Code to generate the pre-training datasets is released for reproducibility and further exploration.

Abstract

Transformer neural networks require a large amount of labeled data to train effectively. Such data is often scarce in electroencephalography, as annotations made by medical experts are costly. This is why self-supervised training, using unlabeled data, has to be performed beforehand. In this paper, we present a way to design several labeled datasets from unlabeled electroencephalogram (EEG) data. These can then be used to pre-train transformers to learn representations of EEG signals. We tested this method on an epileptic seizure forecasting task on the Temple University Seizure Detection Corpus using a Multi-channel Vision Transformer. Our results suggest that 1) models pre-trained using our approach train significantly faster, reducing fine-tuning duration by more than 50% for the specific task, and 2) pre-trained models exhibit improved accuracy, with an increase from 90.93% to 92.16%, as well as a higher AUC, rising from 0.9648 to 0.9702, when compared to non-pre-trained models.

Paper Structure

This paper contains 21 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Visualisation of the three EEG signal alterations used in the proposed pre-training tasks. Alteration #1 replaces channels of the EEG with white noise, alteration #2 shuffles the order of the channels, and alteration #3 mixes two randomly paired EEG samples by replacing channels of the first sample with channels of the second one and vice versa.
  • Figure 2: Schematic representation of the MViT architecture for EEG classification. The model ingests scalograms of individual EEG channels, processes them through parallel encoders, and fuses their features for final classification via an MLP. Scalograms are generated using the continuous wavelet transform (CWT) to provide a time-frequency representation of the EEG data.
  • Figure 3: Methodology for assessing pre-training performance on the designed pre-training datasets. Starting with five copies of the same untrained model, four are each pre-trained separately with one of the proposed methods for 40 epochs, while the fifth model is not pre-trained. All models are then fine-tuned on the specific task. This operation is repeated $N$ times (in this paper, $N=17$) to account for the variability between experiments.
  • Figure 4: Box plots of the EOC (top) and the minimum validation loss (bottom) for each of the proposed pre-training methods. The significance levels are: $p< 0.05$ (*), $p<0.01$ (**), $p<10^{-3}$ (***), and $p<10^{-4}$ (****) (not all relations of significance are shown).
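
The three alterations described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' released code: the function names, the representation of a sample as a list of channels (each a list of floats), the number of altered channels, and the unit-variance Gaussian noise are all assumptions made here. How labels are derived from the altered samples is not specified in this summary, so each function simply returns the altered data together with the affected channel indices or the permutation.

```python
import random


def replace_with_noise(sample, n_channels, rng):
    """Alteration #1 (sketch): overwrite randomly chosen channels with white noise.

    Assumption: Gaussian noise with unit variance; a real pipeline would
    likely match the noise scale to the EEG amplitude.
    """
    altered = [ch[:] for ch in sample]  # copy so the input is untouched
    chosen = rng.sample(range(len(sample)), n_channels)
    for c in chosen:
        altered[c] = [rng.gauss(0.0, 1.0) for _ in sample[c]]
    return altered, sorted(chosen)


def shuffle_channels(sample, rng):
    """Alteration #2 (sketch): permute the order of the channels."""
    order = list(range(len(sample)))
    rng.shuffle(order)
    # Channel i of the output is channel order[i] of the input.
    return [sample[i][:] for i in order], order


def mix_samples(first, second, n_channels, rng):
    """Alteration #3 (sketch): swap chosen channels between two paired samples."""
    chosen = rng.sample(range(len(first)), n_channels)
    mixed_first = [ch[:] for ch in first]
    mixed_second = [ch[:] for ch in second]
    for c in chosen:
        mixed_first[c], mixed_second[c] = second[c][:], first[c][:]
    return mixed_first, mixed_second, sorted(chosen)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    # Two toy "EEG samples": 4 channels of 8 time points each.
    a = [[float(c)] * 8 for c in range(4)]
    b = [[float(c) + 10.0] * 8 for c in range(4)]

    noisy, noisy_idx = replace_with_noise(a, 2, rng)
    shuffled, order = shuffle_channels(a, rng)
    mixed_a, mixed_b, swapped_idx = mix_samples(a, b, 2, rng)
    print("noise channels:", noisy_idx)
    print("channel order:", order)
    print("swapped channels:", swapped_idx)
```

Returning the indices and the permutation alongside the data keeps the sketch agnostic about the pre-training target: the same outputs could feed a binary "altered or not" label or a per-channel label, depending on how the task is defined.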