Designing Pre-training Datasets from Unlabeled Data for EEG Classification with Transformers

Tim Bary; Benoit Macq

TL;DR

The paper tackles the scarcity of labeled EEG data hindering transformer-based EEG classification. It introduces self-supervised pre-training datasets designed from unlabeled EEG signals using three alteration-based tasks, evaluated on an EO/EC benchmark with a Multi-channel Vision Transformer and on a seizure-forecasting task with TUSZ. Results show that pre-training speeds up fine-tuning and improves accuracy and AUC, with channel shuffling emerging as the most effective pre-training method. This work demonstrates that unlabeled EEG data can be leveraged to train more capable and data-efficient Transformer models, with practical impact for clinical and research EEG applications. Code to generate the pre-training datasets is released for reproducibility and further exploration.

Abstract

Transformer neural networks require a large amount of labeled data to train effectively. Such data is often scarce in electroencephalography, as annotations made by medical experts are costly. This is why self-supervised training, using unlabeled data, has to be performed beforehand. In this paper, we present a way to design several labeled datasets from unlabeled electroencephalogram (EEG) data. These can then be used to pre-train transformers to learn representations of EEG signals. We tested this method on an epileptic seizure forecasting task on the Temple University Seizure Detection Corpus using a Multi-channel Vision Transformer. Our results suggest that 1) models pre-trained using our approach train significantly faster, reducing fine-tuning duration by more than 50% for this task, and 2) pre-trained models are more accurate than non-pre-trained models, with accuracy increasing from 90.93% to 92.16% and AUC rising from 0.9648 to 0.9702.
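To make the idea of deriving labels from unlabeled recordings concrete, here is a minimal sketch of a channel-shuffling pretext task, the alteration the TL;DR reports as most effective. It is an illustration under stated assumptions, not the authors' released code: the function name `make_channel_shuffle_dataset`, the binary "altered vs. unaltered" labeling, the 50% alteration rate, and the `(n_windows, n_channels, n_samples)` array layout are all hypothetical choices made for this example.

```python
import numpy as np


def make_channel_shuffle_dataset(windows, alter_prob=0.5, seed=0):
    """Build a labeled pretext dataset from unlabeled EEG windows.

    Hypothetical helper, not taken from the paper's code.
    windows: array of shape (n_windows, n_channels, n_samples).
    Returns (data, labels), where labels[i] == 1 if the channels of
    window i were shuffled and 0 if the window was left untouched.
    """
    windows = np.asarray(windows)
    n_windows, n_channels, _ = windows.shape
    if n_channels < 2:
        raise ValueError("channel shuffling needs at least two channels")

    rng = np.random.default_rng(seed)
    identity = np.arange(n_channels)
    data = windows.copy()
    labels = np.zeros(n_windows, dtype=np.int64)

    for i in range(n_windows):
        # Assumption: each window is altered with probability alter_prob.
        if rng.random() < alter_prob:
            perm = rng.permutation(n_channels)
            # Re-draw if the permutation happens to leave the order unchanged.
            while np.array_equal(perm, identity):
                perm = rng.permutation(n_channels)
            data[i] = windows[i][perm]
            labels[i] = 1
    return data, labels


# Usage with synthetic data standing in for unlabeled EEG windows.
unlabeled = np.random.default_rng(42).standard_normal((8, 19, 256))
pretext_x, pretext_y = make_channel_shuffle_dataset(unlabeled)
print(pretext_x.shape, pretext_y)
```

A classifier trained to predict `pretext_y` from `pretext_x` needs no expert annotation, since the labels come from the alteration itself. Its weights could then serve as the starting point for fine-tuning on the labeled downstream task.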
