Large Transformers are Better EEG Learners

Bingxin Wang; Xiaowen Fu; Yuan Lan; Luchan Zhang; Wei Zheng; Yang Xiang

TL;DR

The scarcity of public EEG data makes it hard to extend the success of large transformer models to EEG tasks. The paper addresses this with AdaCT, plug-and-play adapters that convert EEG time series into formats that pre-trained vision and language transformers accept. AdaCT-I builds spatio-temporal 2D pseudo-images for fine-tuning ViTs, while AdaCT-T renders short single-channel EEG as text for language models such as BERT and GPT-2, enabling cross-modal transfer learning. On the Epileptic Seizure Recognition, Sleep-EDF, and UCI HAR datasets, AdaCT variants outperform the baseline methods, with large pre-trained models giving the best results and visualizations showing improved feature separability. The framework extends pre-trained models to EEG and general time-series decoding, with potential benefits for interpretability and performance in neuroscience and human activity recognition tasks.

Abstract

Pre-trained large transformer models have achieved remarkable performance in the fields of natural language processing and computer vision. However, the limited availability of public electroencephalogram (EEG) data presents a unique challenge for extending the success of these models to EEG-based tasks. To address this gap, we propose AdaCT, plug-and-play Adapters designed for Converting Time series data into spatio-temporal 2D pseudo-images or text forms. Essentially, AdaCT-I transforms multi-channel or lengthy single-channel time series data into spatio-temporal 2D pseudo-images for fine-tuning pre-trained vision transformers, while AdaCT-T converts short single-channel data into text for fine-tuning pre-trained language transformers. The proposed approach allows for seamless integration of pre-trained vision models and language models in time series decoding tasks, particularly in EEG data analysis. Experimental results on diverse benchmark datasets, including Epileptic Seizure Recognition, Sleep-EDF, and UCI HAR, demonstrate the superiority of AdaCT over baseline methods. Overall, we provide a promising transfer learning framework for leveraging the capabilities of pre-trained vision and language models in EEG-based tasks, thereby advancing the field of time series decoding and enhancing interpretability in EEG data analysis. Our code will be available at https://github.com/wangbxj1234/AdaCE.
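The two conversions described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the min-max scaling, the near-square folding of single-channel input, the nearest-neighbour resize, the window-mean downsampling, and the `size`, `window`, and `levels` parameters are all assumptions chosen for illustration.

```python
import numpy as np


def _min_max(x):
    """Scale an array to [0, 1]; a constant array maps to all zeros."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)


def series_to_pseudo_image(x, size=224):
    """Hypothetical AdaCT-I-style adapter: time series -> (size, size, 3) uint8 image.

    x is either (channels, time) or a long 1-D single-channel series.
    """
    x = np.asarray(x, dtype=np.float64)
    if x.ndim == 1:
        # Assumption: fold a long single-channel series into a near-square grid,
        # dropping any samples that do not fill the last row.
        rows = int(np.floor(np.sqrt(x.shape[0])))
        cols = x.shape[0] // rows
        x = x[: rows * cols].reshape(rows, cols)
    gray = np.round(_min_max(x) * 255).astype(np.uint8)
    # Nearest-neighbour resize to the square input a vision transformer expects.
    r_idx = np.arange(size) * gray.shape[0] // size
    c_idx = np.arange(size) * gray.shape[1] // size
    resized = gray[r_idx][:, c_idx]
    # Replicate the single intensity plane across the three RGB channels.
    return np.stack([resized] * 3, axis=-1)


def series_to_text(x, window=2, levels=100):
    """Hypothetical AdaCT-T-style adapter: short 1-D series -> space-separated integers."""
    x = np.asarray(x, dtype=np.float64)
    # Scale first, so values fall in [0, levels - 1].
    scaled = _min_max(x) * (levels - 1)
    # Non-overlapping windows; assumption: each window is summarised by its mean.
    usable = (scaled.shape[0] // window) * window
    down = scaled[:usable].reshape(-1, window).mean(axis=1)
    return " ".join(str(int(v)) for v in np.round(down))
```

In use, the image array would be passed to a vision model's image processor and the string to a language model's tokenizer before fine-tuning with a classification head, as outlined in Figure 4.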

Paper Structure (35 sections, 7 figures, 5 tables)

Introduction
Related Works
Transformer-Based EEG Decoding Methods
BERT-like Transformers Pre-trained on EEG Datasets
Generative Pre-trained Transformer (GPT)
Pre-trained Vision Transformer (ViT)
AdaCT: Adapters for Converting Time Series Data into Images or Text
AdaCT-I: Adapt Time Series Data into Images
Spatio-Temporal Reshaping of Lengthy Time Series Data
Mapping Spatio-Temporal Dimensions to Image Attributes
Conversion to RGB Format
AdaCT-T: Adapt Time Series Data into Text
Temporal Data Scaling for Signal Representation
Non-overlapping Sliding Window Downsampling
Fine-tune Pre-trained Vision Transformers on Converted Datasets
...and 20 more sections

Figures (7)

Figure 1: Framework: Adapters for converting time series EEG data into images or text for fine-tuning pre-trained large transformers.
Figure 2: Illustration of the AdaCT-I method, showcasing the spatio-temporal reshaping, mapping to image attributes, and conversion to RGB format steps for converting time series data into two-dimensional RGB images.
Figure 3: Illustration of the AdaCT-T method, highlighting the non-overlapping sliding window downsampling step for converting time series data into text representation.
Figure 4: Overview of the fine-tuning process for pre-trained vision transformers and language transformers on converted EEG datasets. The process involves image processing for vision transformers and tokenization for language transformers, followed by integration with pre-trained models and classification head modules.
Figure 5: Fine-tuning process: epoch-wise comparison of AdaCT-I on the UCI HAR dataset using various pre-trained vision models, with baseline (TS-TCC; Eldele et al., 2021) accuracy.
...and 2 more figures
