Large Transformers are Better EEG Learners

Bingxin Wang, Xiaowen Fu, Yuan Lan, Luchan Zhang, Wei Zheng, Yang Xiang

TL;DR

The paper addresses the difficulty of applying large transformer models to EEG when public EEG data is scarce. It introduces AdaCT, plug-and-play adapters that convert EEG time series into formats compatible with pre-trained vision and language transformers. AdaCT-I builds spatio-temporal 2D pseudo-images for fine-tuning vision transformers (ViTs), while AdaCT-T renders short single-channel EEG as text for language models such as BERT and GPT-2, enabling cross-modal transfer learning. On the Epileptic Seizure Recognition, Sleep-EDF, and UCI HAR datasets, AdaCT variants outperform the baselines, with large pre-trained models delivering the best results and visualizations showing improved feature separability. The framework extends pre-trained models to EEG and time-series decoding, with potential benefits for performance and interpretability in neuroscience and human activity recognition.

Abstract

Pre-trained large transformer models have achieved remarkable performance in the fields of natural language processing and computer vision. However, the limited availability of public electroencephalogram (EEG) data presents a unique challenge for extending the success of these models to EEG-based tasks. To address this gap, we propose AdaCT, plug-and-play Adapters designed for Converting Time series data into spatio-temporal 2D pseudo-images or text forms. Essentially, AdaCT-I transforms multi-channel or lengthy single-channel time series data into spatio-temporal 2D pseudo-images for fine-tuning pre-trained vision transformers, while AdaCT-T converts short single-channel data into text for fine-tuning pre-trained language transformers. The proposed approach allows for seamless integration of pre-trained vision models and language models in time series decoding tasks, particularly in EEG data analysis. Experimental results on diverse benchmark datasets, including Epileptic Seizure Recognition, Sleep-EDF, and UCI HAR, demonstrate the superiority of AdaCT over baseline methods. Overall, we provide a promising transfer learning framework for leveraging the capabilities of pre-trained vision and language models in EEG-based tasks, thereby advancing the field of time series decoding and enhancing interpretability in EEG data analysis. Our code will be available at https://github.com/wangbxj1234/AdaCE.
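
As a rough illustration of the AdaCT-I idea described in the abstract (turning multi-channel or long single-channel series into 2D pseudo-images), the following is a minimal sketch and not the authors' implementation. The function name `to_pseudo_image`, the `width` parameter, the min-max scaling to 0-255, and the replication of one grayscale plane into three RGB channels are all assumptions made for illustration; the paper's actual mapping to image attributes may differ.

```python
import numpy as np


def to_pseudo_image(x, width=None):
    """Hypothetical helper: turn a time series into an H x W x 3 uint8 array.

    x: shape (channels, time) for multi-channel input, or (time,) for a
       long single-channel recording.
    width: samples per image row; only used for single-channel input.
    """
    x = np.asarray(x, dtype=np.float64)
    if x.ndim == 1:
        # Assumption: fold the time axis into rows of `width` samples.
        if width is None or width < 1:
            raise ValueError("a positive width is required for 1D input")
        usable = (x.size // width) * width  # drop the ragged tail
        if usable == 0:
            raise ValueError("series is shorter than one row")
        grid = x[:usable].reshape(-1, width)
    elif x.ndim == 2:
        # Assumption: rows are channels (space), columns are time.
        grid = x
    else:
        raise ValueError("expected a 1D or 2D array")

    # Assumption: min-max scale amplitudes to the 0..255 pixel range.
    lo, hi = grid.min(), grid.max()
    if hi == lo:
        scaled = np.zeros_like(grid)
    else:
        scaled = (grid - lo) / (hi - lo)
    gray = np.round(scaled * 255).astype(np.uint8)

    # Assumption: copy the single plane into three identical RGB channels.
    return np.stack([gray, gray, gray], axis=-1)
```

For example, `to_pseudo_image(np.arange(12), width=4)` yields an array of shape `(3, 4, 3)`. Such an array would still need resizing and normalization by the image processor of whichever pre-trained vision transformer is being fine-tuned.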

Paper Structure (35 sections, 7 figures, 5 tables)

Figures (7)

  • Figure 1: Framework: Adapters for converting time series EEG data into images or text for fine-tuning pre-trained large transformers.
  • Figure 2: Illustration of the AdaCT-I method, showcasing the spatio-temporal reshaping, mapping to image attributes, and conversion to RGB format steps for converting time series data into two-dimensional RGB images.
  • Figure 3: Illustration of the AdaCT-T method, highlighting the non-overlapping sliding window downsampling step for converting time series data into text representation.
  • Figure 4: Overview of the fine-tuning process for pre-trained vision transformers and language transformers on converted EEG datasets. The process involves image processing for vision transformers and tokenization for language transformers, followed by integration with pre-trained models and classification head modules.
  • Figure 5: Fine-tuning Process: Epoch-wise Comparative Analysis of AdaCT-I on UCI HAR Dataset Using Various Pre-trained Vision Models with Baseline (TS-TCC, Eldele et al., 2021) Accuracy.
  • ...and 2 more figures
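
The Figure 3 caption mentions non-overlapping sliding window downsampling as the step that turns a time series into text for AdaCT-T. A minimal sketch of that idea follows. It is not the authors' code: the function name `series_to_text`, the use of the window mean as the downsampled value, rounding to integers, and space-separated output are assumptions chosen for illustration.

```python
def series_to_text(series, window=4):
    """Hypothetical helper: downsample a 1D series and render it as text.

    The series is cut into consecutive, non-overlapping windows of
    `window` samples. Each complete window is reduced to its mean
    (an assumption), rounded to an integer, and written as one token.
    A trailing incomplete window is dropped.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    values = list(series)
    tokens = []
    # Step by `window` so that consecutive windows never overlap.
    for start in range(0, len(values) - window + 1, window):
        chunk = values[start:start + window]
        mean = sum(chunk) / window
        tokens.append(str(int(round(mean))))
    return " ".join(tokens)


print(series_to_text([1, 3, 5, 7, 10, 10], window=2))  # prints "2 6 10"
```

The resulting string could then be passed to the tokenizer of a pre-trained language model such as BERT or GPT-2, as outlined in the Figure 4 caption.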