Fusing Pretrained ViTs with TCNet for Enhanced EEG Regression

Eric Modesitt; Haicheng Yin; Williams Huang Wang; Brian Lu

TL;DR

This work addresses the need for fast and accurate EEG regression in Brain-Computer Interfaces by fusing pretrained Vision Transformers (ViTs) with Temporal Convolutional Networks (TCNet). The EEGViT-TCNet architecture connects TCNet's temporal feature extraction to ViT's attention-based processing through two bridging convolutional layers and 1D patch embeddings on a pretrained ViT. It is trained with a 70/30 train/validation split and evaluated primarily by RMSE. On EEGEyeNet's Absolute Position Task, the model achieves an RMSE of 51.8 mm, a 6.5% improvement over the 55.4 mm EEGViT baseline, while running up to 4.32x faster, and it outperforms traditional methods. Ablation studies isolate the contributions of the bridging convolutions, the TCNet dropout settings, and the pretrained ViT, and the work points to future directions in interpretability and scalability across diverse EEG datasets.
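The TL;DR names RMSE as the primary metric and a 70/30 train/validation split. The sketch below shows one way to compute both, together with the relative-improvement arithmetic behind the 6.5% figure. It is a minimal sketch, assuming RMSE is taken over per-sample Euclidean errors of predicted (x, y) screen positions in mm and that the split is a random split over samples; `rmse`, `relative_improvement`, and `train_val_split` are hypothetical names, not the authors' code.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square of per-sample Euclidean errors (same units as inputs, e.g. mm).

    Assumption: each target is an (x, y) position; the exact RMSE
    convention used in the paper is not stated in this summary.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        sum((t - p) ** 2 for t, p in zip(target, pred))
        for target, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline, new):
    """Fractional error reduction of `new` relative to `baseline` (lower is better)."""
    return (baseline - new) / baseline


def train_val_split(n_samples, train_frac=0.7, seed=0):
    """Hypothetical random 70/30 split over sample indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(train_frac * n_samples)
    return indices[:cut], indices[cut:]


if __name__ == "__main__":
    # Reported RMSEs: EEGViT baseline 55.4 mm, EEGViT-TCNet 51.8 mm.
    print(f"{relative_improvement(55.4, 51.8):.1%}")  # prints 6.5%
    train_idx, val_idx = train_val_split(1000)
    print(len(train_idx), len(val_idx))  # prints 700 300
```

A per-sample random split is the simplest choice; splitting by participant instead would keep recordings from the same person out of both sets, and this summary does not say which the authors used.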

Abstract

Electroencephalogram (EEG) analysis is central to the development of Brain-Computer Interfaces (BCIs). Developing robust, useful BCIs, however, depends heavily on the speed and accuracy with which they can decode neural dynamics. To that end, this paper details the integration of pretrained Vision Transformers (ViTs) with Temporal Convolutional Networks (TCNet) to improve the precision of EEG regression. The approach combines the attention-based sequence processing of ViTs with the temporal feature extraction of TCNet to improve EEG analysis accuracy. In addition, we analyze how the patches passed to the attention mechanism should be constructed, balancing the trade-off between speed and accuracy. Our results show a substantial improvement in regression accuracy: Root Mean Square Error (RMSE) on EEGEyeNet's Absolute Position Task drops from 55.4 mm to 51.8 mm, outperforming existing state-of-the-art models. Without sacrificing accuracy, we also make the model up to 4.32x faster. These results set a new benchmark in EEG regression analysis and open avenues for future research on integrating transformer architectures with specialized feature extraction methods for diverse EEG datasets.
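The abstract frames patch construction for the attention mechanism as a speed/accuracy trade-off. The sketch below illustrates that trade-off with non-overlapping 1D patches along the time axis: wider patches yield fewer tokens, and because self-attention scores every pair of tokens, its cost grows with the square of the token count. This is a hypothetical illustration, not the authors' patching scheme; the 129-channel, 500-sample trial shape is an assumption about EEGEyeNet inputs, and `temporal_patches` and `attention_pairs` are illustrative names.

```python
# Assumed EEGEyeNet trial shape: 129 electrodes x 500 time samples.
N_CHANNELS = 129
N_SAMPLES = 500


def temporal_patches(signal, width):
    """Split a [channels][time] signal into non-overlapping 1D patches along time.

    Each token is the flattened (all channels x `width` samples) window.
    Trailing samples that do not fill a whole patch are dropped.
    """
    n_time = len(signal[0])
    tokens = []
    for start in range(0, n_time - width + 1, width):
        tokens.append([x for channel in signal for x in channel[start:start + width]])
    return tokens


def attention_pairs(n_tokens):
    """Number of query-key scores one self-attention head computes per layer."""
    return n_tokens * n_tokens


if __name__ == "__main__":
    trial = [[0.0] * N_SAMPLES for _ in range(N_CHANNELS)]  # dummy zero signal
    for width in (10, 25, 50, 100):
        tokens = temporal_patches(trial, width)
        print(f"width={width:3d}  tokens={len(tokens):3d}  "
              f"pairs={attention_pairs(len(tokens)):5d}  token_dim={len(tokens[0])}")
```

The accuracy side of the trade-off runs the other way: with wider patches each token summarizes a longer window, so fine temporal detail has to be captured by the patch embedding (or by upstream feature extraction such as TCNet) rather than by attention across tokens.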

Paper Structure

This paper contains 20 sections, 3 figures, 2 tables.

Introduction
Related Work
Deep Learning in EEG
ViTs in Non-Image Data Analysis
Temporal Convolutional Networks (TCNet)
Methods
EEGEyeNet Dataset
EEGViT-TCNet Model Architecture
Temporal Convolutional Network (TCNet) Component
Convolutional and Batch Normalization Layers
Vision Transformer (ViT) Component
Training and Evaluation Procedure
Results
Performance Benchmarking
Ablation Studies
...and 5 more sections

Figures (3)

Figure 1: EEGViT architecture, the previous state of the art on EEGEyeNet [vit2eeg].
Figure 2: An outline of the TCNet functionality [tcnet].
Figure 3: An outline of our addition to EEGViT, demonstrating our distinct feature extraction methodology.
