Fusing Pretrained ViTs with TCNet for Enhanced EEG Regression
Eric Modesitt, Haicheng Yin, Williams Huang Wang, Brian Lu
TL;DR
This work addresses the need for fast and accurate EEG regression in Brain-Computer Interfaces by fusing pretrained Vision Transformers (ViTs) with Temporal Convolutional Networks (TCNet). The EEGViT-TCNet architecture connects TCNet's temporal feature extraction to ViT's attention-based processing through two bridging convolutional layers and 1D patch embeddings on a pretrained ViT. It is trained with a 70/30 train/validation split, with RMSE as the primary metric. On EEGEyeNet's Absolute Position Task, the model achieves an RMSE of $51.8$ mm, a $6.5\%$ improvement over the $55.4$ mm EEGViT baseline, together with a speedup of up to $4.32\times$. Ablation studies quantify the contributions of the bridging convolutions, the TCNet dropout settings, and the pretrained ViT. The work points to future exploration of interpretability and of scalability across diverse EEG datasets.
Abstract
Electroencephalogram (EEG) analysis is central to the development of Brain-Computer Interfaces (BCIs). Building robust, useful BCIs, however, depends heavily on the speed and accuracy with which they can interpret neural dynamics. In response, this paper details the integration of pre-trained Vision Transformers (ViTs) with Temporal Convolutional Networks (TCNet) to enhance the precision of EEG regression. The core of the approach lies in combining the sequential data processing strengths of ViTs with the feature extraction capabilities of TCNet to improve EEG analysis accuracy. In addition, we analyze how to construct patches for the attention mechanism, balancing the tradeoff between speed and accuracy. Our results show a substantial improvement in regression accuracy: Root Mean Square Error (RMSE) falls from 55.4 mm to 51.8 mm on EEGEyeNet's Absolute Position Task, outperforming existing state-of-the-art models. Without sacrificing performance, we also make the model up to 4.32x faster. This result sets a new benchmark in EEG regression analysis and opens new avenues for future research on integrating transformer architectures with specialized feature extraction methods for diverse EEG datasets.
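To make the headline numbers concrete, the following minimal sketch shows how an RMSE over predicted gaze positions and the relative improvement between two RMSE values can be computed. The helper names `rmse` and `relative_improvement`, the toy positions, and the choice to pool squared errors over both coordinates are illustrative assumptions, not the authors' evaluation code.

```python
import math


def rmse(predictions, targets):
    """RMSE over paired (x, y) positions.

    Assumption: squared errors are pooled over both coordinates
    before averaging; the paper's exact convention may differ.
    """
    squared_errors = [
        (p - t) ** 2
        for pred, targ in zip(predictions, targets)
        for p, t in zip(pred, targ)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline, new):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - new) / baseline


# Toy data (hypothetical): one sample is off by (3, 4), the other is exact.
toy_pred = [(3.0, 4.0), (0.0, 0.0)]
toy_true = [(0.0, 0.0), (0.0, 0.0)]
print(rmse(toy_pred, toy_true))  # sqrt((9 + 16 + 0 + 0) / 4) = 2.5

# RMSE values reported in the abstract: 55.4 (baseline) and 51.8 (this work).
print(round(100 * relative_improvement(55.4, 51.8), 1))  # 6.5
```

Applied to the reported values, the relative reduction is (55.4 - 51.8) / 55.4, or about 6.5%.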
