Advancing EEG-Based Gaze Prediction Using Depthwise Separable Convolution and Enhanced Pre-Processing
Matthew L Key, Tural Mehtiyev, Xiaodong Qu
TL;DR
The paper addresses the challenge of accurately predicting gaze position from EEG data. It extends the hybrid EEGViT architecture with depthwise separable convolutions and adds a clustering-based pre-processing pipeline. The resulting model, EEG-DCViT, combines depthwise separable CNNs (DS-CNNs) with data clustering to improve feature extraction and label fidelity. On the EEGEyeNet Absolute Position task it reaches an RMSE of $51.6 \pm 0.2$ mm, improving on the prior benchmark of $55.4 \pm 0.2$ mm. The result shows the value of targeted pre-processing and architectural refinement for EEG-based gaze estimation, with practical implications for EEG-based brain-computer interfaces.
Abstract
In the field of EEG-based gaze prediction, the application of deep learning to interpret complex neural data poses significant challenges. This study evaluates the effectiveness of pre-processing techniques and the effect of additional depthwise separable convolution on EEG vision transformers (ViTs) in a pretrained model architecture. We introduce a novel method, the EEG Deeper Clustered Vision Transformer (EEG-DCViT), which combines depthwise separable convolutional neural networks (CNNs) with vision transformers, enriched by a pre-processing strategy involving data clustering. The new approach demonstrates superior performance, establishing a new benchmark with a Root Mean Square Error (RMSE) of 51.6 mm. This achievement underscores the impact of pre-processing and model refinement in enhancing EEG-based applications.
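To illustrate why depthwise separable convolution is considered an efficient refinement, the sketch below compares weight counts for a standard 2D convolution and its depthwise separable counterpart (a per-channel spatial filter followed by a 1x1 pointwise convolution). The layer sizes are hypothetical values chosen for illustration; they are not taken from the EEG-DCViT architecture, and biases are ignored.

```python
def standard_conv_params(c_in, c_out, k_h, k_w):
    """Weights in a standard 2D convolution (bias omitted)."""
    return c_in * c_out * k_h * k_w


def depthwise_separable_params(c_in, c_out, k_h, k_w):
    """Weights in a depthwise separable convolution (bias omitted).

    Depthwise step: one k_h x k_w filter per input channel.
    Pointwise step: a 1x1 convolution that mixes channels.
    """
    depthwise = c_in * k_h * k_w
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, assumed for illustration only.
c_in, c_out, k_h, k_w = 256, 768, 3, 3

standard = standard_conv_params(c_in, c_out, k_h, k_w)
separable = depthwise_separable_params(c_in, c_out, k_h, k_w)

# The ratio simplifies to 1/c_out + 1/(k_h * k_w).
ratio = separable / standard

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"ratio:     {ratio:.4f}")
```

With these assumed sizes the separable layer needs roughly a ninth of the weights, since the ratio is dominated by the 1/(k_h * k_w) term when the output channel count is large.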
