Advancing EEG-Based Gaze Prediction Using Depthwise Separable Convolution and Enhanced Pre-Processing

Matthew L Key, Tural Mehtiyev, Xiaodong Qu

TL;DR

The paper addresses the challenge of accurately predicting gaze position from EEG data by leveraging a hybrid EEGViT architecture enhanced with depthwise separable convolutions and a clustering-based pre-processing pipeline. The proposed EEG-DCViT combines DS-CNNs with data clustering to improve feature extraction and label fidelity, yielding superior performance on the EEGEyeNet Absolute Position task. It achieves a new benchmark RMSE of $51.6 \pm 0.2$ mm, surpassing the prior $55.4 \pm 0.2$ mm, thus demonstrating the value of targeted pre-processing and architectural refinements for EEG-based gaze estimation. The work has practical implications for EEG-based brain-computer interfaces and emphasizes the importance of data-quality improvements and efficient neural architectures in neural decoding tasks.
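As a rough illustration of what clustering-based label correction could look like, the sketch below snaps each noisy gaze label to its nearest cluster centroid. This is an assumption made for illustration, not the authors' pipeline: the nearest-centroid rule, the function names `nearest_centroid` and `snap_labels`, and the example coordinates are all hypothetical.

```python
import math

def nearest_centroid(point, centroids):
    """Return the centroid with the smallest Euclidean distance to `point`."""
    return min(centroids, key=lambda c: math.dist(point, c))

def snap_labels(labels, centroids):
    """Replace every (x, y) label with its nearest centroid (assumed rule)."""
    return [nearest_centroid(p, centroids) for p in labels]

# Hypothetical centroids and noisy labels, in arbitrary screen units.
centroids = [(100.0, 100.0), (400.0, 100.0), (100.0, 300.0)]
labels = [(110.0, 95.0), (390.0, 120.0), (130.0, 280.0)]

snapped = snap_labels(labels, centroids)
for original, corrected in zip(labels, snapped):
    print(original, "->", corrected)
```

In practice the centroids would themselves be estimated from the training labels by a clustering algorithm; the summary here does not state which one was used.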

Abstract

In the field of EEG-based gaze prediction, the application of deep learning to interpret complex neural data poses significant challenges. This study evaluates the effectiveness of pre-processing techniques and the effect of additional depthwise separable convolution on EEG vision transformers (ViTs) in a pretrained model architecture. We introduce a novel method, the EEG Deeper Clustered Vision Transformer (EEG-DCViT), which combines depthwise separable convolutional neural networks (CNNs) with vision transformers, enriched by a pre-processing strategy involving data clustering. The new approach demonstrates superior performance, establishing a new benchmark with a Root Mean Square Error (RMSE) of 51.6 mm. This achievement underscores the impact of pre-processing and model refinement in enhancing EEG-based applications.
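To make the efficiency argument for depthwise separable convolution concrete, the following sketch compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1x1 pointwise convolution. The layer sizes are hypothetical and are not taken from the EEG-DCViT architecture; biases are omitted.

```python
def standard_conv_params(c_in, c_out, k_h, k_w):
    """Weights in a standard 2D convolution (bias omitted)."""
    return k_h * k_w * c_in * c_out

def depthwise_separable_params(c_in, c_out, k_h, k_w):
    """Weights in a depthwise convolution plus a 1x1 pointwise convolution."""
    depthwise = k_h * k_w * c_in  # one k_h x k_w filter per input channel
    pointwise = c_in * c_out      # 1x1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k_h, k_w = 256, 768, 8, 1

std = standard_conv_params(c_in, c_out, k_h, k_w)
sep = depthwise_separable_params(c_in, c_out, k_h, k_w)
print(f"standard: {std}, depthwise separable: {sep}, ratio: {sep / std:.4f}")
```

The ratio of the two counts equals 1/c_out + 1/(k_h * k_w), so the saving grows with both the kernel size and the number of output channels.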

Paper Structure (20 sections, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Large Grid Experimental Setup: A schematic view of the experimental setup and the placement of stimuli on the screen, showing how participants interacted with the stimuli during the eye-tracking events [kastrati2021eegeyenet].
  • Figure 2: Clustering reveals the discrepancy between labeled positions and actual target positions.
  • Figure 3: The centroids used to correct training data labels.
  • Figure 4: EEG Vision Transformer with Depthwise Separable Convolution: A specialized ViT structure tailored for raw EEG signal input. This architecture uses a four-step convolution process to produce patch embeddings. The dotted outline highlights the depthwise separable convolution. After this initial step, positional embeddings are integrated and the combined sequence is passed through the ViT layers [midterm-eeg-vit]. The design of the positional embedding and ViT layers is adapted from [dosovitskiy2021image].
  • Figure 5: Classification Performance Metrics by Cluster: This figure presents a detailed breakdown of classification metrics including precision, recall, F1-score, and support for 25 clusters, highlighting the performance of each cluster in the model evaluation.
  • ...and 3 more figures