Effect of Kernel Size on CNN-Vision-Transformer-Based Gaze Prediction Using Electroencephalography Data

Chuhui Qiu, Bugao Liang, Matthew L Key

TL;DR

This work investigates how the kernel size of the convolutional front-end in CNN–vision transformer hybrids affects EEG-based gaze prediction on the EEGEyeNet dataset. Using a two-stage front-end with a depth-wise spatial convolution that spans all EEG channels, followed by a ViT backbone, the method reaches a root-mean-square error (RMSE) of 53.06 mm. This is better than the previous state-of-the-art model, EEGViT, and training takes less than 33% of EEGViT's training time. The results suggest that a spatial kernel covering every EEG channel lets the model learn spatial relationships across the full electrode array. Real-world deployment remains challenging, however, because EEG-based gaze prediction still lags behind video-based eye-tracking in both accuracy and speed. The findings highlight the potential of CNN–transformer hybrids with broad channel receptive fields for EEG-based gaze estimation and point to future work on richer datasets and real-world applicability.
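To make the role of the spatial kernel size concrete, the sketch below tracks how the (channels × time) grid of a single EEG trial shrinks through a two-stage front-end: a temporal convolution followed by a spatial convolution. All numbers are assumptions for illustration, not the paper's reported configuration: 129 channels by 500 samples per trial, a temporal kernel and stride of 36 samples, and an 8-channel spatial kernel as the small-kernel baseline. The names ConvSpec, conv_out and frontend_grid are hypothetical. Making a convolution depth-wise changes which feature maps each output combines, not the output size, so only the grid shape is tracked here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ConvSpec:
    """Kernel and stride of an unpadded 2-D convolution over a (channels, time) grid."""
    kernel: Tuple[int, int]  # (height along EEG channels, width along time samples)
    stride: Tuple[int, int]


def conv_out(size: int, k: int, s: int) -> int:
    """Output length of an unpadded, undilated convolution along one axis."""
    return (size - k) // s + 1


def frontend_grid(n_channels: int, n_samples: int, *stages: ConvSpec) -> Tuple[int, int]:
    """Feature-map height and width after applying each stage in order."""
    h, w = n_channels, n_samples
    for spec in stages:
        h = conv_out(h, spec.kernel[0], spec.stride[0])
        w = conv_out(w, spec.kernel[1], spec.stride[1])
    return h, w


# Hypothetical settings: 129 channels x 500 samples per trial, and a
# temporal kernel of 36 samples with an equal stride (non-overlapping windows).
N_CHANNELS, N_SAMPLES = 129, 500
TEMPORAL = ConvSpec(kernel=(1, 36), stride=(1, 36))

# Stage 2 alternatives: a small spatial kernel vs. one spanning every channel.
SMALL_SPATIAL = ConvSpec(kernel=(8, 1), stride=(8, 1))
FULL_SPATIAL = ConvSpec(kernel=(N_CHANNELS, 1), stride=(1, 1))

for name, spatial in [("8-channel kernel", SMALL_SPATIAL), ("full-channel kernel", FULL_SPATIAL)]:
    h, w = frontend_grid(N_CHANNELS, N_SAMPLES, TEMPORAL, spatial)
    print(f"{name}: {h} x {w} grid -> {h * w} tokens")
```

Under these assumptions, the 8-channel kernel produces a 16 × 13 grid (208 tokens), each covering only a local group of electrodes, whereas the full-channel kernel collapses the channel axis to height 1, leaving 13 tokens that each combine all 129 channels over one time window.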

Abstract

In this paper, we present an algorithm for gaze prediction from electroencephalography (EEG) data. EEG-based gaze prediction is an emerging research topic and a potential alternative to traditional video-based eye-tracking. Compared with the existing state-of-the-art (SOTA) method, our approach reduces the root-mean-square error (RMSE) of EEG-based gaze prediction to 53.06 millimeters while cutting training time to less than 33% of the SOTA method's. Our source code is available at https://github.com/AmCh-Q/CSCI6907Project
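The abstract reports accuracy as a root-mean-square error in millimeters. A minimal sketch of how such a figure can be computed for 2-D gaze positions is shown below, assuming the metric is the RMSE of the Euclidean distance between predicted and true fixation points, with both already converted to millimeters. This is an assumption about the evaluation protocol, not code from the authors' repository, and gaze_rmse_mm is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) gaze position on the screen, in millimeters (assumed)


def gaze_rmse_mm(pred: Sequence[Point], true: Sequence[Point]) -> float:
    """Root-mean-square of the Euclidean distance between predicted and true gaze points."""
    if not pred or len(pred) != len(true):
        raise ValueError("pred and true must be non-empty and of equal length")
    # Squared Euclidean error for each trial.
    sq_errors = [(px - tx) ** 2 + (py - ty) ** 2 for (px, py), (tx, ty) in zip(pred, true)]
    # Average over trials, then take the square root.
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# One 5 mm miss and one exact hit: sqrt((25 + 0) / 2), about 3.54 mm.
print(gaze_rmse_mm([(3.0, 4.0), (10.0, 10.0)], [(0.0, 0.0), (10.0, 10.0)]))
```

Averaging squared distances before taking the square root penalizes large misses more heavily than a plain mean Euclidean distance would.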

Paper Structure

This paper contains 16 sections, 2 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Electrode Layout of the 128-channel EEG Geodesic Hydrocel system [bamatraf2016system]
  • Figure 2: The Large Grid Paradigm of EEGEyeNet [kastrati2021eegeyenet]
  • Figure 3: Distribution of the Fixation Positions in the Large Grid Paradigm [kastrati2021eegeyenet]
  • Figure 4: EEGViT Model Architecture [yang2023vit2eeg]
  • Figure 5: Our Model Architecture, modified from [yang2023vit2eeg]
  • ...and 2 more figures