Feature Estimation of Global Language Processing in EEG Using Attention Maps

Dai Shimizu, Ko Watanabe, Andreas Dengel

TL;DR

This study tackles the challenge of estimating task-dependent features of language processing from EEG, which offers high temporal resolution but limited spatial detail. It uses attention maps from Vision Transformers (ViTs) and Grad-CAM applied to EEGNet to extract interpretable, task-related features from EEG data, focusing on listening and speaking in a subject-independent framework. Using the OpenNEURO Spanish dataset with 1–40 Hz EEG signals, Mel-spectrogram inputs, and leave-one-subject-out validation, it shows that EEGNet achieves the highest classification accuracy, while the ViTs reveal distinct time-frequency attention patterns, including early ERP-associated dynamics. The findings validate a data-driven, model-weight-based approach to EEG feature estimation, offering insights for improved biomarkers, BCIs, and neurodiagnostics in cognitive neuroscience and clinical contexts.
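The leave-one-subject-out validation mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' code: the function name `leave_one_subject_out` and the toy subject labels are hypothetical, and it only shows how each fold holds out every trial of one subject so that the test subject is never seen during training.

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx), one fold per subject.

    Hypothetical helper for illustration; the paper's actual splitting
    code is not given in the source.
    """
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_mask = subject_ids == subject
        # All trials of the held-out subject form the test set;
        # trials of every other subject form the training set.
        yield subject, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Toy example (assumed data): 3 subjects with 2 trials each.
subjects = np.array([0, 0, 1, 1, 2, 2])
folds = list(leave_one_subject_out(subjects))
```

Because every fold tests on a subject absent from training, the accuracy averaged over folds reflects subject-independent performance rather than memorization of individual subjects.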

Abstract

Understanding the correlation between EEG features and cognitive tasks is crucial for elucidating brain function. Brain activity synchronizes during speaking and listening tasks. However, estimating task-dependent characteristics of brain activity is challenging with methods such as EEG, which have high temporal but low spatial resolution, compared with methods that have high spatial resolution, such as fMRI. This study introduces a novel approach to EEG feature estimation that uses the weights of deep learning models to explore this association. We demonstrate that attention maps generated from Vision Transformers (ViTs) and EEGNet effectively identify features that align with findings from prior studies. EEGNet emerged as the most accurate model for subject-independent classification of Listening and Speaking tasks. Applying Mel-spectrograms with ViTs enhances the resolution of temporal and frequency-related EEG characteristics. Our findings reveal that the characteristics discerned through attention maps vary significantly with the input data, allowing for tailored feature extraction from EEG signals. By estimating features, our study reinforces known attributes and predicts new ones, potentially offering fresh perspectives on using EEG for medical purposes, such as early disease detection. These techniques will make substantial contributions to cognitive neuroscience.

Paper Structure (16 sections, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Experimental protocol of the dataset. Subjects listen to and then repeat one of 30 randomly selected Spanish sentences, forming 30 perception-production pairs. Each sentence lasts approximately two seconds. Subjects perform between 360 and 420 trials, with each figure representing one trial.
  • Figure 2: Electrode placement following the 64-channel international 10–20 system. Electrodes framed in red were used.
  • Figure 3: Attention Maps of the models during classification. Lower values (indicated by blue) represent regions where the models allocate less attention, whereas higher values, indicated by red, signify areas of focused attention. The x-axis represents the time series from 0 to 4 seconds, and the y-axis represents the frequency series from 0 to 40 Hz. Normalized attention maps, averaged from data collected during (a) the perception task (listening) using the Custom ViT, (b) the production task (speaking) using the Custom ViT, (c) the perception task using the pre-trained ViT, and (d) the production task using the pre-trained ViT.
  • Figure 4: Contrasts of attention maps between production and perception tasks. Lower values (blue) indicate greater attention during production tasks, whereas higher values (red) highlight areas of intensified focus during perception tasks. Both axes are consistent with those in Figure 3. Normalized attention maps are obtained by calculating the difference between the Perception and Production attention maps from (a) the Custom ViT and (b) the pre-trained ViT.
  • Figure 5: The features extracted from the final layer of EEGNet using Grad-CAM.
  • ...and 2 more figures
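The contrast described for Figure 4 (Perception attention map minus Production attention map, then normalized) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the source does not say how the maps are normalized, so scaling by the peak absolute difference is assumed here, and the function name `contrast_map` and the toy arrays are hypothetical.

```python
import numpy as np

def contrast_map(perception, production):
    """Difference of two attention maps, scaled into [-1, 1].

    Positive values mark time-frequency bins with more attention during
    perception, negative values more attention during production.
    ASSUMPTION: normalization by the peak absolute difference; the paper's
    actual normalization is not stated in the source.
    """
    diff = np.asarray(perception, dtype=float) - np.asarray(production, dtype=float)
    peak = np.max(np.abs(diff))
    # Guard against identical maps, where the difference is all zeros.
    return diff / peak if peak > 0 else diff

# Toy 2x2 maps (assumed values; rows = frequency bins, columns = time bins).
perception = np.array([[0.2, 0.8], [0.5, 0.1]])
production = np.array([[0.6, 0.3], [0.5, 0.4]])
contrast = contrast_map(perception, production)
```

With this sign convention, bins where both tasks draw equal attention land at zero, matching the figure's description of red for perception and blue for production.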