Single-word Auditory Attention Decoding Using Deep Learning Model

Nhan Duc Thanh Nguyen, Huy Phan, Kaare Mikkelsen, Preben Kidmose

TL;DR

A deep learning approach based on EEGNet is presented that exploits cognition-related spatiotemporal EEG features and achieves at least 58% accuracy on unseen subjects in the most realistic competing paradigm.

Abstract

Identifying auditory attention by comparing auditory stimuli with the corresponding brain responses is known as auditory attention decoding (AAD). The majority of AAD algorithms rely on the so-called envelope entrainment mechanism, whereby auditory attention is identified by how the envelope of the auditory stream drives variation in the electroencephalography (EEG) signal. However, neural processing can also be decoded from endogenous cognitive responses, in this case neural responses evoked by attention to specific words in a speech stream. This approach is largely unexplored in the field of AAD, and it leads to a single-word auditory attention decoding problem in which an EEG epoch time-locked to a specific word is labeled as attended or unattended. This paper presents a deep learning approach, based on EEGNet, to address this challenge. We conducted a subject-independent evaluation on an event-based AAD dataset with three different paradigms: word category oddball, word category with competing speakers, and competing speech streams with targets. The results demonstrate that the adapted model can exploit cognition-related spatiotemporal EEG features and achieve at least 58% accuracy on unseen subjects in the most realistic competing paradigm. To our knowledge, this is the first study to address this problem.

Paper Structure

This paper contains 12 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Visual overview of the data augmentation methods. (1) The original data after pre-processing and epoching. (2) The data of each paradigm are up-sampled by averaging $k$ random epochs within a class, with $k$ from 1 to 3 (duplicating when $k = 1$), to create an upsampled experimental dataset. (3) An individual average difference ERP waveform is computed for each paradigm, and the ERP waveforms are varied by scaling amplitude and width and by shifting latency. (4) New attended epochs are simulated for each paradigm by adding a varied ERP waveform to each unattended epoch. (5) An augmented dataset is created from one upsampled experimental dataset and three simulated datasets.
  • Figure 2: 8-fold average classification performance of EEGNet models trained on the original dataset (paradigm-independent model without augmentation), the augmented dataset (paradigm-independent model with augmentation), and the augmented data of each paradigm (paradigm-specific model with augmentation), together with the envelope-based linear model (Paradigm 3 only). Error bars represent the standard deviation. 'Prdm. 1', 'Prdm. 2' and 'Prdm. 3' indicate model performance on the non-augmented test-set data of Paradigms 1, 2, and 3, respectively. The horizontal bars show the statistical tests for the corresponding comparisons ($\filledstar\filledstar$: $p < 0.001$, $\filledstar$: $0.001 \le p \le 0.05$, $\bullet$: $p > 0.05$). Reported accuracy is weighted accuracy with equal weights across the two classes. (*) An accuracy of 0.5 is the chance level.
  • Figure 3: Leave-one-subject-out (LOSO) classification performance of the EEGNet models and the envelope-based linear model (Paradigm 3 only).
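
The augmentation steps listed in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration in NumPy, not the authors' implementation: the function names, the array layout `(n_epochs, n_channels, n_samples)`, the choice to stretch the waveform about the epoch centre, and the random ranges for amplitude, width, and latency are all assumptions made for the example.

```python
import numpy as np


def upsample_by_averaging(epochs, n_out, rng, max_k=3):
    """Build n_out epochs, each the mean of k random epochs of one class.

    k is drawn from 1..max_k; k = 1 simply duplicates an existing epoch.
    `epochs` has the assumed shape (n_epochs, n_channels, n_samples).
    """
    n_epochs = epochs.shape[0]
    out = np.empty((n_out,) + epochs.shape[1:], dtype=float)
    for i in range(n_out):
        k = min(int(rng.integers(1, max_k + 1)), n_epochs)
        idx = rng.choice(n_epochs, size=k, replace=False)
        out[i] = epochs[idx].mean(axis=0)
    return out


def difference_erp(attended, unattended):
    """Average difference waveform: mean attended minus mean unattended."""
    return attended.mean(axis=0) - unattended.mean(axis=0)


def vary_erp(erp, amp=1.0, width=1.0, shift=0):
    """Scale amplitude, stretch width, and shift latency of an ERP.

    `erp` has shape (n_channels, n_samples); `shift` is in samples.
    Stretching about the epoch centre is an assumption of this sketch.
    """
    n = erp.shape[-1]
    t = np.arange(n, dtype=float)
    centre = (n - 1) / 2.0
    # Output sample t reads the original waveform at this source time.
    src = (t - shift - centre) / width + centre
    varied = np.stack(
        [np.interp(src, t, ch, left=0.0, right=0.0) for ch in erp]
    )
    return amp * varied


def simulate_attended(unattended, erp, rng):
    """Simulate attended epochs: unattended epoch plus a varied ERP."""
    sim = np.empty_like(unattended, dtype=float)
    for i, epoch in enumerate(unattended):
        sim[i] = epoch + vary_erp(
            erp,
            amp=rng.uniform(0.8, 1.2),       # hypothetical range
            width=rng.uniform(0.9, 1.1),     # hypothetical range
            shift=int(rng.integers(-5, 6)),  # hypothetical range, samples
        )
    return sim


# Toy usage on random data standing in for pre-processed EEG epochs.
rng = np.random.default_rng(0)
attended = rng.normal(size=(10, 4, 50))
unattended = rng.normal(size=(12, 4, 50))
upsampled = upsample_by_averaging(attended, n_out=25, rng=rng)
erp = difference_erp(attended, unattended)
simulated = simulate_attended(unattended, erp, rng)
```

In this sketch the simulated attended epochs inherit the background activity of real unattended epochs, so only the added ERP component differs between the two classes. How many simulated datasets are combined, and with which variation parameters, would follow the paper's own settings rather than the values used here.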