Linear stimulus reconstruction works on the KU Leuven audiovisual, gaze-controlled auditory attention decoding dataset

Simon Geirnaert; Iustina Rotaru; Tom Francart; Alexander Bertrand


TL;DR

This work evaluates linear stimulus reconstruction as a baseline for auditory attention decoding (AAD) on the AV-GC-AAD dataset, which was designed to expose gaze-related confounds in spatial AAD (Sp-AAD) approaches. The attended speech envelope is reconstructed from EEG via a spatio-temporal backward model, with the decoder trained by minimizing $\|\mathbf{s}_a - \mathbf{X} \mathbf{d}\|_2^2$ and regularized using Ledoit-Wolf covariance shrinkage; decisions are made by comparing the Pearson correlations between the reconstructed envelope and each of the competing speech envelopes. Results show significant AAD accuracy within each condition and generalization across conditions, across new subjects, and even across datasets, supporting the claim that the AV-GC-AAD data are decodable with simple linear models. The authors provide a reproducible baseline procedure and code to benchmark future AAD algorithms on this challenging dataset.
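The pipeline summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' released code: the function names (`lag_matrix`, `ledoit_wolf_cov`, `train_decoder`, `decide`), the number of lags, and the use of the standard Ledoit-Wolf shrinkage formula toward a scaled identity are all choices made here for illustration. The formula also assumes independent samples, which time-lagged EEG only approximates.

```python
import numpy as np

def lag_matrix(eeg, n_lags):
    """Stack time-lagged copies of the EEG: (T, C) -> (T, C * n_lags).

    Backward model: the envelope at time t is predicted from the EEG
    at times t, t+1, ..., t+n_lags-1 (zero-padded at the end).
    """
    T, C = eeg.shape
    X = np.zeros((T, C * n_lags))
    for lag in range(n_lags):
        X[: T - lag, lag * C:(lag + 1) * C] = eeg[lag:, :]
    return X

def ledoit_wolf_cov(X):
    """Ledoit-Wolf shrinkage of the sample covariance toward mu * I.

    Returns the shrunk covariance and the shrinkage weight in [0, 1].
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    mu = np.trace(S) / p
    # Squared distance of S to the target, normalized by p.
    d2 = np.sum((S - mu * np.eye(p)) ** 2) / p
    # Estimated squared estimation error of S, normalized by p.
    b2 = (np.sum(np.sum(Xc ** 2, axis=1) ** 2) / n - np.sum(S ** 2)) / (n * p)
    b2 = max(0.0, min(b2, d2))
    alpha = b2 / d2 if d2 > 0 else 0.0
    return (1.0 - alpha) * S + alpha * mu * np.eye(p), alpha

def train_decoder(X, s_att):
    """Least-squares decoder d minimizing ||s_att - X d||^2 with a shrunk covariance."""
    Rxx, _ = ledoit_wolf_cov(X)
    Xc = X - X.mean(axis=0)
    sc = s_att - s_att.mean()
    rxs = Xc.T @ sc / X.shape[0]
    return np.linalg.solve(Rxx, rxs)

def decide(d, X, s1, s2):
    """Return 0 if the reconstruction correlates more with s1, else 1."""
    recon = X @ d
    r1 = np.corrcoef(recon, s1)[0, 1]
    r2 = np.corrcoef(recon, s2)[0, 1]
    return 0 if r1 > r2 else 1
```

In use, one would build `X` from training EEG with `lag_matrix`, fit `d` with `train_decoder` on the attended envelope, and then call `decide` on each held-out decision window with the two candidate envelopes.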

Abstract

In a recent paper, we presented the KU Leuven audiovisual, gaze-controlled auditory attention decoding (AV-GC-AAD) dataset, in which we recorded electroencephalography (EEG) signals of participants attending to one of two competing speakers under various audiovisual conditions. The main goal of this dataset was to disentangle the direction of gaze from the direction of auditory attention, in order to reveal gaze-related shortcuts in existing spatial AAD algorithms that aim to decode the (direction of) auditory attention directly from the EEG. Various methods based on spatial AAD do not achieve significantly above-chance performance on our AV-GC-AAD dataset, indicating that previously reported results were mainly driven by eye gaze confounds in existing datasets. Still, these adverse outcomes are often dismissed and attributed to limitations of the AV-GC-AAD dataset, such as the limited amount of data to train a working model, too much data heterogeneity due to different audiovisual conditions, or participants allegedly being unable to focus their auditory attention under the complex instructions. In this paper, we present the results of the linear stimulus reconstruction AAD algorithm and show that high AAD accuracy can be obtained within each individual condition and that the model generalizes across conditions, across new subjects, and even across datasets. We thereby rule out inadequacy of the AV-GC-AAD dataset as the primary reason why (spatial) AAD algorithms fail to achieve above-chance performance on it, in contrast to other datasets. Furthermore, this report provides a simple baseline evaluation procedure (including source code) that can serve as the minimal benchmark for all future AAD algorithms evaluated on this dataset.

