Enhancing spatial auditory attention decoding with neuroscience-inspired prototype training

Zelin Qiu; Jianjun Gu; Dingding Yao; Junfeng Li

TL;DR

This work tackles the variability caused by trial-specific EEG fingerprints in spatial auditory attention decoding (Sp-AAD) by introducing Prototype Training, a neuroscience-inspired method that creates prototypes from multiple trials to emphasize energy-distribution features. Paired with EEGWaveNet, a wavelet-transformed time-frequency decoder, the approach improves cross-trial generalization and provides comprehensive benchmarking across three datasets and multiple data-partitioning schemes. Key findings show that prototype training yields gains in cross-trial scenarios (notably with $K$ around 25) and that time-frequency energy representations better capture auditory attention features than time-domain signals. The proposed framework offers a practical, architecture-agnostic training paradigm that reduces trial-specific bias and provides a rich benchmarking resource for Sp-AAD research.

Abstract

Spatial auditory attention decoding (Sp-AAD) aims to determine the direction of auditory attention in multi-talker scenarios from neural recordings. Despite the success of recent Sp-AAD algorithms, their performance is hindered by trial-specific features in EEG data. This study aims to make decoding robust to these features. Neuroscience studies indicate that spatial auditory attention is reflected in the topological distribution of EEG energy across frequency bands. This insight motivates Prototype Training, a neuroscience-inspired method for Sp-AAD. The method constructs prototypes with enhanced energy-distribution representations and reduced trial-specific characteristics, enabling the model to better capture auditory attention features. To implement prototype training, we further propose EEGWaveNet, a model that operates on the wavelet transform of EEG. Detailed experiments indicate that EEGWaveNet with prototype training outperforms other competitive models on three datasets, which validates the effectiveness of the proposed method. As a training method independent of model architecture, prototype training offers new insights into the field of Sp-AAD.
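As a rough illustration of the prototype idea, the sketch below averages several energy maps that share an attention label into one training sample, which tends to suppress window-specific variation while keeping the shared energy distribution. The abstract does not spell out the construction, so the averaging rule, the same-label sampling, the array layout, and every name here (`make_prototypes`, `energy`, `labels`, `k`, `n_prototypes`) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def make_prototypes(energy, labels, k, n_prototypes, seed=0):
    """Average k randomly chosen same-label energy maps into each prototype.

    energy: array (n_windows, n_channels, n_bands) of non-negative band energy
            (assumed layout, e.g. derived from a wavelet transform).
    labels: array (n_windows,) with the attended direction of each window.
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    protos, proto_labels = [], []
    for i in range(n_prototypes):
        c = classes[i % len(classes)]            # cycle through classes for balance
        pool = np.flatnonzero(labels == c)       # indices of windows with label c
        # Sample with replacement only when the class has fewer than k windows.
        idx = rng.choice(pool, size=k, replace=len(pool) < k)
        protos.append(energy[idx].mean(axis=0))  # assumed rule: plain average
        proto_labels.append(c)
    return np.stack(protos), np.array(proto_labels)

# Toy data: 200 windows, 64 channels, 5 frequency bands, two directions.
toy_rng = np.random.default_rng(1)
labels = np.arange(200) % 2
energy = toy_rng.random((200, 64, 5)) + 0.1 * labels[:, None, None]

protos, proto_labels = make_prototypes(energy, labels, k=25, n_prototypes=40)
```

Setting `k=1` in this sketch reduces each prototype to a single original window, so the same function covers the no-averaging baseline.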

Paper Structure (21 sections, 9 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
Problem Formulation
Method
Prototype Training
Time-Frequency Representations
Decoder and Loss Function
Experimental Setups
Data Specifications
EEG Preprocessing
Contrastive Models
Data Partitioning Strategies
Training Details
Results and Analysis
Main Results
Effect of Window Length
...and 6 more sections

Figures (5)

Figure 1: An overview of the proposed method. (a) The pipeline of existing mainstream methods. (b) The pipeline of the proposed method, incorporating time-frequency transform and prototype generation processes. (c) The model architecture of the EEGWaveNet.
Figure 2: (a) A schematic diagram of data arrangement. (b) Schematic illustrations of three different data partitioning strategies under 4-fold cross-validation.
Figure 3: Performance of different models across varying window lengths. To clearly illustrate the datasets and data partitioning strategies used, nine charts are arranged in a matrix format. Horizontally, from left to right, they represent three data partitioning strategies: Strategy I, Strategy II, and Strategy III. Vertically, from top to bottom, they correspond to three datasets: Das-2016, Fuglsang-2018, and Fuglsang-2020. Each data point represents the average decoding accuracy of all subjects in the corresponding dataset.
Figure 4: Illustration of decoding results obtained with different values of parameter $K$, representing, from left to right, the results on the Das-2016, Fuglsang-2018, and Fuglsang-2020 datasets respectively.
Figure 5: t-SNE visualization of input decision windows and output embeddings. (a) Projection results of decision windows from four random trials. (b)-(c) Projection results of prototype samples generated with $K=10$ and $K=25$, respectively. (d)-(f) Projection results of high-dimensional embeddings produced by the trained EEGWaveNet with $K=1$, $K=10$, and $K=25$, using data from the same four trials in the test set.
