Optimising EEG decoding with refined sampling and multimodal feature integration

Arash Akbarinia

TL;DR

This work tackles EEG-based object decoding by aligning EEG encoder outputs with multimodal pretrained features through contrastive learning. It introduces InterDimensional EEG Sampling (IDES), which expands the training space and boosts the signal-to-noise ratio (SNR), and it combines visual features with language features derived from BLIP captions to form a richer multimodal target for EEG alignment. Evaluated on the THINGS EEG2 dataset, the approach improves Top-1 accuracy by about 7% over state-of-the-art baselines in the intraparticipant setting. The gains are most notable with Laion-400M CLIP features, and alignment efficacy correlates with the generalisation power of the pretrained features on ImageNet-O/A. The findings suggest that refined sampling and multimodal feature integration can meaningfully enhance EEG decoding and may generalise to other neuroimaging modalities, while the work remains mindful of computational costs and broader societal implications.

Abstract

Electroencephalography (EEG) is a neuroimaging technique that records brain neural activity with high temporal resolution. Unlike other methods, EEG does not require prohibitively expensive equipment and can be easily set up using commercially available portable EEG caps, making it an ideal candidate for brain-computer interfaces. However, EEG signals are characterised by poor spatial resolution and high noise levels, complicating their decoding. In this study, we employ a contrastive learning framework to align encoded EEG features with pretrained CLIP features, achieving a 7% improvement over the state-of-the-art in EEG decoding of object categories. This enhancement is equally attributed to (1) a novel online sampling method that boosts the signal-to-noise ratio and (2) multimodal representations leveraging visual and language features to enhance the alignment space. Our analysis reveals a systematic interaction between the architecture and dataset of pretrained features and their alignment efficacy for EEG signal decoding. This interaction correlates with the generalisation power of the pretrained features on ImageNet-O/A datasets ($r=.5$). These findings extend beyond EEG signal alignment, offering potential for broader applications in neuroimaging decoding and generic feature alignments.
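
The alignment step described in the abstract can be illustrated with a CLIP-style symmetric contrastive (InfoNCE) loss. The sketch below is a minimal NumPy illustration, not the paper's implementation: the function name `contrastive_alignment_loss`, the temperature of 0.07, the symmetric form of the loss, and the random stand-in features are all assumptions, since the abstract only states that encoded EEG features are aligned with pretrained CLIP features through contrastive learning.

```python
import numpy as np


def _log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_alignment_loss(eeg_feats, target_feats, temperature=0.07):
    """Symmetric InfoNCE loss between paired EEG and pretrained features.

    Row i of `eeg_feats` and row i of `target_feats` form a matching pair;
    every other row in the batch acts as a negative. The temperature and
    the symmetric form are assumptions of this sketch.
    """
    eeg = eeg_feats / np.linalg.norm(eeg_feats, axis=1, keepdims=True)
    tgt = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    logits = eeg @ tgt.T / temperature  # (batch, batch) scaled cosine similarities
    diag = np.arange(len(logits))
    loss_eeg_to_tgt = -_log_softmax(logits, axis=1)[diag, diag].mean()
    loss_tgt_to_eeg = -_log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_eeg_to_tgt + loss_tgt_to_eeg)


rng = np.random.default_rng(0)
targets = rng.normal(size=(8, 16))                   # stand-in pretrained features
aligned = targets + 0.01 * rng.normal(size=(8, 16))  # EEG features close to their targets
shuffled = np.roll(targets, 1, axis=0)               # every pair deliberately mismatched
loss_aligned = contrastive_alignment_loss(aligned, targets)
loss_shuffled = contrastive_alignment_loss(shuffled, targets)
```

In this toy setup the loss for well-aligned pairs is lower than for mismatched pairs, which is the property a contrastive objective relies on to pull each EEG embedding towards the features of the image that evoked it.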

Paper Structure (17 sections, 6 figures)

Introduction
Method
Refined sampling
Multimodal EEG decoding framework
Pretrained features
EEG Encoder
Experiments
THINGS EEG2 Dataset
Training and testing
Baseline comparison
EEG sampling effect
Multimodal effect
Pretrained effect
Discussion
Computational cost
...and 2 more sections

Figures (6)

Figure 1: The schematic flowchart illustrates the proposed framework for decoding EEG signals by aligning them with multimodal pretrained features. During training, the EEG signals are sampled across images and repeat dimensions to expand the EEG training space. At test time, EEG signals are averaged across all repeats. An EEG encoder extracts EEG features, which are then aligned with features extracted from pretrained networks using contrastive learning. These pretrained features are obtained by concatenating text and image feature vectors. Text features are generated by feeding images into an image captioning network to produce descriptive texts, which are subsequently processed by a text encoder. Image features are directly extracted by an image encoder.
Figure 2: This table presents the out-of-distribution (OOD) classification accuracy across 200 test concepts from the THINGS EEG2 Dataset (Gifford et al., 2022). Each cell is colour-coded from worst (red) to best (green) to visualise performance. The proposed model is highlighted in yellow. Results are shown for individual participants (S01 to S10), and AVG represents the average accuracy across all ten participants. Intraparticipant refers to training and testing an EEG encoder on data from the same participant. Interparticipant refers to training on data from nine participants and testing on the left-out participant.
Figure 3: This table breaks down the performance gain of the proposed framework by comparing the intraparticipant classification accuracy across several settings. Each cell is colour-coded from worst (red) to best (green) to visualise performance. The baselines used are NICE and ATM, which sample EEG signals by averaging across repeats and aligning EEG features to pretrained visual features. Each baseline is improved through two methods: (1) expanding the EEG space using interdimensional sampling, and (2) aligning EEG features to multimodal visual-language features.
Figure 4: This figure demonstrates the effect of EEG sampling by comparing the classification accuracy of pairs of networks that are identical in all aspects (e.g., architecture, pretrained networks and features) except for their EEG sampling method. Each data point represents the Top-1 accuracy for individual participants, shown as grey-filled circles. Bold black crosses indicate the average accuracy across participants for a given test setting. The dashed green line represents the identity line; points above this diagonal line indicate a performance gain by using interdimensional sampling. The number of data points, p-values, and their significance are indicated in the legend.
Figure 5: This figure demonstrates the effect of multimodal feature alignment by comparing the classification accuracy of pairs of networks that are identical in all aspects (e.g., architecture, sampling, and pretrained network) except for their pretrained features. Each data point represents the Top-1 accuracy for individual participants, shown as grey-filled circles. Bold black crosses indicate the average accuracy across participants for a given test setting. The dashed green line represents the identity line; points above this diagonal line indicate a performance gain by using multimodal (visual-language) pretrained features. The number of data points, p-values, and their significance are indicated in the legend.
...and 1 more figure
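
The sampling and target construction summarised in the Figure 1 caption can be sketched as follows. This is a hedged illustration rather than the paper's code: the array layout `(images, repeats, channels, time)`, the choice to average a random subset of repeats during training, the helper names, and all shapes are assumptions. The caption only states that training samples are drawn across the image and repeat dimensions, that test signals are averaged across all repeats, and that text and image feature vectors are concatenated.

```python
import numpy as np


def sample_training_eeg(eeg, rng):
    """Draw one training sample from EEG of shape (images, repeats, channels, time).

    Assumption of this sketch: pick a random image, then average a random
    non-empty subset of its repeats, so repeated draws yield different mixtures.
    """
    n_images, n_repeats = eeg.shape[:2]
    image_idx = int(rng.integers(n_images))
    subset_size = int(rng.integers(1, n_repeats + 1))
    repeat_idx = rng.choice(n_repeats, size=subset_size, replace=False)
    return eeg[image_idx, repeat_idx].mean(axis=0), image_idx


def average_test_eeg(eeg):
    """At test time, average over all repeats of every image."""
    return eeg.mean(axis=1)


def multimodal_target(image_feats, text_feats):
    """Concatenate image and caption-text feature vectors for each image."""
    return np.concatenate([image_feats, text_feats], axis=-1)


rng = np.random.default_rng(0)
eeg = rng.normal(size=(5, 4, 63, 250))   # hypothetical: 5 images, 4 repeats, 63 channels, 250 time points
image_feats = rng.normal(size=(5, 512))  # stand-in image-encoder features
text_feats = rng.normal(size=(5, 512))   # stand-in text-encoder features of the captions

train_sample, image_idx = sample_training_eeg(eeg, rng)
test_eeg = average_test_eeg(eeg)
targets = multimodal_target(image_feats, text_feats)
```

Averaging several repeats reduces trial-level noise, while varying which repeats are averaged gives the encoder more distinct training samples than a single fixed average would. Whether the paper's sampling matches this subset-averaging scheme is not confirmed by the caption.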
