Mind's Eye: Image Recognition by EEG via Multimodal Similarity-Keeping Contrastive Learning

Chi-Sheng Chen, Chun-Shu Wei

TL;DR

This work tackles zero-shot EEG-based image recognition by proposing MUSE, a self-supervised, multimodal framework that aligns EEG and image embeddings while preserving intra-batch similarities. It combines EEG encoders (STConv/NervFormer) with an off-the-shelf CLIP-ViT image encoder and introduces a similarity-keeping loss that regularizes cross-modal contrastive learning via a trainable parameter $\beta$, yielding the combined objective $\mathcal{L}_{SK-InfoNCE} = \mathcal{L}_{InfoNCE} + \beta \times \mathcal{L}_{SK}$. Empirical results on the THINGS EEG RSVP dataset show state-of-the-art zero-shot performance (top-1 $19.3\%$, top-5 $48.8\%$) and robust improvements across variants, with interpretability analyses (Grad-CAM) illuminating occipital-parietal dynamics in the $100$–$500$ ms window and associated alpha/gamma-band activity. These findings demonstrate that brain-inspired contrastive learning can effectively bridge temporal EEG signals and visual semantics, enabling more flexible, non-invasive brain–computer interface capabilities.
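
To make the objective concrete, here is a minimal PyTorch sketch of the combined loss, assuming the standard CLIP-style symmetric InfoNCE and one plausible instantiation of the similarity-keeping term (an MSE between the two modalities' within-batch cosine-similarity matrices); the paper's exact form of $\mathcal{L}_{SK}$ may differ, and the class and argument names below are illustrative rather than taken from the official repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SKInfoNCE(nn.Module):
    """Sketch of L_SK-InfoNCE = L_InfoNCE + beta * L_SK (names illustrative)."""

    def __init__(self, temperature: float = 0.07):
        super().__init__()
        self.temperature = temperature
        # Trainable regularization weight beta, as described in the TL;DR;
        # in practice it may need a positivity constraint (e.g., softplus)
        # to avoid degenerate solutions.
        self.beta = nn.Parameter(torch.tensor(1.0))

    def forward(self, eeg_emb: torch.Tensor, img_emb: torch.Tensor) -> torch.Tensor:
        # Normalize so dot products are cosine similarities; matched
        # EEG-image pairs share a batch index (the logits' diagonal).
        eeg = F.normalize(eeg_emb, dim=-1)
        img = F.normalize(img_emb, dim=-1)

        # Symmetric InfoNCE over the cross-modal similarity matrix.
        logits = eeg @ img.t() / self.temperature
        targets = torch.arange(eeg.size(0), device=eeg.device)
        loss_infonce = 0.5 * (F.cross_entropy(logits, targets)
                              + F.cross_entropy(logits.t(), targets))

        # Similarity-keeping term: encourage the two modalities' within-batch
        # similarity structures to agree (one plausible instantiation).
        loss_sk = F.mse_loss(eeg @ eeg.t(), img @ img.t())

        return loss_infonce + self.beta * loss_sk
```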

Abstract

Decoding images from non-invasive electroencephalographic (EEG) signals has been a grand challenge in understanding how the human brain processes visual information in real-world scenarios. To cope with the issues of signal-to-noise ratio and nonstationarity, this paper introduces a MUltimodal Similarity-keeping contrastivE learning (MUSE) framework for zero-shot EEG-based image classification. We develop a series of multivariate time-series encoders tailored for EEG signals and assess the efficacy of regularized contrastive EEG-Image pretraining using an extensive visual EEG dataset. Our method achieves state-of-the-art performance, with a top-1 accuracy of 19.3% and a top-5 accuracy of 48.8% in 200-way zero-shot image classification. Furthermore, we visualize neural patterns via model interpretation, shedding light on the visual processing dynamics in the human brain. The code repository for this work is available at: https://github.com/ChiShengChen/MUSE_EEG.
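
The model interpretation mentioned above can be approximated with a simple gradient-saliency probe over the raw EEG input. This is a simplification of the paper's Grad-CAM analysis (Grad-CAM proper weights a convolutional layer's activations by their gradients), and the `eeg_encoder` interface below is an assumption for illustration, not the repository's API.

```python
import torch
import torch.nn.functional as F

def eeg_saliency(eeg_trial: torch.Tensor,      # (channels, time)
                 image_emb: torch.Tensor,      # (1, d) target image embedding
                 eeg_encoder: torch.nn.Module) -> torch.Tensor:
    """Gradient of the EEG-image alignment score w.r.t. the raw EEG trial,
    giving a per-channel, per-time-point importance map."""
    x = eeg_trial.clone().detach().requires_grad_(True)
    emb = F.normalize(eeg_encoder(x.unsqueeze(0)), dim=-1)   # (1, d)
    score = (emb * F.normalize(image_emb, dim=-1)).sum()     # cosine alignment
    score.backward()
    return x.grad.abs()                                      # (channels, time)
```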

Paper Structure

This paper contains 24 sections, 5 equations, 14 figures, 5 tables, and 1 algorithm.

Figures (14)

  • Figure 1: Schematic illustration of the proposed MUltimodal Similarity-keeping contrastivE learning (MUSE) framework. During the training phase, EEG-image pairs are independently processed by an EEG encoder and an image encoder. The objectives of the MUSE framework are twofold: 1) maximize the separation between matched and unmatched pairs, and 2) maintain the inner-batch sample similarity within each EEG-image pair (see Algorithm 1 for details). In the test phase, an unseen EEG sample is passed through the EEG encoder, which identifies the most similar image from a set of unseen images based on cross-modality embedding similarity (a minimal retrieval sketch is given after this figure list).
  • Figure 2: (a.) Overview of this work. (b.) Illustration of the feature space of the multimodal similarity-keeping contrastive learning framework (MUSE): unlike traditional contrastive learning, which focuses only on multimodal similarity, MUSE's loss function considers both multimodal similarity and inner-batch similarity. $r$ denotes a representation; $I$ and $E$ denote image and EEG signal, respectively.
  • Figure 3: Details of MUSE. (a.) The contrastive learning loss is computed from the EEG and image encodings. (b.)(c.) The similarity-keeping loss is derived from each input modality's own within-batch similarity.
  • Figure 4: Comparison of model structures. BN, IN, and LN denote batch normalization, instance normalization, and layer normalization, respectively.
  • Figure 5: Overall Top-1 zero-shot accuracy comparison of all models.
  • ...and 9 more figures
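
As noted in the Figure 1 caption, test-phase zero-shot classification reduces to nearest-neighbor retrieval in the shared embedding space. A minimal sketch, assuming trained `eeg_encoder` and `image_encoder` callables that return fixed-dimension embeddings (names and shapes illustrative, not the repository's API):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_classify(eeg_trial: torch.Tensor,         # (channels, time)
                       candidate_images: torch.Tensor,  # (200, 3, H, W) unseen classes
                       eeg_encoder, image_encoder, k: int = 5) -> torch.Tensor:
    # Embed the unseen EEG trial and all candidate images, then rank the
    # candidates by cosine similarity to the EEG embedding.
    eeg = F.normalize(eeg_encoder(eeg_trial.unsqueeze(0)), dim=-1)   # (1, d)
    imgs = F.normalize(image_encoder(candidate_images), dim=-1)      # (200, d)
    sims = (eeg @ imgs.t()).squeeze(0)                               # (200,)
    return sims.topk(k).indices   # indices of the top-k predicted images
```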