Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion

Dongyang Li, Chen Wei, Shiying Li, Jiachen Zou, Haoyang Qin, Quanying Liu

TL;DR

The paper addresses the challenge of decoding natural visual perception from EEG signals for brain-computer interfaces. It introduces Adaptive Thinking Mapper (ATM), a channel-wise Transformer-based EEG encoder, and a two-stage EEG-guided image generation pipeline that first maps EEG embeddings to CLIP image embeddings via a prior diffusion model and then synthesizes images with SDXL/IP-Adapter guidance. The approach achieves state-of-the-art performance in EEG-based zero-shot classification, retrieval, and reconstruction on THINGS-EEG and THINGS-MEG, with comprehensive temporal and spatial analyses and MEG compatibility. The work demonstrates that portable EEG, combined with diffusion-based priors and large pre-trained vision models, can approach fMRI-level capabilities for rapid, low-cost visual decoding with broad BCI implications.

Abstract

How to decode human vision through neural signals has attracted long-standing interest in neuroscience and machine learning. Modern contrastive learning and generative models have improved the performance of visual decoding and reconstruction based on functional Magnetic Resonance Imaging (fMRI). However, the high cost and low temporal resolution of fMRI limit its applications in brain-computer interfaces (BCIs), creating a strong need for visual decoding based on electroencephalography (EEG). In this study, we present an end-to-end EEG-based zero-shot visual reconstruction framework. It consists of a tailored brain encoder, the Adaptive Thinking Mapper (ATM), which projects neural signals from different sources into the same shared subspace as the CLIP embedding, and a two-stage multi-pipe EEG-to-image generation strategy. In stage one, EEG is embedded to align with the high-level CLIP embedding, and a prior diffusion model then refines the EEG embedding into image priors. A blurry image is also decoded from EEG to preserve low-level features. In stage two, we input the high-level CLIP embedding, the blurry image, and a caption derived from the EEG latent into a pre-trained diffusion model. Furthermore, we analyze the impact of different time windows and brain regions on decoding and reconstruction. The versatility of our framework is demonstrated on the magnetoencephalography (MEG) data modality. The experimental results indicate that our EEG-based zero-shot visual framework achieves SOTA performance in classification, retrieval, and reconstruction, highlighting the portability, low cost, and high temporal resolution of EEG and enabling a wide range of BCI applications. Our code is available at https://github.com/ncclab-sustech/EEG_Image_decode.
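
The zero-shot retrieval task mentioned in the abstract can be illustrated with a small sketch. Once EEG embeddings live in the same space as CLIP image embeddings, retrieval reduces to ranking candidate images by cosine similarity. The code below is a minimal, dependency-free illustration under that assumption and is not taken from the authors' repository; the function names and the toy three-dimensional embeddings are hypothetical, and real embeddings would come from a trained EEG encoder and a CLIP image encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity_matrix(eeg_embeddings, image_embeddings):
    """Return S where S[i][j] is the cosine similarity of EEG i and image j."""
    eeg = [l2_normalize(v) for v in eeg_embeddings]
    img = [l2_normalize(v) for v in image_embeddings]
    return [[sum(a * b for a, b in zip(e, m)) for m in img] for e in eeg]


def top_k_retrieval(eeg_embeddings, image_embeddings, k=1):
    """For each EEG embedding, return the indices of the k most similar images."""
    sims = cosine_similarity_matrix(eeg_embeddings, image_embeddings)
    return [
        sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        for row in sims
    ]


def top_k_accuracy(retrieved, targets):
    """Fraction of trials whose true image index is among the retrieved ones."""
    hits = sum(1 for r, t in zip(retrieved, targets) if t in r)
    return hits / len(targets)


# Toy data (hypothetical): three image embeddings and three EEG embeddings
# that are noisy copies of them.
image_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
eeg_embeddings = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.1, 0.0, 0.7]]

retrieved = top_k_retrieval(eeg_embeddings, image_embeddings, k=1)
accuracy = top_k_accuracy(retrieved, targets=[0, 1, 2])
print(retrieved, accuracy)
```

Zero-shot classification can be sketched the same way by ranking class-level embeddings instead of individual images; whether that matches the paper's exact protocol is an assumption here.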

Paper Structure

This paper contains 48 sections, 5 equations, 32 figures, and 8 tables.

Figures (32)

  • Figure 1: EEG/MEG-based zero-shot brain decoding and reconstruction. Left: Overview of three visual decoding tasks using EEG/MEG data recorded under natural image stimuli. Right: Our reconstruction examples.
  • Figure 2: EEG/MEG-based visual decoding and generation framework. The EEG encoder is designed as a flexible replacement component. After aligning with image features, the EEG features are used for zero-shot retrieval and classification tasks, and the reconstructed images are obtained through a two-stage generator.
  • Figure 3: The structure of ATM. The original EEG sequences of different variates are independently embedded into tokens. Channel-wise attention is applied to the embedded variate tokens, which improves interpretability by revealing correlations between electrodes. The representation of each token is then extracted by a shared feedforward network (FFN). Finally, a temporal-spatial convolution helps prevent overfitting and strengthens temporal-spatial modeling.
  • Figure 4: EEG/MEG-based decoding and reconstruction performance. Left: Comparisons of nine encoders on the THINGS-EEG dataset, including within-subject and cross-subject performance. Right: Comparisons on the THINGS-MEG dataset, similar to left. Our method achieves the highest performance compared to other competing encoders in EEG/MEG-based visual decoding tasks.
  • Figure 5: EEG-based image retrieval and classification. (a) The paradigm of EEG-based image retrieval and classification. (b) Samples of the top-5 accuracy in EEG-image retrieval tasks. See the appendix for additional image results. (c) Average in-subject classification accuracy across different methods. (d) Average in-subject retrieval accuracy across different methods.
  • ...and 27 more figures
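
The channel-wise attention described in the Figure 3 caption can be sketched in a few lines. The example below treats each channel's whole time series as one token and applies single-head self-attention across channels, so the attention map has one row and one column per electrode. It is a simplified illustration only: the weights are random rather than trained, the shared FFN and the temporal-spatial convolution are omitted, and all names and sizes (`channel_attention`, `d_model`, 4 channels, 10 time samples) are hypothetical rather than taken from the authors' code.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Matrix of Gaussian samples, standing in for learned weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def channel_attention(eeg, d_model, rng):
    """Embed each channel as one token, then attend across channels.

    eeg: list of n_channels rows, each holding n_times samples.
    Returns (output tokens, attention map of size n_channels x n_channels).
    """
    n_times = len(eeg[0])
    w_embed = random_matrix(n_times, d_model, rng)  # time series -> token
    w_q = random_matrix(d_model, d_model, rng)
    w_k = random_matrix(d_model, d_model, rng)
    w_v = random_matrix(d_model, d_model, rng)

    tokens = matmul(eeg, w_embed)  # (n_channels, d_model)
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)

    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)  # (n_channels, n_channels)
    scale = math.sqrt(d_model)
    attn = [softmax([s / scale for s in row]) for row in scores]
    return matmul(attn, v), attn


rng = random.Random(0)
eeg = random_matrix(4, 10, rng, scale=1.0)  # 4 channels, 10 time samples
out, attn = channel_attention(eeg, d_model=8, rng=rng)
print(len(attn), len(attn[0]), len(out), len(out[0]))
```

Each row of `attn` is a probability distribution over channels, which is the kind of channel-by-channel map the caption credits with revealing electrode correlations.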