Brain-Gen: Towards Interpreting Neural Signals for Stimulus Reconstruction Using Transformers and Latent Diffusion Models

Hasib Aslam, Muhammad Talal Faiz, Muhammad Imran Malik

TL;DR

Brain-Gen tackles the interpretability of EEG signals for visual stimulus reconstruction by coupling a transformer-based spatio-temporal encoder with a diffusion-based image generator conditioned via cross-attention. The approach uses sliding-window EEG tokens and contrastive learning to shape semantically meaningful latent representations, which then guide Stable Diffusion-2 to reconstruct images with semantic fidelity. Empirical results on EEG-CVPR40 and ThoughtViz show gains of up to 6.5% in latent-space clustering accuracy and 11.8% in zero-shot generalization to unseen classes, along with image-generation metrics (IS and FID) comparable to existing baselines. This work demonstrates a viable pathway for generalizable, semantically interpretable decoding of neural activity into visual stimuli using latent diffusion conditioning.

Abstract

Advances in neuroscience and artificial intelligence have enabled preliminary decoding of brain activity. Despite this progress, the interpretability of neural representations remains limited. A significant challenge arises from the intrinsic properties of electroencephalography (EEG) signals, including high noise levels, spatial diffusion, and pronounced temporal variability. To interpret the neural mechanisms underlying thoughts, we propose a transformer-based framework that extracts spatio-temporal representations associated with observed visual stimuli from EEG recordings. These features are subsequently incorporated into the attention mechanisms of Latent Diffusion Models (LDMs) to facilitate the reconstruction of visual stimuli from brain activity. Quantitative evaluations on publicly available benchmark datasets demonstrate that the proposed method excels at modeling the semantic structure of EEG signals, achieving up to a 6.5% increase in latent-space clustering accuracy and an 11.8% increase in zero-shot generalization across unseen classes, while maintaining Inception Score and Fréchet Inception Distance comparable to existing baselines. Our work marks a significant step towards generalizable semantic interpretation of EEG signals.
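
As a rough illustration of what incorporating EEG features into the attention mechanism of an LDM can look like, the sketch below implements generic single-head scaled dot-product cross-attention in NumPy: queries come from image latents, while keys and values come from a sequence of EEG embeddings. This is not the authors' implementation. The function names, the random projection matrices, and all dimensions are assumptions chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, eeg_tokens, w_q, w_k, w_v):
    """Generic single-head cross-attention (illustrative, not the paper's code).

    latents:    (n_latent, d_latent) flattened image latents (queries)
    eeg_tokens: (n_tokens, d_eeg)    EEG embeddings (keys and values)
    """
    q = latents @ w_q        # (n_latent, d_head)
    k = eeg_tokens @ w_k     # (n_tokens, d_head)
    v = eeg_tokens @ w_v     # (n_tokens, d_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each latent attends over EEG tokens
    return weights @ v, weights

# Hypothetical sizes: 16 latent positions, 7 EEG tokens.
rng = np.random.default_rng(0)
latents = rng.standard_normal((16, 32))
eeg_tokens = rng.standard_normal((7, 24))
w_q = rng.standard_normal((32, 8))
w_k = rng.standard_normal((24, 8))
w_v = rng.standard_normal((24, 8))

attended, weights = cross_attention(latents, eeg_tokens, w_q, w_k, w_v)
print(attended.shape, weights.shape)
```

Each row of `weights` is a distribution over the EEG tokens, so every latent position receives a weighted mix of EEG-derived values.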

Paper Structure

This paper contains 31 sections, 21 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Image generation results from EEG signals corresponding to the first 10 classes of the EEG-CVPR40 dataset (test split). The first row shows the ground-truth images presented to the subject; the remaining rows show reconstructions generated from the subject's EEG.
  • Figure 2: System diagram. First, sliding-window segmentation extracts multiple sub-samples from an EEG signal. Each sub-sample is then encoded independently by the spatio-temporal encoder. The resulting sequence of EEG representations conditions the reverse diffusion process of Stable Diffusion, yielding a semantically aligned reconstruction of the visual stimulus.
  • Figure 3: t-SNE visualization of features extracted by the proposed encoder from unseen samples of the ThoughtViz dataset (left column) and the EEG-CVPR40 dataset (right column).
  • Figure 4: t-SNE visualization of features extracted by the proposed encoder from samples of unseen classes from EEG-CVPR40.
  • Figure 5: Effect of sequence length on the performance of the proposed encoder on the EEG-CVPR40 dataset.
  • ...and 2 more figures
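
The sliding-window segmentation mentioned in the Figure 2 caption can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the function name `sliding_windows`, the channel count, the recording length, the window size, and the stride are all hypothetical values picked for the example.

```python
import numpy as np

def sliding_windows(eeg, window, stride):
    """Split an EEG recording of shape (channels, time) into overlapping
    sub-samples of shape (num_windows, channels, window)."""
    channels, time = eeg.shape
    if window > time:
        raise ValueError("window must not exceed the recording length")
    # Start indices of every window that fits entirely in the recording.
    starts = range(0, time - window + 1, stride)
    return np.stack([eeg[:, s:s + window] for s in starts])

# Hypothetical recording: 128 channels, 440 time samples.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((128, 440))

# Hypothetical window and stride; each sub-sample would then be
# encoded independently to form one token of the EEG sequence.
sub_samples = sliding_windows(eeg, window=200, stride=40)
print(sub_samples.shape)  # (7, 128, 200)
```

A smaller stride produces more, more heavily overlapping sub-samples, and therefore a longer conditioning sequence.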