Brain-Gen: Towards Interpreting Neural Signals for Stimulus Reconstruction Using Transformers and Latent Diffusion Models
Hasib Aslam, Muhammad Talal Faiz, Muhammad Imran Malik
TL;DR
Brain-Gen tackles the interpretability of EEG signals for visual stimulus reconstruction by coupling a transformer-based spatio-temporal encoder with a diffusion-based image generator conditioned via cross-attention. The approach uses sliding-window EEG tokens and contrastive learning to shape semantically meaningful latent representations, which then guide Stable Diffusion-2 to reconstruct images with semantic fidelity. Empirical results on EEG-CVPR40 and ThoughtViz show gains of up to 6.5% in latent-space clustering accuracy and 11.8% in zero-shot generalization to unseen classes, with image-generation metrics (IS and FID) comparable to existing baselines. The work demonstrates a viable pathway for generalizable, semantically interpretable decoding of neural activity into visual stimuli using latent diffusion conditioning.
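To make the sliding-window tokenization concrete, the following is a minimal sketch and not the authors' implementation. The function name `sliding_window_tokens`, the window and stride values, and the recording size (128 channels, 440 samples) are illustrative assumptions; the summary above does not specify them.

```python
import numpy as np


def sliding_window_tokens(eeg, window, stride):
    """Split an EEG recording of shape (channels, time) into overlapping tokens.

    Each token is one flattened (channels, window) slice, so the result has
    shape (num_tokens, channels * window).
    """
    channels, time = eeg.shape
    if window > time:
        raise ValueError("window must not exceed the number of time samples")
    # Start index of every window that fits entirely inside the recording.
    starts = range(0, time - window + 1, stride)
    return np.stack([eeg[:, s:s + window].reshape(-1) for s in starts])


# Hypothetical recording: 128 channels, 440 time samples (assumed sizes).
rng = np.random.default_rng(0)
eeg = rng.standard_normal((128, 440))

# Assumed hyperparameters: 40-sample windows with 50% overlap.
tokens = sliding_window_tokens(eeg, window=40, stride=20)
print(tokens.shape)  # (21, 5120)
```

Each row of `tokens` could then be linearly projected and passed to a transformer encoder as one sequence element; the overlap between neighbouring windows is a design choice that trades sequence length against temporal resolution.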
Abstract
Advances in neuroscience and artificial intelligence have enabled preliminary decoding of brain activity. Despite this progress, the interpretability of neural representations remains limited. A significant challenge arises from the intrinsic properties of electroencephalography (EEG) signals, including high noise levels, spatial diffusion, and pronounced temporal variability. To interpret the neural mechanisms underlying thoughts, we propose a transformer-based framework that extracts spatio-temporal representations associated with observed visual stimuli from EEG recordings. These features are subsequently incorporated into the attention mechanisms of Latent Diffusion Models (LDMs) to facilitate the reconstruction of visual stimuli from brain activity. Quantitative evaluations on publicly available benchmark datasets demonstrate that the proposed method excels at modeling semantic structure in EEG signals, achieving up to a 6.5% increase in latent-space clustering accuracy and an 11.8% increase in zero-shot generalization across unseen classes, while maintaining Inception Score and Fréchet Inception Distance comparable to existing baselines. Our work marks a significant step towards generalizable semantic interpretation of EEG signals.
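As an illustration of how EEG features might enter an LDM's attention mechanism, the sketch below implements generic single-head cross-attention in which image latents act as queries and EEG features supply keys and values. It is a simplified stand-in under stated assumptions, not the paper's architecture: all dimensions, the random projection matrices, and the function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(latents, eeg_feats, w_q, w_k, w_v):
    """Single-head cross-attention: latents attend to EEG features."""
    q = latents @ w_q    # queries from image latents, (n_latent, d)
    k = eeg_feats @ w_k  # keys from EEG features,     (n_eeg, d)
    v = eeg_feats @ w_v  # values from EEG features,   (n_eeg, d)
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_latent, n_eeg)
    return weights @ v, weights


rng = np.random.default_rng(0)
latents = rng.standard_normal((64, 320))    # assumed: 64 latent positions
eeg_feats = rng.standard_normal((21, 768))  # assumed: 21 EEG feature tokens
d = 64                                      # assumed attention width
w_q = rng.standard_normal((320, d)) * 0.02  # random stand-ins for learned weights
w_k = rng.standard_normal((768, d)) * 0.02
w_v = rng.standard_normal((768, d)) * 0.02

out, weights = cross_attention(latents, eeg_feats, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (64, 64) (64, 21)
```

Each row of `weights` is a distribution over EEG tokens, so every latent position receives a convex combination of EEG-derived values. In a real LDM the projections would be learned and the attention would be multi-headed.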
