A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond
Shreya Shukla, Jose Torres, Akshaj Murhekar, Christina Liu, Abhijit Mishra, Jacek Gwizdka, Shounak Roychowdhury
TL;DR
This paper surveys generative AI driven by non-invasive EEG for cross-modal output in image, text, and audio, covering datasets, model families, EEG feature encoding, and evaluation benchmarks. It highlights key methodological trends (encoder–decoder frameworks, latent diffusion with semantic mediation, contrastive learning, and LLM-guided text generation) and notes core challenges such as small, heterogeneous datasets and limited cross-subject generalization. It points to open baselines and datasets that support reproducible benchmarking, and calls for standardized evaluation and attention to ethical considerations as the field advances. By combining neuroscience-informed perspectives with recent generative techniques, the survey offers a structured roadmap for extending EEG-based neural decoding toward practical, multi-modal BCIs.
Abstract
Decoding neural activity into human-interpretable representations is a key research direction in brain-computer interfaces (BCIs) and computational neuroscience. Recent progress in machine learning and generative AI has driven growing interest in transforming non-invasive electroencephalography (EEG) signals into images, text, and audio. This survey consolidates and analyzes developments across EEG-to-image synthesis, EEG-to-text generation, and EEG-to-audio reconstruction. We conducted a structured literature search across major databases covering 2017–2025, extracting key information on datasets, generative architectures (GANs, VAEs, transformers, diffusion models), EEG feature-encoding techniques, evaluation metrics, and the major challenges shaping current work in this area. Our review finds that EEG-to-image models predominantly employ encoder-decoder architectures built on GANs, VAEs, or diffusion models; EEG-to-text approaches increasingly leverage transformer-based language models for open-vocabulary decoding; and EEG-to-audio methods commonly map EEG signals to mel-spectrograms that are subsequently rendered into audio using neural vocoders. Despite promising advances, the field remains constrained by small and heterogeneous datasets, limited cross-subject generalization, and the absence of standardized benchmarks. By consolidating methodological trends and available datasets, this survey provides a foundational reference for advancing EEG-based generative AI and supporting reproducible research. We further highlight open-source datasets and baseline implementations to facilitate systematic benchmarking and accelerate progress in EEG-driven neural decoding.
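
To make one of the trends named in the TL;DR concrete, the following is a minimal sketch of a contrastive objective that aligns EEG embeddings with embeddings of the paired stimulus (for example, an image). It is illustrative only and is not taken from any specific surveyed method: the function name `info_nce_loss`, the temperature value, and the embedding shapes are assumptions, and the embeddings are assumed to come from separate EEG and image encoders that are not shown.

```python
import numpy as np


def info_nce_loss(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    eeg_emb, img_emb: arrays of shape (batch, dim); row i of each is a pair.
    The temperature value is an illustrative assumption.
    """
    # L2-normalise so that dot products are cosine similarities.
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix; entry (i, j) compares EEG i with image j.
    logits = eeg @ img.T / temperature

    def diag_cross_entropy(scores):
        # Softmax cross-entropy where the correct class for row i is column i.
        scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the EEG-to-image and image-to-EEG directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


# Toy usage with random stand-in embeddings (not real EEG data).
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
loss_aligned = info_nce_loss(emb, emb)          # every pair matches
loss_shuffled = info_nce_loss(emb, emb[::-1])   # every pair is mismatched
```

In this toy setup the aligned batch should give a much lower loss than the shuffled one, which is the behaviour a contrastive objective is meant to reward: matched EEG and stimulus embeddings are pulled together while mismatched pairs in the batch are pushed apart.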
