A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond

Shreya Shukla, Jose Torres, Akshaj Murhekar, Christina Liu, Abhijit Mishra, Jacek Gwizdka, Shounak Roychowdhury

TL;DR

The paper surveys non-invasive EEG-driven generative AI for cross-modal outputs in image, text, and audio, detailing datasets, model families, EEG feature encoding, and evaluation benchmarks. It highlights key methodological trends (encoder–decoder frameworks, latent diffusion with semantic mediation, contrastive learning, and LLM-guided text generation) while noting core challenges such as small, heterogeneous datasets and limited cross-subject generalization. It points to open baselines and datasets that support reproducible benchmarking, and it calls for standardized evaluation and attention to ethical issues as the field advances. By integrating neuroscience-informed perspectives with cutting-edge generative techniques, the survey provides a structured roadmap for extending EEG-based neural decoding toward practical, multi-modal BCIs.
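Contrastive learning, one of the trends named above, is commonly used to align EEG embeddings with embeddings of the paired stimulus (for example, image features). The sketch below shows a symmetric InfoNCE loss of the kind popularized by CLIP, written in plain Python so the arithmetic is visible. It assumes the EEG and stimulus encoders already exist and emit equal-length vectors; the function names, the toy embeddings, and the temperature of 0.07 are illustrative assumptions, not details taken from the survey.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def _cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(eeg_emb, stim_emb, temperature=0.07):
    """Symmetric InfoNCE: eeg_emb[i] and stim_emb[i] form a positive pair;
    every other pairing in the batch is treated as a negative."""
    logits = [[cosine(e, s) / temperature for s in stim_emb] for e in eeg_emb]
    transposed = [list(col) for col in zip(*logits)]
    # Average the EEG->stimulus and stimulus->EEG directions.
    return 0.5 * (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(transposed))


# Hypothetical toy batch: two EEG embeddings and their paired image embeddings.
eeg = [[1.0, 0.1], [0.1, 1.0]]
images = [[0.9, 0.0], [0.0, 0.8]]
aligned_loss = info_nce(eeg, images)        # pairs line up
swapped_loss = info_nce(eeg, images[::-1])  # pairs deliberately mismatched
```

In a real pipeline this loss would be computed over minibatches of tensors with automatic differentiation; the list-based version only makes the pairing logic explicit.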
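Latent diffusion, also listed above, trains a network to reverse a fixed noising process. As a minimal sketch, assuming the standard DDPM formulation with a linear beta schedule (an assumption: surveyed systems may use other schedules and typically noise autoencoder latents rather than raw vectors), the closed-form forward step x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps can be written as below. In EEG-conditioned variants, a denoising network would predict eps from x_t, t, and an EEG embedding; that network is not shown, and all names here are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_t increasing linearly over num_steps steps."""
    return [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
            for t in range(num_steps)]


def cumulative_alphas(betas):
    """abar_t = product over s <= t of (1 - beta_s)."""
    abar, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        abar.append(prod)
    return abar


def q_sample(x0, t, abar, rng):
    """Draw x_t ~ q(x_t | x_0) in one step; also return the noise used,
    which is the regression target for an epsilon-predicting denoiser."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    scale_signal = math.sqrt(abar[t])
    scale_noise = math.sqrt(1.0 - abar[t])
    x_t = [scale_signal * x + scale_noise * n for x, n in zip(x0, noise)]
    return x_t, noise


betas = linear_beta_schedule(1000)
abar = cumulative_alphas(betas)
rng = random.Random(0)
latent = [0.5, -1.2, 0.3]  # hypothetical latent vector for one image
x_early, _ = q_sample(latent, 0, abar, rng)   # nearly identical to latent
x_late, _ = q_sample(latent, 999, abar, rng)  # nearly pure Gaussian noise
```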

Abstract

Decoding neural activity into human-interpretable representations is a key research direction in brain-computer interfaces (BCIs) and computational neuroscience. Recent progress in machine learning and generative AI has driven growing interest in transforming non-invasive electroencephalography (EEG) signals into images, text, and audio. This survey consolidates and analyzes developments across EEG-to-image synthesis, EEG-to-text generation, and EEG-to-audio reconstruction. We conducted a structured literature search across major databases (2017–2025), extracting key information on datasets, generative architectures (GANs, VAEs, transformers, diffusion models), EEG feature-encoding techniques, evaluation metrics, and the major challenges shaping current work in this area. Our review finds that EEG-to-image models predominantly employ encoder-decoder architectures built on GANs, VAEs, or diffusion models; EEG-to-text approaches increasingly leverage transformer-based language models for open-vocabulary decoding; and EEG-to-audio methods commonly map EEG signals to mel-spectrograms that are subsequently rendered into audio using neural vocoders. Despite promising advances, the field remains constrained by small and heterogeneous datasets, limited cross-subject generalization, and the absence of standardized benchmarks. By consolidating methodological trends and available datasets, this survey provides a foundational reference for advancing EEG-based generative AI and supporting reproducible research. We further highlight open-source datasets and baseline implementations to facilitate systematic benchmarking and accelerate progress in EEG-driven neural decoding.
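The abstract notes that EEG-to-audio systems commonly predict mel-spectrograms, which a neural vocoder then renders as a waveform. The sketch below shows how such a mel target is usually built: triangular filters spaced evenly on the mel scale are applied to a power spectrum. It assumes the HTK mel formula, mel = 2595 * log10(1 + f / 700); libraries such as librosa default to the Slaney variant, so exact values differ. The EEG decoder that would predict these frames and the vocoder are outside the scope of this sketch, and all names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate, fmin=0.0, fmax=None):
    """Return n_mels triangular filters over the n_fft // 2 + 1 FFT bins."""
    fmax = sample_rate / 2.0 if fmax is None else fmax
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    # n_mels + 2 edge frequencies, evenly spaced in mel, converted back to Hz.
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_frame(bank, power_spectrum):
    """Project one power-spectrum frame onto the mel filters (log step omitted)."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]


bank = mel_filterbank(n_mels=40, n_fft=512, sample_rate=16000)
frame = mel_frame(bank, [1.0] * 257)  # flat hypothetical power spectrum
```

Many pipelines take the logarithm of each mel energy before using it as a regression target; that step is left out to keep the filterbank logic in focus.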

Paper Structure

This paper contains 32 sections, 1 equation, 15 figures, and 3 tables.

Figures (15)

  • Figure 1: (a) Diagram of the primary lobes of the cerebral cortex (frontal, parietal, temporal, and occipital), highlighting their anatomical boundaries. (b) EEG recording: illustration of EEG activity recorded while participants view text stimuli, showing eye-gaze position and a two-dimensional representation of the corresponding EEG signals. (c) Mapping between EEG channels and brain cortices: on the left, each electrode of the 128-channel EEG electrode placement is mapped to a specific cortical region (frontal, central, temporal, parietal, or occipital); on the right, neural activation is visualized from the top of the scalp palazzo2020decoding.
  • Figure 2: General steps from EEG data acquisition to stimulus reconstruction (image, text, or audio).
  • Figure 3: Overview of the literature search, selection, and synthesis framework for this topical review. The upper tiers summarize the search strategy and inclusion criteria for EEG-based generative modeling studies. The synthesis and organization tier maps to modality-specific sections (EEG-to-image, EEG-to-text, and EEG-to-audio/speech generation), each analyzed through dimensions such as neural basis, generative architectures, challenges tackled by studies, and evaluation methods. The cross-domain synthesis and outlook section integrates future research directions and suggested baseline implementations for practitioners.
  • Figure 4: Traditional CNN-based encoder-decoder architecture. The encoder processes a high-dimensional input image and generates a lower-dimensional latent representation capturing the most important features of the data. The decoder then reconstructs the image from this latent representation. Model parameters are updated by minimizing the reconstruction loss between the original and reconstructed images.
  • Figure 5: Structure of a Generative Adversarial Network (GAN). The model is trained through an adversarial process, where the generator creates fake images to fool the discriminator, while the discriminator learns to distinguish between real and generated images.
  • ...and 10 more figures
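Figure 1(c) maps each electrode to a cortical region. A minimal way to reproduce such a grouping, assuming electrodes carry labels in the extended 10-20 (10-10/10-5) naming convention, is a longest-prefix lookup. The prefix-to-region table below is a hypothetical simplification and not the mapping used in the figure: transitional rows such as FC, CP, FT, TP, and PO could reasonably be assigned to either neighboring region, and caps that label electrodes by position codes instead of 10-20 names would need a table built from the cap's own layout file.

```python
# Hypothetical prefix -> region table (an assumption, not the figure's mapping).
PREFIX_TO_REGION = {
    "Fp": "frontal", "AF": "frontal", "F": "frontal",
    "FT": "temporal", "FC": "central",
    "C": "central", "CP": "central",
    "T": "temporal", "TP": "temporal",
    "P": "parietal", "PO": "occipital",
    "O": "occipital", "I": "occipital",
}
# Check longer prefixes first so "FC1" matches "FC" rather than "F".
_PREFIXES = sorted(PREFIX_TO_REGION, key=len, reverse=True)


def region_of(label):
    """Return the cortical region for an electrode label, or None if unknown."""
    for prefix in _PREFIXES:
        if label.startswith(prefix):
            return PREFIX_TO_REGION[prefix]
    return None


def group_by_region(labels):
    """Group electrode labels by region, preserving their input order."""
    groups = {}
    for label in labels:
        groups.setdefault(region_of(label), []).append(label)
    return groups


# A1 is an earlobe reference in 10-20 naming, not a cortical site.
montage = ["Fp1", "AF3", "Fz", "FC1", "FT7", "Cz", "CP2", "T7", "TP8", "Pz", "PO7", "Oz", "A1"]
groups = group_by_region(montage)
```

Per-region groups like these are one way to pool channel features or to render region-level activation maps; unknown labels fall into a None group so they are easy to spot.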
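Figure 4 describes an encoder that compresses its input into a latent code and a decoder trained by minimizing reconstruction loss. The sketch below keeps only that training principle: it replaces the CNN with a one-unit linear encoder and a linear decoder (a deliberate simplification, not the architecture in the figure) and trains both by per-sample gradient descent on mean squared reconstruction error. The toy data are hypothetical points lying on a line, so a one-dimensional latent is enough to reconstruct them.

```python
import random


def encode(x, w):
    """Linear encoder: d-dimensional input -> scalar latent z."""
    return sum(wi * xi for wi, xi in zip(w, x))


def decode(z, v):
    """Linear decoder: scalar latent -> d-dimensional reconstruction."""
    return [vi * z for vi in v]


def reconstruction_loss(x, x_hat):
    """Mean squared error between input and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def train_autoencoder(data, lr=0.01, epochs=300, seed=0):
    d = len(data[0])
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(d)]  # encoder weights
    v = [rng.uniform(-0.5, 0.5) for _ in range(d)]  # decoder weights
    history = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for x in data:
            z = encode(x, w)
            x_hat = decode(z, v)
            epoch_loss += reconstruction_loss(x, x_hat)
            # Gradients of L = (1/d) * sum((x_hat - x)^2), via the chain rule.
            err = [(2.0 / d) * (xh - xi) for xh, xi in zip(x_hat, x)]
            grad_v = [e * z for e in err]
            grad_z = sum(e * vi for e, vi in zip(err, v))
            grad_w = [grad_z * xi for xi in x]
            v = [vi - lr * g for vi, g in zip(v, grad_v)]
            w = [wi - lr * g for wi, g in zip(w, grad_w)]
        history.append(epoch_loss / len(data))
    return w, v, history


# Hypothetical data: 2-D points on the line x = t * (1, 2).
data = [[t, 2 * t] for t in (-1.0, -0.5, 0.5, 1.0)]
w, v, history = train_autoencoder(data)
```

Comparing history[0] with history[-1] shows the reconstruction loss falling as encoder and decoder co-adapt, which is the same signal a CNN encoder-decoder is trained on.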
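Figure 5 describes the adversarial objective. As a sketch, assuming the common binary cross-entropy formulation with the non-saturating generator loss (one of several GAN objectives; the surveyed EEG-to-image models may use others, such as Wasserstein or hinge losses), the two losses can be computed from discriminator probabilities as follows. In EEG-conditioned GANs, the generator, and often the discriminator, would also receive an EEG embedding; that conditioning and the networks themselves are omitted, and the function names are hypothetical.

```python
import math


def bce(p, target, eps=1e-12):
    """Binary cross-entropy for one predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """D should output 1 on real images and 0 on generated ones."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating loss: G is rewarded when D labels its images as real."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs for three real and three generated images.
confident_d = discriminator_loss([0.95, 0.9, 0.97], [0.05, 0.1, 0.02])
fooled_g = generator_loss([0.9, 0.85, 0.95])
```

At the point where the discriminator outputs 0.5 for every image, its loss under this formulation equals 2 ln 2 and the generator loss equals ln 2, which makes a convenient sanity check when wiring up real networks.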