Guess What I Think: Streamlined EEG-to-Image Generation with Latent Diffusion Models

Eleonora Lopez; Luigi Sigillo; Federica Colonnese; Massimo Panella; Danilo Comminiello

TL;DR

Reconstructing visual stimuli from EEG for real-time BCIs is difficult because the signals are noisy and have low spatial resolution. The paper introduces GWIT, a streamlined EEG-to-image pipeline that uses a ControlNet adapter to condition a frozen Latent Diffusion Model (LDM) on EEG inputs. The EEG signal is projected as $z_{eeg} = f_{proj}(\boldsymbol{y})$ and combined with the image latent $z_{img}^t$ to form the conditioning $c_{eeg}$. A coarse caption $c_l$ is provided by a frozen EEG image decoder, and multi-subject support is achieved through a subject conditioning layer $S(\boldsymbol{y}, s)$. Training updates only the ControlNet adapter and the projection head, using the standard LDM loss $\mathcal{L} = \mathbb{E} \, \| \epsilon - \epsilon_\theta(\cdot) \|^2$. Evaluated on EEGCVPR40 and ThoughtViz, GWIT reports state-of-the-art generation quality (FID/IS) and semantic accuracy (ACC), with single-subject ACC gains of $+85.71\%$ and multi-subject gains of $+7\%$, and reduced LPIPS, indicating closer alignment to ground-truth visuals. The approach requires minimal preprocessing and no pretraining, which makes near real-time BCI deployment more practical while outperforming prior, more complex EEG-to-image pipelines.
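The noise-prediction objective mentioned above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The linear noise schedule, the 16-dimensional toy latent, and the `toy_eps_theta` function with its argument list are all assumptions made for illustration; in the paper the denoiser is a frozen LDM steered by the trainable ControlNet adapter.

```python
import math
import random

def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def add_noise(z0, eps, alpha_bar_t):
    """Forward diffusion: z_t = sqrt(a) * z_0 + sqrt(1 - a) * eps."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * z + b * e for z, e in zip(z0, eps)]

def noise_prediction_loss(eps, eps_pred):
    """Mean squared error between the sampled and the predicted noise."""
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred)) / len(eps)

def toy_eps_theta(z_t, t, c_eeg, c_l):
    """Hypothetical stand-in for the denoiser; it ignores its inputs
    and always predicts zero noise."""
    return [0.0 for _ in z_t]

rng = random.Random(0)
alpha_bar = linear_alpha_bar(1000)
z0 = [rng.gauss(0.0, 1.0) for _ in range(16)]   # toy image latent
eps = [rng.gauss(0.0, 1.0) for _ in range(16)]  # sampled Gaussian noise
t = rng.randrange(1000)                         # random diffusion step
z_t = add_noise(z0, eps, alpha_bar[t])

# Loss of the toy predictor versus a predictor that recovers eps exactly.
loss = noise_prediction_loss(eps, toy_eps_theta(z_t, t, c_eeg=None, c_l=None))
perfect = noise_prediction_loss(eps, eps)
```

In this sketch `perfect` is zero because the prediction equals the sampled noise, while the zero-predicting toy model yields a positive `loss`. In an actual training loop the gradient of this loss would update only the adapter and projection parameters, as the summary states.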

Abstract

Generating images from brain waves is gaining increasing attention due to its potential to advance brain-computer interface (BCI) systems by understanding how brain signals encode visual cues. Most of the literature has focused on fMRI-to-Image tasks as fMRI is characterized by high spatial resolution. However, fMRI is an expensive neuroimaging modality and does not allow for real-time BCI. On the other hand, electroencephalography (EEG) is a low-cost, non-invasive, and portable neuroimaging technique, making it an attractive option for future real-time applications. Nevertheless, EEG presents inherent challenges due to its low spatial resolution and susceptibility to noise and artifacts, which makes generating images from EEG more difficult. In this paper, we address these problems with a streamlined framework based on the ControlNet adapter for conditioning a latent diffusion model (LDM) through EEG signals. We conduct experiments and ablation studies on popular benchmarks to demonstrate that the proposed method beats other state-of-the-art models. Unlike these methods, which often require extensive preprocessing, pretraining, different losses, and captioning models, our approach is efficient and straightforward, requiring only minimal preprocessing and a few components. The code is available at https://github.com/LuigiSigillo/GWIT.

Paper Structure

This paper contains 5 sections, 2 equations, 4 figures, 3 tables.

Introduction
Related works
Proposed Method
Experiments
Conclusion

Figures (4)

Figure 1: Guess What I Think (GWIT). Outline of our streamlined framework which includes a projection function, a ControlNet adapter to handle EEG conditioning, a frozen LDM, and a frozen EEG image decoder to obtain a coarse-grained control.
Figure 2: Comparison of images generated with models trained on subject $4$ (top three rows) and on all subjects (bottom four rows) of EEGCVPR40.
Figure 3: Comparison of images generated by models trained on ThoughtViz. First row: random sample from the dataset.
Figure 4: Comparison of images generated with models trained only on subject $4$ of EEGCVPR40. Left: random sample from the dataset.
