EEG-Driven Image Reconstruction with Saliency-Guided Diffusion Models
Igor Abramov, Ilya Makarov
TL;DR
We address EEG-based image reconstruction by introducing a dual-conditioning framework that fuses EEG embeddings with spatial saliency maps. The method uses Adaptive Thinking Mapper embeddings with LoRA-fine-tuned Stable Diffusion 2.1 and a ControlNet branch conditioned on predicted saliency maps to steer image generation. Experiments on THINGS-EEG show substantial improvements in both pixel-level and semantic fidelity, with stronger alignment to human visual attention compared with EEG-only baselines. The approach demonstrates the viability of efficiently adapting pre-trained diffusion models for neural decoding, with potential applications in medical diagnostics and neuroadaptive interfaces.
Abstract
Existing EEG-driven image reconstruction methods often overlook spatial attention mechanisms, limiting fidelity and semantic coherence. To address this, we propose a dual-conditioning framework that combines EEG embeddings with spatial saliency maps to enhance image generation. Our approach leverages the Adaptive Thinking Mapper (ATM) for EEG feature extraction and fine-tunes Stable Diffusion 2.1 via Low-Rank Adaptation (LoRA) to align neural signals with visual semantics, while a ControlNet branch conditions generation on saliency maps for spatial control. Evaluated on THINGS-EEG, our method significantly improves the quality of both low- and high-level image features over existing approaches while aligning strongly with human visual attention. The results demonstrate that attentional priors resolve EEG ambiguities, enabling high-fidelity reconstructions with applications in medical diagnostics and neuroadaptive interfaces. More broadly, the work advances neural decoding through efficient adaptation of pre-trained diffusion models.
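The two conditioning mechanisms named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the function names (`matmul`, `lora_weight`, `add_control_residual`), the matrix shapes, and the `alpha` and `control_scale` values are assumptions chosen for illustration, and plain Python lists stand in for tensors. The first function shows the standard LoRA update, in which a frozen weight W0 is adjusted by a scaled low-rank product B·A. The second shows the general ControlNet idea of adding a control-branch residual (here, one that would be derived from a saliency map) to the backbone features.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_weight(w0, a, b, alpha):
    """Return w0 + (alpha / r) * (b @ a), the standard LoRA update.

    w0: frozen weight, shape (d_out, d_in)
    a:  trainable down-projection, shape (r, d_in)
    b:  trainable up-projection, shape (d_out, r)
    alpha: scaling hyperparameter (value here is an assumption)
    """
    r = len(a)  # rank of the low-rank update
    scale = alpha / r
    delta = matmul(b, a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]


def add_control_residual(features, residual, control_scale=1.0):
    """Add a control-branch residual to backbone features elementwise.

    In this toy setting, `features` stands for EEG-conditioned backbone
    activations and `residual` for the saliency-conditioned branch output.
    """
    return [[f + control_scale * c for f, c in zip(f_row, c_row)]
            for f_row, c_row in zip(features, residual)]


if __name__ == "__main__":
    w0 = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 weight
    a = [[1.0, 2.0]]                # rank-1 down-projection
    b = [[1.0], [0.0]]              # rank-1 up-projection
    print(lora_weight(w0, a, b, alpha=2.0))
    print(add_control_residual([[0.5, 0.5]], [[0.1, -0.1]], control_scale=0.5))
```

When `b` is all zeros, which is the usual LoRA initialization, `lora_weight` returns `w0` unchanged, so fine-tuning starts from the pre-trained model's behavior. Only `a` and `b` would be trained, which is what makes the adaptation parameter-efficient.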
