EEG-Driven Image Reconstruction with Saliency-Guided Diffusion Models

Igor Abramov, Ilya Makarov

TL;DR

We address EEG-based image reconstruction by introducing a dual-conditioning framework that fuses EEG embeddings with spatial saliency maps. The method uses Adaptive Thinking Mapper embeddings with LoRA-fine-tuned Stable Diffusion 2.1 and a ControlNet branch conditioned on predicted saliency maps to steer image generation. Experiments on THINGS-EEG show substantial improvements in both pixel-level and semantic fidelity, with stronger alignment to human visual attention compared with EEG-only baselines. The approach demonstrates the viability of efficiently adapting pre-trained diffusion models for neural decoding, with potential applications in medical diagnostics and neuroadaptive interfaces.

Abstract

Existing EEG-driven image reconstruction methods often overlook spatial attention mechanisms, limiting fidelity and semantic coherence. To address this, we propose a dual-conditioning framework that combines EEG embeddings with spatial saliency maps to enhance image generation. Our approach leverages the Adaptive Thinking Mapper (ATM) for EEG feature extraction and fine-tunes Stable Diffusion 2.1 via Low-Rank Adaptation (LoRA) to align neural signals with visual semantics, while a ControlNet branch conditions generation on saliency maps for spatial control. Evaluated on THINGS-EEG, our method significantly improves the quality of both low- and high-level image features over existing approaches while aligning strongly with human visual attention. The results demonstrate that attentional priors resolve EEG ambiguities, enabling high-fidelity reconstructions with applications in medical diagnostics and neuroadaptive interfaces. More broadly, they advance neural decoding through efficient adaptation of pre-trained diffusion models.
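
To make the LoRA step mentioned in the abstract concrete, here is a minimal sketch of the general low-rank update, W' = W + (alpha / r) · B·A, written in plain Python. It is not the authors' implementation: the matrix shapes, the rank, the `alpha` value, and the helper names `matmul` and `lora_weight` are all illustrative assumptions.

```python
# Minimal LoRA sketch in pure Python (no external dependencies).
# Assumption: shapes, rank, and alpha are illustrative, not the paper's settings.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)  # A has shape (r, d_in); B has shape (d_out, r)
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update with shape (d_out, d_in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Frozen 2x3 base weight (hypothetical values).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

# Rank-1 adapter: A is (1, 3), B is (2, 1). B starts at zero, so the
# adapted layer initially behaves exactly like the frozen base layer.
A = [[0.5, -0.5, 1.0]]
B_init = [[0.0], [0.0]]
B_trained = [[2.0], [0.0]]  # hypothetical values after some training

W_init = lora_weight(W, A, B_init, alpha=1.0)
W_adapted = lora_weight(W, A, B_trained, alpha=1.0)
```

Only `A` and `B` would be trained while `W` stays fixed, which is why this kind of adaptation touches far fewer parameters than full fine-tuning.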

Paper Structure

This paper contains 4 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Example stimuli and reconstructions showing original images (top), corresponding EEG signals (middle), and our model's outputs (bottom) conditioned on both EEG and saliency patterns.
  • Figure 2: Workflow of our EEG and saliency-conditioned image generation framework. Stage 1: LoRA fine-tuning of Stable Diffusion with EEG embeddings. Stage 2: ControlNet training with saliency maps while keeping the EEG-conditioned model frozen.
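
The two-stage schedule described in the Figure 2 caption can be sketched as a simple freeze/unfreeze plan. This is a rough illustration only: the module names (`sd_base`, `lora_adapters`, `controlnet`) and the `configure_stage` helper are hypothetical labels, not identifiers from the authors' code, and the sketch tracks only which parameter groups would receive gradients in each stage.

```python
from dataclasses import dataclass

@dataclass
class Module:
    """A named parameter group with a flag saying whether it is trained."""
    name: str
    trainable: bool = False

# Hypothetical mapping from stage number to the groups trained in that stage.
# Stage 1: only the LoRA adapters are trained (base Stable Diffusion frozen).
# Stage 2: only the ControlNet branch is trained (EEG-conditioned model frozen).
TRAINABLE_BY_STAGE = {
    1: {"lora_adapters"},
    2: {"controlnet"},
}

def configure_stage(modules, stage):
    """Set trainable flags for the given stage and return the trainable names."""
    wanted = TRAINABLE_BY_STAGE[stage]
    for module in modules.values():
        module.trainable = module.name in wanted
    return sorted(name for name, module in modules.items() if module.trainable)

modules = {
    name: Module(name)
    for name in ("sd_base", "lora_adapters", "controlnet")
}

stage1_trainable = configure_stage(modules, stage=1)
stage2_trainable = configure_stage(modules, stage=2)
```

In a real training loop, flags like these would typically decide which parameters are handed to the optimizer in each stage.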