
BrainDecoder: Style-Based Visual Decoding of EEG Signals

Minsuk Choi, Hiroshi Ishikawa

TL;DR

BrainDecoder tackles the challenge of reconstructing not only the semantic content but also the style of visual stimuli from EEG signals. The method aligns EEG representations separately with the CLIP image and CLIP text spaces. It then fuses these cues in a latent diffusion generator through decoupled cross-attention. This lets it capture color, texture, and layout details that earlier EEG-based reconstructions largely miss. The approach achieves state-of-the-art performance on the Brain2Image dataset, with high 50-way top-1 accuracy and strong quality metrics. Ablations confirm that the two CLIP alignments are complementary and that simple captions work well as text labels. This work moves EEG-to-image decoding toward more faithful, richly detailed reconstructions, with potential implications for brain-computer interfaces and cognitive neuroscience.
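
Prior EEG and fMRI decoding work commonly computes the 50-way top-1 accuracy with an n-way top-k protocol:

- A pretrained ImageNet classifier scores each generated image.
- The ground-truth class is pooled with n-1 randomly drawn distractor classes.
- A trial counts as a hit if the ground-truth class ranks within the top k of that pool.

The sketch below assumes this protocol. The classifier itself is left out, so its probabilities are an input. The function name, trial count, and seed are illustrative and not taken from the paper.

```python
import numpy as np

def n_way_top_k_accuracy(probs, true_labels, n_way=50, top_k=1, num_trials=100, seed=0):
    """Monte Carlo n-way top-k accuracy (assumed protocol, hypothetical helper).

    probs: (N, C) class probabilities a pretrained classifier assigns to N generated images.
    true_labels: (N,) ground-truth class index of each image's stimulus.
    """
    num_images, num_classes = probs.shape
    assert 1 <= top_k <= n_way <= num_classes
    rng = np.random.default_rng(seed)
    hits = 0
    for i in range(num_images):
        gt = true_labels[i]
        others = np.delete(np.arange(num_classes), gt)  # every class except the true one
        for _ in range(num_trials):
            distractors = rng.choice(others, size=n_way - 1, replace=False)
            candidates = np.concatenate(([gt], distractors))  # ground truth sits at index 0
            ranking = np.argsort(-probs[i, candidates])       # candidate positions, best first
            hits += int(0 in ranking[:top_k])
    return hits / (num_images * num_trials)

# Toy check with random "classifier" outputs over 1000 ImageNet-sized classes.
rng = np.random.default_rng(1)
logits = rng.standard_normal((20, 1000))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = probs.argmax(axis=1)  # toy case: the classifier's top class is always correct
print(n_way_top_k_accuracy(probs, labels))  # 1.0
```

The distractors are resampled on every trial, so the score is a Monte Carlo estimate. Fixing `seed` makes it reproducible.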

Abstract

Decoding neural representations of visual stimuli from electroencephalography (EEG) offers valuable insights into brain activity and cognition. Recent advances in deep learning have significantly improved visual decoding of EEG, with a primary focus on reconstructing the semantic content of visual stimuli. In this paper, we present a novel visual decoding pipeline that recovers the content and also emphasizes reconstructing the style, such as the color and texture, of the images viewed by the subject. Unlike previous methods, this "style-based" approach learns separately in the CLIP image and CLIP text spaces, enabling a more nuanced extraction of information from EEG signals. We also align to text using simpler captions than those employed previously, and find that they work better. Both quantitative and qualitative evaluations show that our method better preserves the style of visual stimuli and extracts more fine-grained semantic information from neural signals. Notably, it achieves significant quantitative improvements and sets a new state of the art on the popular Brain2Image dataset.
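
One way to picture learning in the two CLIP spaces separately is a shared EEG feature fed through two projection heads, each trained against its own target:

- the CLIP image embeddings of the stimuli for one head;
- the CLIP text embeddings of their captions for the other.

The sketch below assumes a CLIP-style symmetric InfoNCE objective in each space and linear heads `w_img` and `w_txt`. These names, the loss choice, and the weights `lam_img` and `lam_txt` are hypothetical and may differ from the paper's actual training objective.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Shift by the max for numerical stability.
    z = logits - logits.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def clip_style_loss(pred, target, temperature=0.07):
    """Symmetric InfoNCE: row i of pred should match row i of target and no other row."""
    logits = l2_normalize(pred) @ l2_normalize(target).T / temperature  # (B, B)
    idx = np.arange(len(pred))
    pred_to_target = -log_softmax(logits, axis=1)[idx, idx].mean()
    target_to_pred = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (pred_to_target + target_to_pred)

def dual_alignment_loss(eeg_feat, w_img, w_txt, clip_img, clip_txt, lam_img=1.0, lam_txt=1.0):
    """Align one EEG feature with CLIP image space and CLIP text space via separate heads."""
    z_img = eeg_feat @ w_img  # hypothetical linear head into CLIP image space
    z_txt = eeg_feat @ w_txt  # hypothetical linear head into CLIP text space
    return lam_img * clip_style_loss(z_img, clip_img) + lam_txt * clip_style_loss(z_txt, clip_txt)

# Toy batch: random stand-ins for EEG features and precomputed CLIP embeddings.
rng = np.random.default_rng(0)
B, d_eeg, d_clip = 8, 128, 768
eeg_feat = rng.standard_normal((B, d_eeg))
w_img = rng.standard_normal((d_eeg, d_clip)) / np.sqrt(d_eeg)
w_txt = rng.standard_normal((d_eeg, d_clip)) / np.sqrt(d_eeg)
clip_img = rng.standard_normal((B, d_clip))  # would be CLIP image embeddings of the stimuli
clip_txt = rng.standard_normal((B, d_clip))  # would be CLIP text embeddings of the captions
loss = dual_alignment_loss(eeg_feat, w_img, w_txt, clip_img, clip_txt)
print(loss)
```

Keeping two heads rather than one lets each embedding match the geometry of its own CLIP space. This is the intuition behind aligning to image and text separately.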

Paper Structure

This paper contains 12 sections, 3 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: The overall architecture of our proposed BrainDecoder framework. The modules in blue are frozen during training and only the modules in red are updated. The bold arrows are used during inference and the dotted lines are used during training.
  • Figure 2: Sample outputs. The images on the left are the ground-truth visual stimuli presented during dataset collection. The next two images are sample outputs from our framework. Notably, the samples closely match the visual stimuli in both semantics and style.
  • Figure 3: Comparison of output images with the ground truth and outputs from other methods.
  • Figure 4: Example layout-oriented captions generated with LLaVA.
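
The generator in Figure 1 fuses the EEG-derived cues through decoupled cross-attention. Below is a minimal single-head sketch, assuming the IP-Adapter formulation of that idea:

- The latent tokens yield one shared query.
- The text-space and image-space condition tokens each get their own key and value projections.
- The two attention outputs are summed, with the image-space branch weighted by `scale`.

The parameter names and shapes are illustrative. How BrainDecoder maps its EEG embeddings to these condition tokens is not shown here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d), axis=-1) @ v

def decoupled_cross_attention(latents, text_tokens, image_tokens, params, scale=1.0):
    """IP-Adapter-style fusion (assumed): shared query, separate K/V per condition."""
    q = latents @ params["W_q"]
    out_text = attention(q, text_tokens @ params["W_k_text"], text_tokens @ params["W_v_text"])
    out_image = attention(q, image_tokens @ params["W_k_img"], image_tokens @ params["W_v_img"])
    return out_text + scale * out_image

# Illustrative shapes only: 64 latent tokens, 77 text-space tokens, 4 image-space tokens.
rng = np.random.default_rng(0)
d_latent, d_cond, d_attn = 320, 768, 320
shapes = {
    "W_q": (d_latent, d_attn),
    "W_k_text": (d_cond, d_attn), "W_v_text": (d_cond, d_attn),
    "W_k_img": (d_cond, d_attn), "W_v_img": (d_cond, d_attn),
}
params = {name: rng.standard_normal(s) / np.sqrt(s[0]) for name, s in shapes.items()}
latents = rng.standard_normal((64, d_latent))
text_tokens = rng.standard_normal((77, d_cond))
image_tokens = rng.standard_normal((4, d_cond))
out = decoupled_cross_attention(latents, text_tokens, image_tokens, params, scale=1.0)
print(out.shape)  # (64, 320)
```

Setting `scale` to 0 recovers plain text-conditioned cross-attention. This is why the decoupled design can add an image-space branch to an existing text-conditioned model without altering its text pathway.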