Perceptogram: Reconstructing Visual Percepts and Presumptive Electrode Preference from EEG

Teng Fei, Srinivas Ravishankar, Zhining Chen, Abhinav Uppal, Ian Jackson, Virginia R. de Sa

TL;DR

Perceptogram tackles the challenge of reconstructing visual percepts from EEG with interpretability. It proposes a linear brain-to-CLIP latent mapping, followed by a frozen diffusion model, achieving state-of-the-art reconstruction without deep networks. The authors introduce latent-filtered EEG patterns and perturbation tests (electrode mirroring, time-swapping) to visualize presumptive electrode preferences and spatiotemporal dynamics, and validate cross-modality with NSD fMRI RSA. These findings suggest that EEG and CLIP representations share a common semantic structure enabling linear decoding, with potential benefits for neurotech and human-centered computer vision.

Abstract

Visual neural decoding from EEG has improved significantly due to diffusion models that can reconstruct high-quality images from decoded latents. While recent works have focused on relatively complex architectures to achieve good reconstruction performance from EEG, less attention has been paid to the source of this information. We present a unified framework that not only enables image reconstruction from EEG using a simple linear decoder, but also isolates interpretable EEG feature maps that support visual perception. Unlike prior approaches that rely on deep, opaque models, our method leverages the inherent structure of CLIP embeddings to keep the mapping linear. We show that training a simple linear decoder from EEG to CLIP latent space, followed by a frozen pre-trained diffusion model, is sufficient to decode images with state-of-the-art reconstruction performance. Beyond reconstruction, Perceptogram enables the visualization of presumptive electrode preference and EEG patterns, revealing interpretable EEG feature maps that correspond to distinct visual attributes, such as semantic class, texture, and hue. We thus use our framework, Perceptogram, to probe EEG signals at various levels of the visual information hierarchy.
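
As a rough illustration of the "simple linear decoder from EEG to CLIP latent space" described in the abstract, the following sketch fits a closed-form ridge regression on synthetic data. The array sizes, the regularization strength `alpha`, the helper name `fit_ridge`, and the use of ridge regression itself are assumptions made for illustration; the abstract only states that the mapping is linear, and random vectors stand in here for real CLIP embeddings and EEG epochs.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: returns W such that X @ W approximates Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

rng = np.random.default_rng(0)

# Hypothetical sizes, far smaller than a real EEG or CLIP setup.
n_trials, n_channels, n_times, latent_dim = 200, 8, 10, 16

# Synthetic stand-ins: `latents` plays the role of CLIP embeddings, and
# `eeg` is a noisy linear image of them, flattened to (trials, channels * times).
latents = rng.standard_normal((n_trials, latent_dim))
mixing = rng.standard_normal((latent_dim, n_channels * n_times))
eeg = latents @ mixing + 0.1 * rng.standard_normal((n_trials, n_channels * n_times))

# Fit on the first 150 trials, evaluate on the remaining 50.
train, test = slice(0, 150), slice(150, 200)
W_dec = fit_ridge(eeg[train], latents[train], alpha=1.0)
pred = eeg[test] @ W_dec

# Correlation between predicted and true latents on held-out trials.
corr = np.corrcoef(pred.ravel(), latents[test].ravel())[0, 1]
print(f"held-out latent correlation: {corr:.3f}")
```

In the pipeline the abstract describes, the predicted latents would then be passed to a frozen pre-trained diffusion model; that step is omitted here.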

Paper Structure (53 sections, 1 theorem, 9 equations, 41 figures, 3 tables)

Key Result

Lemma 1

With the definitions above,

Figures (41)

  • Figure 1: Pipeline overview: There are three primary components: A linear decoder (orange) from brain space to latent space, a linear encoder (blue) mapping this decoded latent back into brain space, and a reconstructer (purple) that generates an image from the decoder output. The encoder output is a latent-filtered spatio-temporal brain pattern for that image.
  • Figure 2: Flowchart illustrating image reconstruction using CLIP as latent space, and unCLIP as reconstructer. During the Test stage, the test EEG is fed through the two matrices to get the predicted VAE and CLIP-Vision latents. unCLIP then turns the predicted VAE and CLIP-Vision latents into actual images.
  • Figure 3: Flowchart illustrating how to produce the EEG patterns linked to the CLIP embedding. It is similar to the regular unCLIP pipeline (Fig. 2). The main difference here is that we train an encoding model predicting EEG from CLIP.
  • Figure 4: Reconstruction examples from Subject 1 using the CLIP latent space and Versatile Diffusion reconstructer, categorized into best, middle, and worst. Best examples were selected by visual inspection, and middle and worst examples were selected by a CLIP score ranking of 94-100 and 194-200 respectively. The rows labeled GT and recon refer to ground truth and reconstructed images respectively. (For full reconstructions, see Fig. \ref{fig:recon_plot_ordered_by_performance}.)
  • Figure 5: Ground truth stimulus images shown at the top; and reconstructions using different latent spaces, in the order: CLIP, PCA, ICA, and VDVAE.
  • ...and 36 more figures
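
The decoder-then-encoder path in the Figure 1 caption can be sketched as two linear maps applied in sequence: EEG is decoded to a latent, and the decoded latent is encoded back into brain space to give a latent-filtered spatio-temporal pattern. The sketch below is a minimal illustration on synthetic data, not the authors' implementation. Ridge regression, the helper `fit_ridge`, all array sizes, and the channel-major flattening used in the final reshape are assumptions.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: returns W such that X @ W approximates Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for readability.
n_trials, n_channels, n_times, latent_dim = 200, 8, 10, 4

# Synthetic stand-ins for latent embeddings and flattened EEG epochs.
latents = rng.standard_normal((n_trials, latent_dim))
mixing = rng.standard_normal((latent_dim, n_channels * n_times))
eeg = latents @ mixing + 0.1 * rng.standard_normal((n_trials, n_channels * n_times))

W_dec = fit_ridge(eeg, latents)   # decoder: brain space -> latent space
W_enc = fit_ridge(latents, eeg)   # encoder: latent space -> brain space

# Composite filter: decode, then encode back into brain space.
composite = W_dec @ W_enc         # shape (channels * times, channels * times)

# Latent-filtered pattern for one trial, viewed as channels x time
# (assumes the EEG was flattened channel-major).
pattern = (eeg[0] @ composite).reshape(n_channels, n_times)

# The composite passes through a latent_dim-wide bottleneck,
# so its rank cannot exceed latent_dim.
rank = np.linalg.matrix_rank(composite)
print(pattern.shape, rank)
```

Because the composite map is a product of two matrices that share an inner dimension of `latent_dim`, only the EEG components that are linearly related to the latent survive the filtering.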

Theorems & Definitions (2)

  • Lemma 1: Column space of the composite filter
  • proof