Brain decoding: toward real-time reconstruction of visual perception

Yohann Benchetrit, Hubert Banville, Jean-Rémi King

TL;DR

An MEG decoding model is trained with both contrastive and regression objectives and consists of three modules: i) pretrained embeddings obtained from the image, ii) an MEG module trained end-to-end, and iii) a pretrained image generator. It provides an important step towards the decoding of the visual processes continuously unfolding within the human brain.

Abstract

In the past five years, the use of generative and foundational AI systems has greatly improved the decoding of brain activity. Visual perception, in particular, can now be decoded from functional Magnetic Resonance Imaging (fMRI) with remarkable fidelity. This neuroimaging technique, however, suffers from a limited temporal resolution ($\approx$0.5 Hz), which fundamentally constrains its real-time usage. Here, we propose an alternative approach based on magnetoencephalography (MEG), a neuroimaging device capable of measuring brain activity with high temporal resolution ($\approx$5,000 Hz). For this, we develop an MEG decoding model trained with both contrastive and regression objectives and consisting of three modules: i) pretrained embeddings obtained from the image, ii) an MEG module trained end-to-end, and iii) a pretrained image generator. Our results are threefold. First, our MEG decoder shows a 7X improvement in image retrieval over classic linear decoders. Second, late brain responses to images are best decoded with DINOv2, a recent foundational image model. Third, image retrievals and generations both suggest that high-level visual features can be decoded from MEG signals, although the same approach applied to 7T fMRI also recovers better low-level features. Overall, these results, while preliminary, provide an important step towards the decoding -- in real time -- of the visual processes continuously unfolding within the human brain.
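The combination of a contrastive and a regression objective mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the fixed `temperature`, the single shared embedding for both terms, and the `weight` used to mix the two losses are all assumptions made for illustration.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def clip_loss(z_meg, z_img, temperature=0.1):
    """Contrastive term: each MEG embedding should match its own image
    embedding among all images in the batch (temperature is an assumption)."""
    z_meg = z_meg / np.linalg.norm(z_meg, axis=1, keepdims=True)
    z_img = z_img / np.linalg.norm(z_img, axis=1, keepdims=True)
    logits = z_meg @ z_img.T / temperature  # (batch, batch) scaled cosine similarities
    # Cross-entropy where the correct "class" of row i is column i.
    return -np.mean(np.diag(log_softmax(logits, axis=1)))


def mse_loss(z_meg, z_img):
    """Regression term: predicted embedding should equal the image embedding."""
    return np.mean((z_meg - z_img) ** 2)


def combined_loss(z_meg, z_img, weight=0.5):
    """Convex combination of both terms (the weighting scheme is hypothetical)."""
    return weight * clip_loss(z_meg, z_img) + (1 - weight) * mse_loss(z_meg, z_img)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_img = rng.normal(size=(8, 16))      # stand-in for pretrained image embeddings
    z_meg = z_img + 0.1 * rng.normal(size=(8, 16))  # stand-in for MEG module output
    print(combined_loss(z_meg, z_img))
```

The contrastive term only cares about which image in the batch is closest, while the regression term also constrains the scale and position of the predicted embedding, which matters when that embedding is later passed to an image generator.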

Paper Structure (47 sections, 3 equations, 13 figures, 5 tables)


Figures (13)

  • Figure 1: (A) Approach. Locks indicate pretrained models. (B) Processing schemes. Unlike image generation, retrieval happens in latent space, but requires the true image in the retrieval set.
  • Figure 2: Image retrieval performance obtained from a trained deep ConvNet. Linear decoder baseline performance (see Table \ref{tab:retrieval_linear}) is shown with a black transparent bar for each latent. The original "small" test set (Hebart et al., 2023) comprises 200 distinct images, each belonging to a different category. In contrast, our proposed "large" test set comprises 12 images from each of those 200 categories, yielding a total of 2,400 images. Chance-level is 2.5% top-5 accuracy for the small test set and 0.21% for the large test set. The best latent representations yield accuracies around 70% and 13% for the small and large test sets, respectively.
  • Figure 3: Retrieval performance of models trained on 100-ms sliding windows with a stride of 25 ms for different image representations. The shaded gray area indicates the 500-ms interval during which images were presented to the participants, and the horizontal dashed line indicates chance-level performance. Accuracy peaks a few hundred milliseconds after both the image onset and offset for all embeddings.
  • Figure 4: Handpicked examples of successful generations. (A) Generations obtained on growing windows starting at image onset (0 ms) and ending at the specified time. (B) Full-window generations (-500 to 1,000 ms).
  • Figure S1: Hyperparameter search results for the MEG-to-image retrieval task, presenting the impact of (A) optimizer learning rate and batch size, (B) number of convolutional blocks and use of spatial attention and/or subject-specific layers in the brain module, (C) MEG window parameters, (D) type of temporal aggregation layer and number of blocks in the CLIP projection head of the brain module, and (E) CLIP loss configuration (normalization axes, use of learned temperature parameter and use of symmetric terms). Chance-level top-5 accuracy is 0.05%.
  • ...and 8 more figures
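
The chance levels quoted in the Figure 2 caption follow from the size of the retrieval set: a random ranking places the true image in the top 5 with probability 5/N. The sketch below reproduces that arithmetic and shows one plausible way to score top-5 retrieval with cosine similarity in latent space. The function names and the choice of cosine similarity are assumptions for illustration, not the paper's code.

```python
import numpy as np


def chance_top_k(n_candidates, k=5):
    """Probability that a random ranking puts the true item in the top k."""
    return k / n_candidates


def top_k_accuracy(pred, candidates, true_idx, k=5):
    """Fraction of predictions whose true candidate is among the k most
    similar candidates (cosine similarity is an assumed choice)."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    cand = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = pred @ cand.T                        # (n_pred, n_candidates)
    top_k = np.argsort(-sims, axis=1)[:, :k]    # indices of the k best matches
    hits = (top_k == true_idx[:, None]).any(axis=1)
    return hits.mean()


if __name__ == "__main__":
    print(f"small test set: {100 * chance_top_k(200):.2f}%")    # 2.50%
    print(f"large test set: {100 * chance_top_k(2400):.2f}%")   # 0.21%
```

As the Figure 1 caption notes, this kind of retrieval requires the true image to be present in the candidate set, which is why the chance level depends directly on the number of candidates.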