Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion

Matteo Ferrante, Tommaso Boccato, Grigorii Rashkov, Nicola Toschi

Abstract

This paper presents a novel approach towards a foundation model that uses contrastive learning to align neural data and visual stimuli across multimodal representations of brain activity. We used electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) data. The framework's capabilities are demonstrated through three key experiments: decoding visual information from neural data, encoding images into neural representations, and converting between neural modalities. The results highlight the model's ability to capture semantic information accurately across different brain imaging techniques, illustrating its potential for decoding, encoding, and modality conversion tasks.
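The abstract names contrastive learning as the alignment mechanism, and Figure 1 describes mapping each neural modality into the representation space of a frozen CLIP image encoder. A minimal sketch of what such an objective could look like, assuming a CLIP-style symmetric InfoNCE loss over a batch of paired (neural, image) embeddings; the function names and the temperature of 0.07 are illustrative assumptions, not settings stated in the paper.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_matrix(a: List[Vector], b: List[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities: entry [i][j] compares a[i] with b[j]."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def log_softmax_at(row: List[float], idx: int) -> float:
    """Numerically stable log-softmax of `row`, evaluated at position `idx`."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - log_sum_exp


def clip_style_loss(neural: List[Vector], image: List[Vector],
                    temperature: float = 0.07) -> float:
    """Hypothetical symmetric InfoNCE loss.

    Pair (neural[i], image[i]) is the positive; every other pairing in the
    batch acts as a negative. In the setup assumed here, `image` would come
    from a frozen CLIP image encoder, so only the neural encoder is trained.
    """
    n = len(neural)
    logits = [[s / temperature for s in row] for row in cosine_matrix(neural, image)]
    # Neural -> image direction: row i should score column i highest.
    loss_n2i = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Image -> neural direction: column i should score row i highest.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_i2n = -sum(log_softmax_at(cols[i], i) for i in range(n)) / n
    return 0.5 * (loss_n2i + loss_i2n)


# Toy usage: embeddings aligned with their paired images give a lower loss
# than the same embeddings shuffled out of order.
image_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
shuffled = [aligned[1], aligned[2], aligned[0]]
print(clip_style_loss(aligned, image_emb) < clip_style_loss(shuffled, image_emb))  # True
```

Pulling every modality toward the same fixed image embeddings is what would place EEG, MEG, and fMRI in one shared space, without ever training the modalities directly against each other.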

Paper Structure

This paper contains 10 sections, 1 equation, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Schematic representation of the proposed model, illustrating the alignment of neural datasets from different modalities into a unified representation space using a frozen CLIP image encoder.
  • Figure 2: The top panel illustrates the 'Decoding' experiment, in which neural data are used to retrieve visually related images from a dataset. The middle panel depicts the 'Encoding' experiment, in which an image is used to predict and retrieve neural data that could be associated with the visual perception of that image. The bottom panel shows the 'Modality Conversion' experiment, which translates neural data from one modality, such as EEG, into another, such as fMRI, by finding semantically similar brain activity across modalities.
  • Figure 3: Comparative Results of Multimodal Neural Decoding. The figure shows the original visual stimuli and the images retrieved using decoding modules for fMRI, EEG, and MEG data. Each block corresponds to a different modality, illustrating the model's ability to identify and retrieve images that closely resemble or are semantically related to the original stimulus.
  • Figure 4: Encoding Experiment Results Displaying Image-to-Brain Activity Correlation. Rows show the results for the EEG, MEG, and fMRI modalities. The leftmost image in each row is the encoded stimulus; the subsequent images are those associated with the brain activity retrieved by the model.
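Figures 2 to 4 describe all three experiments as retrieval in the shared space: a neural query retrieves images (decoding), an image query retrieves neural data (encoding), and a query from one modality retrieves data from another (modality conversion). A minimal sketch, assuming retrieval ranks candidates by cosine similarity between already-projected embeddings; the `retrieve` helper, file names, trial IDs, and vectors below are hypothetical toy values, not data from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def retrieve(query: Vector, gallery: Dict[str, Vector], k: int = 1) -> List[str]:
    """Return the k gallery keys whose embeddings are most similar to `query`."""
    ranked = sorted(gallery, key=lambda key: cosine(query, gallery[key]), reverse=True)
    return ranked[:k]


# Hypothetical embeddings, assumed to already live in the shared space.
images = {"dog.jpg": [1.0, 0.1, 0.0], "car.jpg": [0.0, 1.0, 0.1]}
fmri_bank = {"fmri_trial_07": [0.9, 0.2, 0.0], "fmri_trial_12": [0.1, 0.9, 0.2]}
eeg_query = [0.8, 0.15, 0.05]

print(retrieve(eeg_query, images))              # decoding, EEG -> image: ['dog.jpg']
print(retrieve(images["car.jpg"], fmri_bank))   # encoding, image -> fMRI: ['fmri_trial_12']
print(retrieve(eeg_query, fmri_bank))           # conversion, EEG -> fMRI: ['fmri_trial_07']
```

Because every modality shares one embedding space, the same nearest-neighbour routine serves all three tasks; only the query and the gallery change.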