
Looking through the mind's eye via multimodal encoder-decoder networks

Arman Afrasiyabi, Erica Busch, Rahul Singh, Dhananjay Bhaskar, Laurent Caplette, Nicholas Turk-Browne, Smita Krishnaswamy

TL;DR

This work created a mapping between subjects' fMRI signals and the videos that elicited them, and aligned a latent representation of text-prompted fMRI measurements with the corresponding video-fMRI embedding based on the textual labels given to the videos themselves.

Abstract

In this work, we explore decoding mental imagery from subjects' fMRI measurements. To achieve this, we first create a mapping between the fMRI signals of subjects and the videos that elicited them. This mapping associates high-dimensional fMRI activation states with visual imagery. Next, we prompt the subjects textually, primarily with emotion labels that make no direct reference to visual objects. Then, to decode visual imagery that may have been in a person's mind's eye, we align a latent representation of these text-prompted fMRI measurements with the corresponding video-fMRI embedding, based on the textual labels given to the videos themselves. This alignment overlaps the video-fMRI embedding with the text-prompted fMRI embedding, which allows us to use our fMRI-to-video mapping for decoding. Additionally, we augment an existing fMRI dataset, originally consisting of data from five subjects, with recordings from three more subjects gathered by our team. We demonstrate the efficacy of our model on this augmented dataset, both in accurately creating the mapping and in plausibly decoding mental imagery.
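A minimal sketch of the label-based decoding idea, assuming the fMRI embeddings are already available as plain vectors. Text-prompted embeddings are shifted so that, for each shared label, their mean coincides with the mean of the video-fMRI embeddings carrying that label. A nearest-neighbour lookup over the video-fMRI embeddings then stands in for the fMRI-to-video decoder. The functions `align_by_label` and `decode_nearest`, the toy data, and the choice of per-label mean matching are hypothetical illustrations, not the authors' learned alignment network.

```python
from collections import defaultdict
from math import dist


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def align_by_label(text_emb, text_labels, video_emb, video_labels):
    """Hypothetical stand-in for the alignment step: translate each text-prompted
    embedding so that, per label, the group mean matches the mean of the
    video-fMRI embeddings with the same label. Within-label spread is preserved."""
    groups_t, groups_v = defaultdict(list), defaultdict(list)
    for e, label in zip(text_emb, text_labels):
        groups_t[label].append(e)
    for e, label in zip(video_emb, video_labels):
        groups_v[label].append(e)
    offsets = {}
    for label, members in groups_t.items():
        if label not in groups_v:
            raise ValueError(f"no video-fMRI embeddings for label {label!r}")
        mt, mv = mean(members), mean(groups_v[label])
        offsets[label] = [b - a for a, b in zip(mt, mv)]
    return [[x + d for x, d in zip(e, offsets[label])]
            for e, label in zip(text_emb, text_labels)]


def decode_nearest(query, video_emb, clip_ids):
    """Hypothetical stand-in for the fMRI-to-video decoder: return the clip whose
    video-fMRI embedding is closest (Euclidean) to the query embedding."""
    best = min(range(len(video_emb)), key=lambda i: dist(query, video_emb[i]))
    return clip_ids[best]


# Toy example: two labels, three clips each, with made-up 2-D embeddings.
video_emb = [[0.0, 0.0], [0.0, 0.3], [0.3, 0.0], [5.0, 5.0], [5.0, 5.3], [5.3, 5.0]]
video_labels = ["calm"] * 3 + ["fear"] * 3
clip_ids = ["lake", "meadow", "beach", "storm", "alley", "cave"]
text_emb = [[10.0, -3.0], [10.4, -3.0], [20.0, 7.0]]
text_labels = ["calm", "calm", "fear"]

aligned = align_by_label(text_emb, text_labels, video_emb, video_labels)
decoded = [decode_nearest(q, video_emb, clip_ids) for q in aligned]
print(decoded)
```

Mean matching only removes a per-label offset, so it is far weaker than a learned alignment; it is used here only because it makes the "overlap the two embeddings, then reuse the video decoder" logic easy to follow.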

Paper Structure

This paper contains 12 sections, 5 equations, and 4 figures.

Figures (4)

  • Figure 1: (a) Schematic overview of our model, which first learns the connections between video clips and imagery through machine learning techniques and then generates artistic visualizations based solely on brain recordings. (b) Architecture of our FluxBrain, comprising three encoder-decoder models that reconstruct video (left), reconstruct brain activity during video stimulation (middle), and recover text prompts (right). An orange network performs one-to-one matching between the embeddings of videos and of the video-stimulated brain recordings, while a green network aligns the distributions of text- and video-stimulus prompts. (c) Inference pathway leading to the reconstruction of brain activity stimulated by text, using our proposed architecture.
  • Figure 2: PHATE (moon2019visualizing) visualization showing the distribution alignment between video prompts and their corresponding emotions expressed as text.
  • Figure 3: Video-prompt fMRI reconstruction from brain recordings stimulated by video prompts; each row shows a distinct video stimulation scenario.
  • Figure 4: Text-prompt fMRI reconstruction from brain recordings triggered by text prompts; each row shows a distinct emotion stimulation scenario across six sequential frames.
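The distinction drawn in Figure 1 between the orange network (one-to-one matching) and the green network (distribution alignment) can be illustrated with two toy objectives. This is a sketch under stated assumptions: a mean squared distance between paired embeddings stands in for one-to-one matching, and a biased estimate of the squared maximum mean discrepancy (MMD) with a Gaussian kernel stands in for distribution alignment. This section does not state which losses FluxBrain actually uses, and all function names below are hypothetical.

```python
from math import exp


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def paired_matching_loss(video_z, brain_z):
    """Hypothetical one-to-one objective: mean squared distance between the
    embedding of clip i and the embedding of the brain recording it elicited."""
    if len(video_z) != len(brain_z) or not video_z:
        raise ValueError("need two non-empty, equally long lists of paired embeddings")
    return sum(sq_dist(v, b) for v, b in zip(video_z, brain_z)) / len(video_z)


def gaussian_kernel(a, b, sigma=1.0):
    return exp(-sq_dist(a, b) / (2 * sigma ** 2))


def mmd2(xs, ys, sigma=1.0):
    """Hypothetical distribution-level objective: biased squared MMD between two
    samples. It needs no pairing and is zero when the samples coincide."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    kxx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


video_z = [[0.0, 0.0], [1.0, 0.0]]
brain_z = [[0.0, 1.0], [1.0, 1.0]]
print(paired_matching_loss(video_z, brain_z))
print(mmd2(video_z, [[0.1, 0.0], [1.1, 0.0]]), mmd2(video_z, [[3.0, 0.0], [4.0, 0.0]]))
```

The paired loss requires row i of both inputs to describe the same clip, whereas MMD only compares the two point clouds, which is why a distribution-level objective fits text prompts that have no one-to-one video counterpart.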