Mapping fMRI Signal and Image Stimuli in an Artificial Neural Network Latent Space: Bringing Artificial and Natural Minds Together
Cesare Maria Dalbagno, Manuel de Castro Ribeiro Jardim, Mihnea Angheluţă
TL;DR
This work probes whether fMRI-derived latent representations and latent representations of the corresponding visual stimuli share information. It builds a CNN-based autoencoder for fMRI data and uses a Vision Transformer (ViT) to generate image embeddings. It then compares the two using Representational Similarity Analysis (RSA) after dimensionality reduction. The study finds no meaningful cross-domain alignment of the latent spaces. RSA yields a correlation that is statistically significant but negligible in magnitude ($r = 1.672\times 10^{-2}$, $p < 0.001$). The study also highlights challenges such as domain biases and dataset mismatches. These findings underline the difficulty of aligning artificial and neural representations. They point to future work on fine-tuning, alternative embedding strategies, and geometric/semantic alignment approaches to improve interpretability and cross-domain retrieval.
Abstract
The goal of this study is to investigate whether latent-space representations of visual stimuli and of fMRI data share common information. Decoding and reconstructing stimuli from fMRI data remains a challenge in both AI and neuroscience. Progress here has significant implications for understanding neural representations and for improving the interpretability of Artificial Neural Networks (ANNs). In this preliminary study, we assess the feasibility of such reconstruction by examining the similarity between the latent spaces of an autoencoder (AE) and a vision transformer (ViT), trained on fMRI and image data respectively. Using representational similarity analysis (RSA), we find that the latent spaces of the two domains appear to differ. However, these initial findings are inconclusive, and further research is needed to explore this relationship more thoroughly.
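The RSA comparison summarized above can be illustrated with a minimal sketch. It assumes one row per stimulus in each domain, with the same stimulus order in both. The text does not specify these details here, so the following choices are illustrative assumptions:

* PCA (via SVD) as the dimensionality-reduction step.
* Correlation distance (1 minus Pearson r) as the dissimilarity measure.
* Spearman correlation between the upper triangles of the two representational dissimilarity matrices (RDMs).
* A one-sided stimulus-label permutation test for the p-value.

All function names are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def pca_reduce(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project rows (stimuli) of x onto the top principal components (assumed reduction step)."""
    centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def rdm(x: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix: 1 - Pearson correlation between stimulus rows."""
    return 1.0 - np.corrcoef(x)


def upper_triangle(m: np.ndarray) -> np.ndarray:
    """Off-diagonal upper-triangle entries, so each stimulus pair is counted once."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]


def spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])


def rsa(fmri_latents, image_embeddings, n_components=10, n_perm=1000, seed=0):
    """Compare two representation spaces; returns (r, one-sided permutation p-value)."""
    a = upper_triangle(rdm(pca_reduce(fmri_latents, n_components)))
    b_rdm = rdm(pca_reduce(image_embeddings, n_components))
    r = spearman(a, upper_triangle(b_rdm))
    # Null distribution: shuffle stimulus labels of one RDM (rows and columns together).
    rng = np.random.default_rng(seed)
    n = b_rdm.shape[0]
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        r_null = spearman(a, upper_triangle(b_rdm[np.ix_(perm, perm)]))
        if r_null >= r:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return r, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fmri = rng.normal(size=(50, 256))    # hypothetical AE latents, one row per stimulus
    images = rng.normal(size=(50, 768))  # hypothetical ViT embeddings for the same stimuli
    r, p = rsa(fmri, images)
    print(f"RSA r = {r:.4f}, permutation p = {p:.4f}")
```

Comparing RDMs rather than raw vectors lets the two spaces have different dimensionalities. Only the pairwise geometry over stimuli is compared. With many stimuli, the null distribution of r becomes narrow. This is why a correlation as small as the one reported can still reach $p < 0.001$ while indicating little shared structure.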
