Teaching CORnet Human fMRI Representations for Enhanced Model-Brain Alignment
Zitong Lu, Yile Wang
TL;DR
This work addresses the gap between DCNNs and human visual perception by teaching an image-trained CORnet-S to capture human fMRI representations through a multi-layer encoding alignment, producing ReAlnet-fMRI. Using a three-subject Shen fMRI dataset for training and diverse test sets (including Horikawa fMRI and THINGS EEG2), the authors demonstrate enhanced model-brain alignment within and across subjects and modalities, including EEG. Internal analyses reveal that fMRI-guided optimization shifts encoding toward food-related, artificial/hard, and electronics features, suggesting that neural data can enrich visual representations beyond image-only training. The approach offers a scalable path toward more brain-like AI by integrating neural data, with implications for robustness, generalization, and cross-domain applications.
Abstract
Deep convolutional neural networks (DCNNs) have demonstrated excellent performance in object recognition and have been found to share some similarities with brain visual processing. However, a substantial gap between DCNNs and human visual perception still exists. Functional magnetic resonance imaging (fMRI), a widely used technique in cognitive neuroscience, can record neural activation in the human visual cortex during visual perception. Can we teach DCNNs human fMRI signals to achieve a more brain-like model? To answer this question, this study proposed ReAlnet-fMRI, a model based on the state-of-the-art vision model CORnet but optimized using human fMRI data through a multi-layer encoding-based alignment framework. This framework has been shown to effectively enable the model to learn human brain representations. The fMRI-optimized ReAlnet-fMRI exhibited higher similarity to the human brain than both CORnet and the control model in within- and across-subject as well as within- and across-modality model-brain (fMRI and EEG) alignment evaluations. Additionally, we conducted in-depth analyses to investigate how the internal representations of ReAlnet-fMRI differ from those of CORnet in encoding various object dimensions. These findings point to the possibility of enhancing the brain-likeness of visual models by integrating human neural data, helping to bridge the gap between computer vision and visual neuroscience.
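To make the idea of a multi-layer encoding-based alignment concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each model layer's features are mapped to fMRI voxel responses by a separate linear encoder and that the alignment objective is the mean squared prediction error averaged over layers. The function name `encoding_alignment_loss`, the feature sizes, and the use of random data are all hypothetical choices made for illustration. The loss actually used to train ReAlnet-fMRI may differ.

```python
import numpy as np

def encoding_alignment_loss(layer_features, encoders, fmri):
    """Average, over layers, of the MSE between fMRI responses and
    each layer's linear prediction of them (an assumed objective)."""
    losses = []
    for feats, (weights, bias) in zip(layer_features, encoders):
        predicted = feats @ weights + bias          # (n_images, n_voxels)
        losses.append(np.mean((predicted - fmri) ** 2))
    return float(np.mean(losses))

# Hypothetical toy data: 8 images, 5 voxels, four layers of made-up sizes.
rng = np.random.default_rng(0)
n_images, n_voxels = 8, 5
layer_dims = [16, 12, 10, 6]
layer_features = [rng.normal(size=(n_images, d)) for d in layer_dims]
encoders = [(0.1 * rng.normal(size=(d, n_voxels)), np.zeros(n_voxels))
            for d in layer_dims]
fmri = rng.normal(size=(n_images, n_voxels))

loss = encoding_alignment_loss(layer_features, encoders, fmri)
print(f"alignment loss: {loss:.4f}")
```

In a training setting, a term like this would typically be minimized jointly with, or in addition to, the model's original image-based objective, so that the layer features become more predictive of the recorded brain responses. How the two objectives are weighted is a design choice not specified here.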
