Teaching CORnet Human fMRI Representations for Enhanced Model-Brain Alignment

Zitong Lu; Yile Wang

TL;DR

This work addresses the gap between DCNNs and human visual perception by teaching an image-trained CORnet-S to capture human fMRI representations through a multi-layer encoding alignment, producing ReAlnet-fMRI. Using a three-subject Shen fMRI dataset for training and diverse test sets (including Horikawa fMRI and THINGS EEG2), the authors demonstrate enhanced model-brain alignment within and across subjects and modalities, including EEG. Internal analyses reveal that fMRI-guided optimization shifts encoding toward food-related, artificial/hard, and electronics features, suggesting that neural data can enrich visual representations beyond image-only training. The approach offers a scalable path toward more brain-like AI by integrating neural data, with implications for robustness, generalization, and cross-domain applications.

Abstract

Deep convolutional neural networks (DCNNs) have demonstrated excellent performance in object recognition and have been found to share some similarities with brain visual processing. However, a substantial gap between DCNNs and human visual perception still exists. Functional magnetic resonance imaging (fMRI), a widely used technique in cognitive neuroscience, can record neural activation in the human visual cortex during visual perception. Can we teach DCNNs human fMRI signals to achieve a more brain-like model? To answer this question, this study proposed ReAlnet-fMRI, a model based on the SOTA vision model CORnet but optimized using human fMRI data through a multi-layer encoding-based alignment framework. This framework has been shown to effectively enable the model to learn human brain representations. The fMRI-optimized ReAlnet-fMRI exhibited higher similarity to the human brain than both CORnet and the control model in within- and across-subject as well as within- and across-modality model-brain (fMRI and EEG) alignment evaluations. Additionally, we conducted an in-depth analysis to investigate how the internal representations of ReAlnet-fMRI differ from those of CORnet in encoding various object dimensions. These findings point to the possibility of enhancing the brain-likeness of visual models by integrating human neural data, helping to bridge the gap between computer vision and visual neuroscience.
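The model-brain similarity referred to above is typically quantified with representational similarity analysis. The following is a minimal sketch of that idea, not the authors' pipeline: it assumes a correlation-distance dissimilarity matrix and a Spearman comparison of upper triangles, and all names (`rdm`, `rdm_similarity`) and the synthetic data are hypothetical illustrations.

```python
import numpy as np

def rdm(features):
    """Dissimilarity matrix: 1 - Pearson r between stimulus rows (assumed metric)."""
    return 1.0 - np.corrcoef(features)

def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries as a flat vector."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]

def rank(values):
    """Ordinal ranks; ties are not averaged, which is adequate for continuous data."""
    ranks = np.empty(len(values))
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks

def rdm_similarity(model_features, brain_patterns):
    """Spearman-style correlation between the two dissimilarity matrices."""
    a = rank(upper_triangle(rdm(model_features)))
    b = rank(upper_triangle(rdm(brain_patterns)))
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-ins: 20 stimuli, 64 model units, 30 voxels (not real data).
rng = np.random.default_rng(0)
model_features = rng.normal(size=(20, 64))
brain_patterns = model_features @ rng.normal(size=(64, 30))
brain_patterns += 0.5 * rng.normal(size=(20, 30))

similarity = rdm_similarity(model_features, brain_patterns)
print(f"model-brain similarity: {similarity:.3f}")
```

Comparing dissimilarity matrices rather than raw activations lets a model layer and a brain region be compared even though they have different numbers of units and voxels.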

Paper Structure (12 sections, 3 equations, 18 figures)

Introduction
Methods
Human Neuroimaging Datasets
Model Architecture and Training
Model-Brain Similarity Measurement
Model Internal Representational Analysis
Results
Within-Modality & Within-Subject Model-fMRI Similarity
Within-Modality & Across-Subject Model-fMRI Similarity
Across-Modality & Across-Subject Model-EEG Similarity
Internal Representational Analysis
Discussion

Figures (18)

Figure 1: Human fMRI-optimized ReAlnet-fMRI as a more human brain-like vision model. (A) An overview of the ReAlnet-fMRI alignment framework. An additional multi-layer encoder is added to an ImageNet-pretrained CORnet-S, so the model outputs both category classification results and generated fMRI signals, with two losses: a classification loss and a generation loss. (B) Within-subject representational similarity between models (CORnet, Control, and ReAlnet-fMRIs) and human fMRI on natural images. (C) Similarity improvement ratio of within-subject model-fMRI similarity on natural images for ReAlnet-fMRIs compared to the other two models. Each circle indicates an individual ReAlnet-fMRI.
Figure 2: Within-subject model-fMRI similarity and similarity improvement ratio on (A) artificial shape images and (B) alphabetical letter images. Each circle indicates an individual ReAlnet-fMRI.
Figure 3: Across-subject model-fMRI similarity and similarity improvement ratio. Each circle indicates a subject from the Horikawa fMRI dataset.
Figure 4: Across-subject temporal model-EEG similarity. Blue and green squares with black outlines at the bottom indicate the timepoints where ReAlnet-fMRI vs. CORnet and ReAlnet-fMRI vs. Control were significantly different ($p<.05$). Shaded areas reflect ±SEM.
Figure 5: Internal representations in ReAlnet-fMRIs and CORnet. (A) Partial r-square of each object dimension in ReAlnet-fMRIs and CORnet. (B) The difference in partial r-square between ReAlnet-fMRIs and CORnet. Each circle indicates an individual ReAlnet-fMRI.
...and 13 more figures
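
The two-loss setup described in the Figure 1(A) caption can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the caption only says there is a classification loss and a generation loss, so the cross-entropy and mean-squared-error forms, the linear encoder over concatenated layer features, the weight `beta`, and all sizes are hypothetical.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy (assumed form of the classification loss)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def generation_loss(predicted_fmri, recorded_fmri):
    """Mean squared error (assumed form of the generation loss)."""
    return float(np.mean((predicted_fmri - recorded_fmri) ** 2))

def multi_layer_encoder(layer_features, weights):
    """Concatenate features from several layers, then map linearly to voxel space."""
    stacked = np.concatenate(layer_features, axis=1)
    return stacked @ weights

def total_loss(logits, labels, predicted_fmri, recorded_fmri, beta=1.0):
    """Classification loss plus beta-weighted generation loss (beta is hypothetical)."""
    return (cross_entropy(logits, labels)
            + beta * generation_loss(predicted_fmri, recorded_fmri))

# Synthetic stand-ins for one training batch (not real data).
rng = np.random.default_rng(0)
batch, n_classes, n_voxels = 8, 10, 50
layer_dims = (16, 32, 64, 128)  # hypothetical per-layer feature sizes
layer_features = [rng.normal(size=(batch, d)) for d in layer_dims]
encoder_weights = 0.01 * rng.normal(size=(sum(layer_dims), n_voxels))

predicted_fmri = multi_layer_encoder(layer_features, encoder_weights)
recorded_fmri = rng.normal(size=(batch, n_voxels))
logits = rng.normal(size=(batch, n_classes))
labels = rng.integers(0, n_classes, size=batch)

loss = total_loss(logits, labels, predicted_fmri, recorded_fmri, beta=1.0)
print(f"total loss: {loss:.3f}")
```

Keeping the classification term alongside the generation term is what lets a model of this kind retain its object recognition behavior while its features are pulled toward the recorded brain responses.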
