Achieving More Human Brain-Like Vision via Human EEG Representational Alignment

Zitong Lu; Yile Wang; Julie D. Golomb

TL;DR

The study presents ReAlnet, a multi-layer EEG-aligned vision model that directly integrates human brain activity into the training objective, achieving closer alignment to human neural representations across EEG, fMRI, and behavior than conventional models. By attaching a layer-wise EEG-encoding and EEG-generation module to CORnet-S and training subject-specific instances, the approach yields subject-tuned, brain-like internal representations that generalize across unseen image categories and modalities. Cross-subject analyses show both EEG- and fMRI-related brain-likeness improvements, while behavior alignment is strongest when models are tuned to individual EEG data. The across-subject variant demonstrates partial cross-modal gains, highlighting potential for broader brain-like AI but also the need for targeted tuning to capture human behavioral patterns. Overall, the framework advances brain-inspired AI by using non-invasive human neural signals to shape internal representations, with practical implications for robust, human-like vision systems.
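To make "integrating brain activity into the training objective" concrete, the following is a minimal sketch of a joint objective that sums a classification loss and an EEG generation loss. It is not the authors' implementation: the use of mean squared error for the generation term, the weight `beta`, and all function names are assumptions made for illustration.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single sample (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mse(generated, recorded):
    """Mean squared error between generated and recorded EEG values."""
    return sum((g - r) ** 2 for g, r in zip(generated, recorded)) / len(recorded)


def joint_loss(logits, label, generated_eeg, recorded_eeg, beta=1.0):
    """Classification loss plus a weighted EEG generation loss.

    `beta` is a hypothetical trade-off weight, and MSE is an assumed
    choice for the generation term; the source only states that both
    a classification loss and a generation loss are minimized.
    """
    return cross_entropy(logits, label) + beta * mse(generated_eeg, recorded_eeg)
```

With a perfect EEG reconstruction the generation term vanishes and the objective reduces to ordinary classification training, which is one way to read the claim that the model keeps its recognition ability while learning brain features.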

Abstract

Despite advancements in artificial intelligence, object recognition models still lag behind in emulating visual information processing in human brains. Recent studies have highlighted the potential of using neural data to mimic brain processing; however, these often rely on invasive neural recordings from non-human subjects, leaving a critical gap in understanding human visual perception. To address this gap, we present 'Re(presentational)Al(ignment)net' (ReAlnet), a vision model aligned with human brain activity based on non-invasive EEG, which shows significantly higher similarity to human brain representations. Our image-to-brain multi-layer encoding framework advances human neural alignment by optimizing multiple model layers, enabling the model to efficiently learn and mimic the human brain's visual representational patterns across object categories and modalities. Our findings suggest that ReAlnets align more closely with human brain representations than traditional computer vision models do, an important step toward bridging the gap between artificial and human vision and achieving more brain-like artificial intelligence systems.
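"Similarity to human brain representations" is typically quantified with representational similarity analysis (RSA). The sketch below shows one common recipe, assumed here rather than taken from the paper: build a representational dissimilarity matrix (RDM) as 1 minus the Pearson correlation between condition patterns, then compare the model RDM and the EEG RDM with a Spearman correlation over their upper triangles. All function names are illustrative.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def rdm_upper(patterns):
    """Upper triangle of an RDM: 1 - Pearson r for every pair of conditions."""
    pairs = combinations(range(len(patterns)), 2)
    return [1 - pearson(patterns[i], patterns[j]) for i, j in pairs]


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def representational_similarity(model_patterns, eeg_patterns):
    """Compare a model layer and EEG data via their RDMs.

    Each argument is a list of per-condition feature vectors
    (one vector per image), with conditions in the same order.
    """
    return spearman(rdm_upper(model_patterns), rdm_upper(eeg_patterns))
```

Because only the RDMs are compared, the model features and the EEG patterns may have different dimensionalities; they only need to cover the same images in the same order.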

Paper Structure (18 sections, 5 equations, 22 figures, 4 tables)

Introduction
Results
Aligning CORnet with human EEG representations
Improved similarity to human EEG
Improved similarity in ReAlnets to human fMRI
Improved similarity in ReAlnets to behavior
Refined object feature representations in ReAlnets
Control experiments
Human EEG-aligned ResNet also being more brain-like
ReAlnets trained across subjects exhibit higher similarity to EEG and fMRI but not behavior
Discussion
Methods
Human EEG data for representational alignment
Human fMRI data for cross-modality testing
Image-to-brain encoding-based alignment pipeline
...and 3 more sections

Figures (22)

Figure 1: ReAlnets aligned with human EEG signals as more human brain-like vision models. (A) An overview of the ReAlnet alignment framework. An additional multi-layer encoding module is added to an ImageNet pre-trained CORnet-S, so that the outputs contain both the category classification results and the generated EEG signals. Using the THINGS EEG2 training dataset, we aim to minimize both classification loss and generation loss, enabling CORnet not only to stabilize its classification performance but also to effectively learn human brain features and transform into ReAlnets. (B) Representational similarity between internal representations in models and human temporal EEG signals from the THINGS EEG2 test dataset. Models include ReAlnets and their primary comparison model CORnet-S, along with ResNet101 and CLIP (with a ResNet101 backbone) as additional baselines. Because the additional baselines have different numbers of layers, for all models we took the first layer as the Early Layer, and the layer before the classification layer (or the last visual layer in CLIP) as the Late Layer for this analysis. The line labeled "ReAlnet" reflects the mean similarity across 10 individual ReAlnets, each trained on a different subject's EEG data (N=10). For comparison models, each line reflects the mean similarity between the same 10 human EEG datasets and the single model instance. ReAlnets
Figure 2: ReAlnets show higher similarity to human EEG and hierarchical individual variability. (A) Representational similarity time courses between human EEG and models (ReAlnets, Scrambled models, and CORnet) for each layer. Dark blue square dots at the bottom indicate the timepoints where ReAlnets and CORnet were significantly different ($p<.05$). Grey square dots at the bottom indicate the timepoints where ReAlnets and Scrambled models were significantly different ($p<.05$). Lines and shading reflect mean±SEM. (B) Similarity improvement and similarity improvement ratio of ReAlnets compared to CORnet at the similarity peak timepoint. Each circle dot indicates an individual ReAlnet. Error bar reflects ±SEM. (C) Time courses of the maximum representational similarity between human EEG and different models (ReAlnets, Scrambled models, and CORnet), computed by taking the highest similarity across all model layers at each timepoint. Dark blue square dots at the bottom indicate timepoints where ReAlnets significantly outperformed CORnet ($p<.05$). Grey square dots indicate significant differences between ReAlnets and Scrambled models ($p<.05$). Lines and shading reflect mean±SEM. (D) Top: ReAlnet individual variability matrices of four visual layers. Bottom left: ReAlnet individual variability along layers. Bottom right: Human fMRI individual variability along the visual cortex. Each circle dot indicates a pair of two personalized ReAlnets or two human subjects. Error bar reflects ±SEM. (E) Cross-subject similarity matrix showing that each individualized ReAlnet (rows) generalizes to EEG representations from all 10 subjects (columns). Each cell reflects the average representational similarity between human EEG and ReAlnets and CORnet across four model layers and the 50-200 ms time window. (F) Cross-subject generalization beyond baseline CORnet. Each cell reflects the ReAlnet–CORnet difference in EEG similarity, with positive values indicating that even mismatched ReAlnets outperform CORnet on other subjects’ EEG data. (G) Left: Column-wise normalized similarity matrix, where each column is scaled such that the highest similarity value is 1. Right: A statistical comparison between matched and mismatched pairs. Black asterisks indicate significantly higher similarity of matched pairs than mismatched pairs ($p<.05$). Error bar reflects ±SEM.
Figure 3: ReAlnets show higher similarity to human fMRI representations. Representational similarity between models and human fMRI of five different brain regions when three subjects in Shen fMRI test dataset viewed (A) natural images, (B) artificial shape images, and (C) alphabetical letter images. Black asterisks indicate significantly higher similarity of ReAlnets than that of Scrambled model or CORnet ($p<.05$), and grey asterisks indicate significantly lower similarity of ReAlnets than that of Scrambled model or CORnet ($p<.05$). Each circle dot indicates an individual ReAlnet or Scrambled model. Error bar reflects ±SEM.
Figure 4: Enhanced behavioral similarity and feature representations in ReAlnets. (A) ReAlnets show higher similarity to human behavior based on the Brain-Score platform. Each orange circle dot indicates an individual ReAlnet. Each grey circle dot indicates an individual scrambled model. Asterisks indicate significantly higher similarity of ReAlnets than that of CORnet or scrambled models ($p<.05$). (B) Top-3 enhanced feature representations in ReAlnets compared to CORnet and scrambled models. Each orange circle dot indicates an individual ReAlnet. Each grey circle dot indicates an individual scrambled model. Error bar reflects ±SEM.
Figure 5: Results of control experiments. (A) Improvement in human EEG similarity of ReAlnets and control models compared to CORnet. (B) Improvement in human fMRI similarity of ReAlnets and control models compared to CORnet. Each circle dot indicates an individual model. Asterisks indicate the significance ($p<.05$). Error bar reflects ±SEM.
...and 17 more figures
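The column-wise normalization and matched versus mismatched comparison described for Figure 2G can be sketched as follows. This is an illustrative reading of the caption, not the authors' analysis code: it assumes a square matrix with models as rows and EEG subjects as columns, a positive maximum in every column, and function names invented for this example. The significance test mentioned in the caption is not reproduced.

```python
def normalize_columns(matrix):
    """Scale each column so that its highest value becomes 1.

    Assumes every column has a positive maximum.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    col_max = [max(matrix[r][c] for r in range(n_rows)) for c in range(n_cols)]
    return [[matrix[r][c] / col_max[c] for c in range(n_cols)]
            for r in range(n_rows)]


def matched_vs_mismatched(matrix):
    """Mean of diagonal (matched) and off-diagonal (mismatched) cells.

    Row i is the model trained on subject i and column j is the EEG
    data of subject j, so the diagonal holds the matched pairs.
    """
    n = len(matrix)
    matched = [matrix[i][i] for i in range(n)]
    mismatched = [matrix[r][c] for r in range(n) for c in range(n) if r != c]
    return sum(matched) / len(matched), sum(mismatched) / len(mismatched)
```

Normalizing per column removes differences in overall signal quality between subjects' EEG data, so the comparison asks only which model fits a given subject best.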
