Human alignment of neural network representations

Lukas Muttenthaler; Jonas Dippel; Lorenz Linhardt; Robert A. Vandermeulen; Simon Kornblith

TL;DR

The paper examines how well neural network representations align with human mental representations inferred from behavioral similarity judgments. It finds that model scale and architecture have essentially no effect on alignment, whereas the training dataset and objective function matter much more. A linear transformation learned from human triplet judgments on one dataset substantially improves alignment on the other two datasets. Image/text models and very large ViTs yield the strongest concept-level alignment, though some concepts remain poorly captured. The work uses zero-shot odd-one-out accuracy, linear probing, and RSA together with the VICE human concept space to expose these concept-specific gaps and to show that supervised signals can recover more human-like representations than scaling does. Overall, human-like conceptual representations likely require richer supervision and more diverse data rather than dataset expansion alone, with implications for transfer, retrieval, and other applications that depend on alignment.

Abstract

Today's computer vision models achieve human or near-human level performance across a wide variety of vision tasks. However, their architectures, data, and learning algorithms differ in numerous ways from those that give rise to human vision. In this paper, we investigate the factors that affect the alignment between the representations learned by neural networks and human mental representations inferred from behavioral responses. We find that model scale and architecture have essentially no effect on the alignment with human behavioral responses, whereas the training dataset and objective function both have a much larger impact. These findings are consistent across three datasets of human similarity judgments collected using two different tasks. Linear transformations of neural network representations learned from behavioral responses from one dataset substantially improve alignment with human similarity judgments on the other two datasets. In addition, we find that some human concepts such as food and animals are well-represented by neural networks whereas others such as royal or sports-related objects are not. Overall, although models trained on larger, more diverse datasets achieve better alignment with humans than models trained on ImageNet alone, our results indicate that scaling alone is unlikely to be sufficient to train neural networks with conceptual representations that match those used by humans.
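The odd-one-out evaluation mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of cosine similarity, and the encoding of human choices as a position (0, 1, or 2) within each triplet are all assumptions made for the example. The idea is that the most similar pair in a triplet determines the odd-one-out, which is the remaining item.

```python
import numpy as np


def odd_one_out_choices(embeddings, triplets):
    """Return, per triplet, the position (0, 1, or 2) of the model's odd-one-out.

    embeddings: (n_items, dim) array of model representations.
    triplets:   (n_triplets, 3) integer array of item indices.
    """
    # Cosine similarity is an assumption here; a dot product is another option.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = x @ x.T
    i, j, k = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    # Column p holds the similarity of the pair that excludes position p.
    pair_sims = np.stack([sim[j, k], sim[i, k], sim[i, j]], axis=1)
    # The most similar pair leaves the excluded item as the odd-one-out.
    return pair_sims.argmax(axis=1)


def odd_one_out_accuracy(embeddings, triplets, human_choices):
    """Fraction of triplets where the model agrees with the human choice."""
    model_choices = odd_one_out_choices(embeddings, triplets)
    return float(np.mean(model_choices == human_choices))


if __name__ == "__main__":
    # Toy data: items 0 and 1 are similar, items 2 and 3 are similar.
    emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    trips = np.array([[0, 1, 2], [2, 3, 0], [0, 2, 1]])
    human = np.array([2, 2, 0])  # hypothetical human responses
    print(odd_one_out_accuracy(emb, trips, human))
```

With three items per triplet, a model that picks at random would agree with a fixed choice one third of the time, which is the chance level against which such accuracies are read.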

Paper Structure (28 sections, 9 equations, 30 figures, 1 table, 1 algorithm)

Introduction
Related Work
Methods
Data
Metrics
Models
Experiments
Odd-one-out vs. ImageNet accuracy
Consistency of results across different datasets
How much alignment can a linear probe recover?
How well do pretrained neural nets represent human concepts?
Human alignment is concept-specific
Can human concepts be recovered via linear regression?
Discussion
Experimental details
...and 13 more sections

Figures (30)

Figure 1: An example triplet from the THINGS responses, where neural nets choose a different odd-one-out than a human. The images in this triplet are copyright-free images from THINGS+ (Stoinski et al., 2022).
Figure 2: Zero-shot odd-one-out accuracy on THINGS only weakly correlates with ImageNet accuracy and varies with training objective but not with model architecture. Top left: Zero-shot accuracy as a function of ImageNet accuracy for all models. Diagonal line indicates least-squares fit. Top center: Models with the same architecture (ResNet-50) trained with a different objective function or different data augmentation. Since MixUp alters both inputs and targets, it is listed under both objectives and augmentations. Top right: Models trained with the same objective (softmax cross-entropy) but with different architectures. Bottom left: Performance of different SSL models. Bottom center: Zero-shot accuracy is negatively correlated with ImageNet accuracy for image/text models. Bottom right: A subset of ImageNet models with their number of parameters, colored by model family. Note that, in this subplot, models that belong to different families come from different sources and were trained with different objectives, hyperparameters, etc.; thus, models are only directly comparable within a family. In all plots, horizontal lines reflect chance-level or ceiling accuracy. See also the table of models.
Figure 3: Spearman correlation between human and neural network representational similarity matrices is not correlated with ImageNet accuracy for ImageNet models and is negatively correlated for image/text models. Alignment varies with training objective but not with model architecture or number of parameters for both similarity judgment datasets (Cichy et al., 2019; King et al., 2019). See caption of Figure 2 for further description of panels. Diagonal lines indicate least-squares fits.
Figure 4: Left panel: Zero-shot and probing odd-one-out accuracies for the embedding layer of all neural nets. Right panels: Spearman rank correlation coefficients with and without applying the transformation matrix obtained from linear probing to a model's raw representation space. Dashed lines indicate $y=x$.
Figure 5: Zero-shot and linear probing odd-one-out accuracies differ across VICE concepts. Results are shown for the embedding layer of all models for three of the 45 VICE dimensions. See the appendix on concept alignment for additional dimensions. Color-coding is determined by training data/objective. Violet: Image/Text. Green: Self-supervised. Orange: Supervised (ImageNet-1K). Cyan: Supervised (ImageNet-21K). Black: Supervised (JFT-3B).
...and 25 more figures
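The Spearman correlation between human and model representational similarity matrices (RSMs) that Figures 3 and 4 report can be sketched as below. This is a minimal sketch under stated assumptions, not the paper's code: the helper names are hypothetical, cosine similarity is assumed for building the model RSM, and the rank correlation is implemented directly in NumPy to keep the example dependency-free.

```python
import numpy as np


def cosine_rsm(x):
    """Build an (n_items, n_items) similarity matrix from (n_items, dim) features.

    Cosine similarity is an assumption; other similarity functions are possible.
    """
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T


def average_ranks(a):
    """Zero-based ranks of a 1-D array, with tied values given their mean rank."""
    order = np.argsort(a, kind="stable")
    ranks = np.empty(len(a), dtype=float)
    ranks[order] = np.arange(len(a))
    _, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def spearman_rsa(rsm_a, rsm_b):
    """Spearman correlation between the upper triangles of two RSMs."""
    # Only pairs above the diagonal are used: RSMs are symmetric and the
    # diagonal (self-similarity) carries no information.
    upper = np.triu_indices_from(rsm_a, k=1)
    ranks_a = average_ranks(rsm_a[upper])
    ranks_b = average_ranks(rsm_b[upper])
    # Spearman correlation is the Pearson correlation of the ranks.
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model_features = rng.normal(size=(6, 4))  # toy stand-in for model outputs
    human_rsm = cosine_rsm(rng.normal(size=(6, 4)))  # toy stand-in for human data
    print(spearman_rsa(cosine_rsm(model_features), human_rsm))
```

To compare alignment with and without a learned linear transformation, as in the right panels of Figure 4, the same function could be called on `cosine_rsm(model_features @ W)`, where `W` is a hypothetical transformation matrix obtained from a linear probe.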
