Evaluating alignment between humans and neural network representations in image-based learning tasks

Can Demircan; Tankred Saanum; Leonardo Pettini; Marcel Binz; Blazej M Baczkowski; Christian F Doeller; Mona M Garvert; Eric Schulz

TL;DR

The paper investigates how well pretrained neural network representations align with human learning in two naturalistic image-based tasks. By evaluating 86 models on category learning and reward learning with THINGS images, the authors show that CLIP-style multimodal representations track human choices most closely. Alignment improves with more training data and better class separation, while the effect of intrinsic dimensionality varies by model type. Using model-based analyses and controlled comparisons (e.g., CLIP vs. SimCLR), they demonstrate that multimodal training contributes to alignment beyond data size alone, although explicit human-alignment methods yield inconsistent improvements. The work advances cognitive modelling in naturalistic settings and provides a framework to quantify and improve alignment between human cognition and neural representations, with implications for developing more cognitively robust AI systems.

Abstract

Humans represent scenes and objects in rich feature spaces, carrying information that allows us to generalise about category memberships and abstract functions with few examples. What determines whether a neural network model generalises like a human? We tested how well the representations of $86$ pretrained neural network models mapped to human learning trajectories across two tasks where humans had to learn continuous relationships and categories of natural images. In these tasks, both human participants and neural networks successfully identified the relevant stimulus features within a few trials, demonstrating effective generalisation. We found that while training dataset size was a core determinant of alignment with human choices, contrastive training with multi-modal data (text and imagery) was a common feature of currently publicly available models that predicted human generalisation. Intrinsic dimensionality of representations had different effects on alignment for different model types. Lastly, we tested three sets of human-aligned representations and found no consistent improvements in predictive accuracy compared to the baselines. In conclusion, pretrained neural networks can serve to extract representations for cognitive models, as they appear to capture some fundamental aspects of cognition that are transferable across tasks. Both our paradigms and modelling approach offer a novel way to quantify alignment between neural networks and humans and extend cognitive science into more naturalistic domains.

Paper Structure (11 sections, 6 equations, 12 figures)

Introduction
Experiments
Behavioural analyses
Model-based analyses
Discussion
Related work
Limitations
Conclusion
Methods
Testing aligned models on other datasets
Additional results

Figures (12)

Figure 1: Task descriptions. (A) An example trial from the category learning task, where an incorrect decision is made. (B) An example trial from the reward learning task, where the best option is chosen and highlighted in orange. (C) Example images from the THINGS database (Hebart et al., 2019). The database has a low-dimensional, semantically interpretable embedding (Hebart et al., 2020), which is derived from human similarity judgements. The example images are placed along the three most prominent dimensions of this embedding. In both tasks, participants were randomly assigned to one of these three dimensions. The associated category membership and rewards for the two tasks are displayed.
Figure 2: Learning trajectories of human participants and neural networks. Neural networks can perform as well as humans. (A & B) Accuracy of human participants across trials for the category and the reward learning tasks respectively. Shaded lines indicate $95\%$ confidence intervals. (C & D) Example learning curves for the neural network representations in the category and the reward learning tasks respectively. The best-performing models from each model type are shown.
Figure 3: Model fits to human choice data. In both the category learning (A) and reward learning (B) tasks, several CLIP models predict human choices best, even better than the generative features of the tasks. Fits to human choices were more heterogeneous among supervised, self-supervised, and language models. Plotted are the cross-validated McFadden's $R^2$ values of each representation for the category learning and the reward learning tasks respectively. Higher values indicate better fits to human behaviour. $0$ marks the alignment of a random model.
Figure 4: Several factors contribute to alignment. Models trained on more data and with more trainable parameters predict human choices with higher accuracy. Turning to representations, those that better separate image classes and are more similar to the generative task features exhibit stronger alignment with human choices.
Figure 5: Lower intrinsic dimensionality is linked with higher alignment most strongly for the multimodal models, and to a lesser extent with supervised ones.
...and 7 more figures
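
The alignment metric named in the Figure 3 caption, McFadden's $R^2$ with a random model at $0$, can be illustrated with a minimal sketch. This is not the authors' code: the function name `mcfadden_r2`, the two-option assumption, and the toy probabilities are all hypothetical, and the cross-validation step is omitted. It compares the log-likelihood a model assigns to the observed choices with that of a model choosing uniformly at random.

```python
import math


def mcfadden_r2(choice_probs, n_options=2):
    """McFadden's pseudo-R^2 relative to a uniformly random chooser.

    choice_probs: for each trial, the probability the model assigned to
        the option the participant actually chose (hypothetical input).
    n_options: number of options per trial (assumed to be 2 here).
    """
    # Log-likelihood of the observed choices under the model.
    ll_model = sum(math.log(p) for p in choice_probs)
    # Log-likelihood under a random model that picks each option equally often.
    ll_random = len(choice_probs) * math.log(1.0 / n_options)
    return 1.0 - ll_model / ll_random


# Toy data (made up for illustration): ten trials each.
random_like = [0.5] * 10   # no better than chance, so R^2 is approximately 0
informative = [0.8] * 10   # assigns 0.8 to every observed choice

print(mcfadden_r2(random_like))
print(mcfadden_r2(informative))
```

Under these assumptions, a model that matches chance scores approximately 0 and better-fitting models score higher, consistent with how the caption describes the axis.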
