
Dimensions underlying the representational alignment of deep neural networks with humans

Florian P. Mahner, Lukas Muttenthaler, Umut Güçlü, Martin N. Hebart

TL;DR

This work addresses a limitation of global alignment metrics between human and AI representations: they offer little explanation of why similarities arise. It introduces a variational embedding framework that extracts latent, interpretable dimensions from triplet odd‑one‑out judgments, enabling direct cross‑domain comparison between humans and a DNN trained on natural images. Applied to humans and a VGG‑16–based model, the framework reveals a low‑dimensional embedding and shows that, although some human and DNN dimensions align strongly (up to $r \approx 0.80$ for select pairs), the overall representational strategies diverge: humans rely more on semantic cues, DNNs more on visual cues. The results demonstrate that direct, dimension‑level comparisons can uncover the factors driving alignment and misalignment, offering a framework to guide the development of more human‑aligned AI through multimodal training and richer datasets, and providing a tool for testing representational hypotheses across domains.
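To make the link between an embedding and odd-one-out judgments concrete, the sketch below assumes a SPoSE/VICE-style choice model: each object is a non-negative vector, pairwise similarity is a dot product, and the probability that an object is the odd one out is a softmax over the similarities of the three pairs, where the pair excluding that object is the one that must be most similar. The function names `odd_one_out_probs` and `mean_log_likelihood` are hypothetical; the variational parts of the framework (priors, sparsity, dimension pruning) are not shown.

```python
import numpy as np


def odd_one_out_probs(X: np.ndarray, i: int, j: int, k: int) -> np.ndarray:
    """Probabilities that object i, j, or k is chosen as the odd one out.

    X is an (n_objects, n_dims) embedding. Non-negativity is assumed (as in
    SPoSE/VICE-style models) but not enforced here.
    """
    s_ij = X[i] @ X[j]
    s_ik = X[i] @ X[k]
    s_jk = X[j] @ X[k]
    # Object i is the odd one out when (j, k) is the most similar pair, etc.
    sims = np.array([s_jk, s_ik, s_ij])
    sims = sims - sims.max()  # shift for numerical stability before exponentiating
    p = np.exp(sims)
    return p / p.sum()


def mean_log_likelihood(X: np.ndarray, triplets, odd_idx) -> float:
    """Average log-probability of observed choices.

    odd_idx[t] in {0, 1, 2} is the position within triplets[t] that was chosen.
    """
    logps = [
        np.log(odd_one_out_probs(X, *trip)[o])
        for trip, o in zip(triplets, odd_idx)
    ]
    return float(np.mean(logps))


# Objects 0 and 1 are identical, so object 2 is the most likely odd one out.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(odd_one_out_probs(X, 0, 1, 2))
```

Training an embedding then amounts to maximizing this likelihood over observed human choices, or over choices sampled from DNN features, so that both domains are described in the same kind of representational space.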

Abstract

Determining the similarities and differences between humans and artificial intelligence (AI) is an important goal in both computational cognitive neuroscience and machine learning, promising a deeper understanding of human cognition and safer, more reliable AI systems. Much previous work comparing representations in humans and AI has relied on global, scalar measures to quantify their alignment. However, without explicit hypotheses, these measures only inform us about the degree of alignment, not the factors that determine it. To address this challenge, we propose a generic framework to compare human and AI representations, based on identifying latent representational dimensions underlying the same behavior in both domains. Applying this framework to humans and a deep neural network (DNN) model of natural images revealed a low-dimensional DNN embedding of both visual and semantic dimensions. In contrast to humans, DNNs exhibited a clear dominance of visual over semantic properties, indicating divergent strategies for representing images. While in-silico experiments showed seemingly consistent interpretability of DNN dimensions, a direct comparison between human and DNN representations revealed substantial differences in how they process images. By making representations directly comparable, our results reveal important challenges for representational alignment and offer a means for improving their comparability.
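A direct, dimension-level comparison requires pairing human and DNN dimensions defined over the same objects. The source does not state how pairs were matched; the sketch below assumes a simple rule that pairs each human dimension with the DNN dimension whose weights correlate most strongly (Pearson $r$) across objects. `match_dimensions` is a hypothetical helper, and it assumes no dimension is constant across objects (a zero standard deviation would produce NaNs).

```python
import numpy as np


def match_dimensions(H: np.ndarray, D: np.ndarray):
    """Pair each human dimension with its best-correlated DNN dimension.

    H: (n_objects, n_human_dims) human embedding.
    D: (n_objects, n_dnn_dims) DNN embedding; rows must refer to the same
       objects in the same order as H.
    Returns (best_dnn_index_per_human_dim, correlation_of_that_pair).
    """
    # z-score each column (population std), so a dot product divided by
    # n_objects equals the Pearson correlation.
    Hz = (H - H.mean(axis=0)) / H.std(axis=0)
    Dz = (D - D.mean(axis=0)) / D.std(axis=0)
    R = Hz.T @ Dz / H.shape[0]  # (n_human_dims, n_dnn_dims) correlation matrix
    best = R.argmax(axis=1)
    return best, R[np.arange(R.shape[0]), best]


rng = np.random.default_rng(0)
H = rng.random((50, 3))
D = H[:, [2, 0, 1]] * 2.0 + 1.0  # permuted, rescaled copy of the human dims
best, r = match_dimensions(H, D)
print(best, r)
```

This greedy rule lets several human dimensions map to the same DNN dimension; a one-to-one alternative is to run `scipy.optimize.linear_sum_assignment` on `-R`.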

Paper Structure (23 sections, 5 equations, 12 figures)

Figures (12)

  • Figure 1: Overview: A computational framework that captures core DNN object representations in analogy to humans by simulating behavioral decisions in an odd-one-out task. a, The triplet odd-one-out task, in which a human participant or a DNN is presented with a set of three images and asked to select the image that is most different from the others. b, Sampling approach of odd-one-out decisions from DNN representations. First, a dot-product similarity space is constructed from DNN features. Next, for a given triplet of objects, the most similar pair in this similarity space is identified, making the remaining object the odd-one-out. For humans, odd-one-out decisions are taken from observed behavior, which serves as a measure of their internal cognitive representations. c, Illustration of the computational modeling approach used to learn a lower-dimensional object representation for human participants and the DNN, optimized to predict behavioral choices in the triplet task. d, Schematic depiction of the interpretability pipeline, which predicts object embeddings from pretrained DNN features.
  • Figure 1: Dimension ratings and representational similarity across models. a, VGG-16 does not perform poorly compared with other models, including ResNet50, DenseNet, CLIP, and BarlowTwins-ResNet50. b, The visual bias identified in VGG-16 is also evident across these other architectures, indicating consistent differences between human and DNN dimensions.
  • Figure 2: Representational embeddings inferred from human and DNN behavior. a, Visualization of example dimensions from human- and DNN-derived representational embeddings, with a selection of dimensions rated as semantic, mixed visual-semantic, or visual, alongside their dimension labels obtained from human judgments. Note that the displayed images include only images with a public domain license, not the full image set (stoinski2023thingsplus). b, Rating procedure for each dimension, based on visualizing the top $k$ images according to their numeric weights. Human participants labeled each of the human and DNN dimensions as predominantly semantic, visual, mixed visual-semantic, or unclear (unclear ratings not shown: 7.35% of all dimensions for humans, 8.57% for VGG-16). c, Relative importance of dimensions labeled as visual and semantic: VGG-16 exhibited a dominance of visual and mixed dimensions, whereas humans showed a clear dominance of semantic dimensions.
  • Figure 2: Dimension ratings and representational similarity across different VGG-16 layers. We compared dimensions learned from features extracted from early, middle, late, and penultimate layers of VGG-16. a, The embedding learned from the penultimate-layer representations has the largest representational alignment to human behavior. b, The visual bias was strongest in early layers of VGG-16, and semantic information was added in later layers. The fraction of semantic relative to visual dimensions was largest in the penultimate-layer embedding.
  • Figure 3: Relevance of image properties for embedding dimensions. a, General methodology of the approach. We used Grad-CAM (selvaraju2017grad) to visualize the importance of distinct image parts, based on the gradients of the penultimate DNN features that we initially used to sample triplet choices. The gradients were obtained in our fully differentiable interpretability model with respect to a dimension ${\bm{w}}$ of our embedding. b, Heatmaps for three different images and dimensions; each column shows the relevance of parts of an image for that dimension. For this figure, we filtered the embedding by images available in the public domain (stoinski2023thingsplus), except for two images sourced from Flickr under a CC BY 2.0 license: the flashlight by cborysiuk and the wineglass by wszkutnik.
  • ...and 7 more figures
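The DNN triplet-sampling step described in Figure 1b can be sketched as follows. It builds a dot-product similarity space from a feature matrix (for example, penultimate-layer activations, as mentioned in the Figure 3 caption) and, for each triplet, picks the most similar pair so that the remaining object becomes the odd one out. The function name `sample_odd_one_out` is hypothetical, and no feature preprocessing (centering, normalization) is applied here because the captions do not specify any.

```python
import numpy as np


def sample_odd_one_out(features: np.ndarray, triplets: np.ndarray) -> np.ndarray:
    """Return the odd-one-out object index for each triplet.

    features: (n_objects, n_features) DNN activations.
    triplets: (n_triplets, 3) integer array of distinct object indices.
    The choice is deterministic: ties go to the first pair in argmax order.
    """
    S = features @ features.T  # dot-product similarity space
    a, b, c = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    # Column m holds the similarity of the pair that excludes position m.
    pair_sims = np.stack([S[b, c], S[a, c], S[a, b]], axis=1)
    excluded = pair_sims.argmax(axis=1)  # position left out of the most similar pair
    return triplets[np.arange(len(triplets)), excluded]


features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
triplets = np.array([[0, 1, 2], [2, 3, 0]])
print(sample_odd_one_out(features, triplets))
```

In practice, triplets of distinct objects could be drawn with `np.random.default_rng().choice(n_objects, size=3, replace=False)`; the resulting choices then play the same role for the DNN that observed odd-one-out decisions play for humans.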