Do Machines Fail Like Humans? A Human-Centred Out-of-Distribution Spectrum for Mapping Error Alignment

Binxia Xu, Xiaoliang Luo, Luke Dickens, Robert M. Mok

TL;DR

This work proposes a human-centred framework that redefines the degree of out-of-distribution (OOD) shift as a spectrum of human perceptual difficulty. Applied to object recognition, the framework reveals unique, regime-dependent model-human alignment rankings and profiles across deep learning architectures.

Abstract

Determining whether AI systems process information similarly to humans is central to cognitive science and trustworthy AI. While modern AI models match human accuracy on standard tasks, such parity does not guarantee that their underlying decision-making strategies are aligned with human information processing. Assessing performance i) with error alignment metrics that compare how humans and models fail, and ii) on distorted or otherwise more challenging stimuli provides a viable pathway toward a finer characterization of model-human alignment. However, existing out-of-distribution (OOD) analyses for challenging stimuli are limited by their methodological choices: they define OOD shift relative to model training data or use arbitrary distortion-specific parameters with little correspondence to human perception, which hinders principled comparisons. We propose a human-centred framework that redefines the degree of OOD as a spectrum of human perceptual difficulty. By using human accuracy to quantify how much a collection of stimuli deviates from an undistorted reference set, we construct an OOD spectrum and identify four distinct regimes of perceptual challenge. This approach enables principled model-human comparisons at calibrated difficulty levels. We apply this framework to object recognition and reveal unique, regime-dependent model-human alignment rankings and profiles across deep learning architectures. Vision-language models are the most consistently human-aligned across near- and far-OOD conditions, whereas CNNs are more aligned than ViTs in near-OOD conditions and ViTs are more aligned than CNNs in far-OOD conditions. Our work demonstrates the critical importance of accounting for cross-condition differences such as perceptual difficulty for a principled assessment of model-human alignment.
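The idea of scoring a distortion condition by how far human accuracy falls from an undistorted reference can be sketched in a few lines. The abstract does not give the formula, so this is only an illustration under stated assumptions: the score is taken to be the difference in logit-transformed human accuracy between a condition and the reference set, so that more negative values mean a larger shift. The function names, the clipping constant `eps`, the condition names, and all accuracy values below are hypothetical.

```python
import math


def logit(p, eps=1e-3):
    """Log-odds of an accuracy, clipped away from 0 and 1 so it stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def ood_score(condition_acc, reference_acc, eps=1e-3):
    """Assumed score: logit accuracy on a condition minus logit accuracy
    on the undistorted reference set (0 means no deviation)."""
    return logit(condition_acc, eps) - logit(reference_acc, eps)


# Made-up mean human accuracies, for illustration only.
reference_acc = 0.95
human_acc = {
    "contrast_mild": 0.93,
    "low_pass_moderate": 0.55,
    "high_pass_severe": 0.12,
}

scores = {name: ood_score(acc, reference_acc) for name, acc in human_acc.items()}

# Order conditions from closest to the reference to furthest from it.
ranked = sorted(scores, key=scores.get, reverse=True)
for name in ranked:
    print(f"{name}: {scores[name]:.2f}")
```

Placing every condition on one such axis is what lets conditions from different distortion types be compared at matched difficulty, instead of by distortion-specific parameters.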

Paper Structure (25 sections, 19 equations, 12 figures, 5 tables)

Figures (12)

  • Figure 1: Example images from two distortion types (low-pass and high-pass filtering) from the modelvshuman dataset (Geirhos et al., 2021) at different parameter levels. Images within the same column share the same parameter rank within each distortion type.
  • Figure 2: A human-centred OOD spectrum. (A) Illustration of the process for measuring the degree of human performance deviation on a specific distortion condition relative to undistorted images. (B) OOD spectrum constructed from human accuracy (logit transformed). The x-axis represents the OOD score, indicating the degree of deviation from the performance baseline on undistorted images, with more negative values corresponding to larger distributional shifts. Example images from selected distortions are shown above their corresponding positions on the axis. (C) OOD scores for all levels of corruption for each distortion type in the modelvshuman dataset. (D) Four-component Gaussian Mixture Model (GMM) fitted to OOD scores across all distortion conditions, defining the four OOD groups. Colours reflect the different OOD regimes, with boundaries marked by dashed lines.
  • Figure 3: Human-human error alignment. (A) Dot plots for Error Consistency (EC; top) and Misclassification Agreement (MA; bottom) across distortion types and OOD levels. Each dot represents the mean error alignment value across participant pairs for a specific level of a distortion, coloured by its OOD category, with vertical bars indicating ±1 standard deviation. Transparent violin plots show the distributions of the alignment scores across distortion levels within each distortion type. (B) t-SNE plot illustrating the similarity of human-human error patterns across distortion domains, based on the human-human class-level error divergence (CLED) matrix. Each point represents a distortion condition, labelled by distortion type (e.g., CT: contrast, HP: high-pass). Colours indicate the OOD regime each condition belongs to. Spatial proximity reflects similarity in human error patterns across conditions.
  • Figure 4: Model-human alignment across distortion levels. Accuracy (ACC, top row), Error Consistency (EC, middle row), and Misclassification Agreement (MA, bottom row) for humans and models across three representative distortion types: High-pass (left), Low-pass (middle), and Contrast (right). Each curve shows the mean: for humans, the mean across participant pairs (orange); for models, the mean across model–human pairs within a subfamily. Curve colours denote superfamilies (CNNs: purple, ViTs: yellow, VLMs: pink), while marker shapes distinguish subfamilies within each superfamily. Shaded regions indicate ±1 standard deviation. Curves are plotted against corruption parameters that control the severity, with background shading marking OOD regimes.
  • Figure 5: Radar plots of mean model–human error alignment for each superfamily across distortion types, separated by OOD regime (near-OOD, EC (A), MA (B); far-OOD, EC (C), MA (D)). Each axis represents a distortion type, and the radial coordinate indicates model–human alignment scores. Each curve shows the mean EC or MA value for a model superfamily, with error bars indicating ±1 standard deviation. Human–human alignment scores are denoted in orange.
  • ...and 7 more figures
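
The Error Consistency (EC) metric named in the captions for Figures 3 to 5 is not defined in this excerpt. A minimal sketch, assuming it follows the usual formulation as Cohen's kappa over per-trial correct/incorrect outcomes of two observers on the same stimuli: observed agreement is compared with the agreement expected from the two accuracies alone. The function name and the toy response vectors are hypothetical, and Misclassification Agreement (MA) is not sketched because its definition does not appear here.

```python
import math


def error_consistency(correct_a, correct_b):
    """Cohen's kappa on trial-wise correctness of two observers.

    correct_a, correct_b: equal-length sequences of booleans, one entry per
    stimulus, True where that observer classified the stimulus correctly.
    """
    n = len(correct_a)
    if n == 0 or n != len(correct_b):
        raise ValueError("inputs must be non-empty and of equal length")

    acc_a = sum(correct_a) / n
    acc_b = sum(correct_b) / n

    # Observed agreement: both correct or both wrong on the same trial.
    c_obs = sum(a == b for a, b in zip(correct_a, correct_b)) / n

    # Agreement expected by chance for independent observers
    # with these accuracies.
    c_exp = acc_a * acc_b + (1 - acc_a) * (1 - acc_b)

    if math.isclose(c_exp, 1.0):
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (c_obs - c_exp) / (1 - c_exp)


# Toy data: three hypothetical models with the same accuracy as the human.
human = [True, True, False, False]
model_same = [True, True, False, False]
model_chance = [True, False, True, False]
model_opposite = [False, False, True, True]

print(error_consistency(human, model_same))      # 1.0
print(error_consistency(human, model_chance))    # 0.0
print(error_consistency(human, model_opposite))  # -1.0
```

All three toy models match the human's 50% accuracy yet differ completely in which trials they fail on, which is the distinction between accuracy parity and error alignment that the abstract draws.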