Signatures of human-like processing in Transformer forward passes

Jennifer Hu, Michael A. Lepori, Michael Franke

TL;DR

This work investigates whether Transformer forward passes exhibit human-like real-time processing patterns by analyzing layer-time dynamics across 20 open-source models and 6 domains. Using a logit-lens framework and novel processing metrics, the authors identify competitor-interference signatures and demonstrate that dynamic forward-pass measures better predict human processing than static final-layer statistics, with mid-sized models often most human-like. The study shows these dynamics generalize across text and vision tasks, suggesting forward passes can serve as explicit models of human cognition and informing AI design for alignment with human processing. Overall, the results establish a proof-of-concept that mechanistic analyses of AI models can illuminate aspects of human cognition and processing difficulty across diverse tasks.

Abstract

Modern AI models are increasingly being used as theoretical tools to study human cognition. One dominant approach is to evaluate whether human-derived measures are predicted by a model's output: that is, the end-product of a forward pass. However, recent advances in mechanistic interpretability have begun to reveal the internal processes that give rise to model outputs, raising the question of whether models might use human-like processing strategies. Here, we investigate the relationship between real-time processing in humans and layer-time dynamics of computation in Transformers, testing 20 open-source models in 6 domains. We first explore whether forward passes show mechanistic signatures of competitor interference, taking high-level inspiration from cognitive theories. We find that models indeed appear to initially favor a competing incorrect answer in the cases where we would expect decision conflict in humans. We then systematically test whether forward-pass dynamics predict signatures of processing in humans, above and beyond properties of the model's output probability distribution. We find that dynamic measures improve prediction of human processing measures relative to static final-layer measures. Moreover, across our experiments, larger models do not always show more human-like processing patterns. Our work suggests a new way of using AI models to study human cognition: not just as a black box mapping stimuli to responses, but potentially also as explicit processing models.
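
As a rough illustration of what "layer-time dynamics" can mean in practice, the sketch below applies a logit-lens style readout to toy data: each layer's hidden state is projected through an unembedding matrix, and the log-probability of a correct answer is compared with that of a competing intuitive answer. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the toy hidden states, the definition of the difference as correct minus competitor, and the omission of the final layer normalization that logit-lens implementations typically apply before unembedding.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def logit_lens_delta(hidden_states, unembed, correct_id, competitor_id):
    """Per-layer log-prob difference between a correct and a competitor token.

    hidden_states: array of shape (n_layers, d_model), one vector per layer.
    unembed:       array of shape (d_model, vocab_size).
    Hypothetical helper; a real pipeline would usually apply the model's
    final layer norm to each hidden state before this projection.
    """
    logprobs = log_softmax(hidden_states @ unembed)  # (n_layers, vocab_size)
    return logprobs[:, correct_id] - logprobs[:, competitor_id]


def first_stable_layer(delta):
    """First layer from which the correct answer stays preferred to the end.

    Returns None if the final layer does not prefer the correct answer.
    """
    positive = delta > 0
    if not positive[-1]:
        return None
    not_positive = np.flatnonzero(~positive)
    return 0 if not_positive.size == 0 else int(not_positive[-1]) + 1


# Toy setup: 5 layers, 2 hidden dimensions, 3 vocabulary items.
# Token 0 plays the correct answer, token 1 the intuitive competitor.
unembed = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
hidden_states = np.stack([t, -t], axis=1)  # early layers favor the competitor

delta = logit_lens_delta(hidden_states, unembed, correct_id=0, competitor_id=1)
onset = first_stable_layer(delta)
print(delta)  # negative in early layers, positive in late layers
print(onset)
```

In this toy case the difference starts negative (the competitor is preferred) and turns positive in later layers, which is the qualitative two-stage pattern the abstract describes. With a real model, the hidden states would come from the model's per-layer outputs rather than from a hand-built array.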

Paper Structure

This paper contains 47 sections, 11 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Overview of our study. (a) Experiment 1: We explore whether forward passes show mechanistic signatures of competitor interference, first preferring a salient competing intuitive answer before preferring the correct answer. (b) Experiment 2: We systematically investigate the ability of dynamic measures derived from forward passes to predict indicators of processing load in humans.
  • Figure 2: Experiment 1 results. (a) LMs generally show stronger signs of two-stage processing for the items with competing intuitive answers. Asterisks denote significant $t$-tests comparing means across conditions within each domain. (b) $\Delta\textsc{LogProb}$ across layers for sample LMs in the capitals recall domain, illustrating different processing strategies. (c) Two-stage processing interacts with model size.
  • Figure 3: Experiment 2 results for text domains. (a) Top: $R^2$ achieved by model processing measures (x-axis) across groups of human DVs (hue). Bottom: Log Bayes Factor comparing critical to baseline regression models. Horizontal line = $\log(3)$. (c) Mean $R^2$ across bins of model sizes.
  • Figure 4: Illustration of human tasks analyzed in Experiment 2. (a) Recall (free response) of capital cities. (b) Recognition (forced-choice) of capital cities. (c) Categorization of typical and atypical animal exemplars via mouse movement (Kieslich et al., 2020). (d) Judgment of logical validity of syllogistic arguments (Lampinen et al., 2024). (e) Object recognition of out-of-distribution images (Geirhos et al., 2021).
  • Figure 5: Distribution of labels assigned by GPT-4o to free responses in capitals recall human experiment.
  • ...and 6 more figures