
Probing Large Language Models from A Human Behavioral Perspective

Xintong Wang, Xiaoyu Li, Xingshan Li, Chris Biemann

TL;DR

The paper interprets large language models by correlating their internal signals with human reading behavior, using eye-tracking data from ZuCo 2.0. Analyzing GPT-2 base, it finds that from the middle layers upward, FFN signals increasingly encode word semantics useful for token prediction, while MHSA signals align increasingly with human attention in deeper layers, without a decline. LLMs resemble human predictive patterns more closely than shallow language models (SLMs) do, and normal reading (NR) yields stronger correlations than task-specific reading (TSR), which suggests interpretability advantages for trustworthy model development. This human-behavioral probing approach offers a practical framework for diagnosing and refining LLMs using interpretable, cognitively grounded metrics.

Abstract

Large Language Models (LLMs) have emerged as dominant foundational models in modern NLP. However, their prediction processes and internal mechanisms, such as feed-forward networks (FFN) and multi-head self-attention (MHSA), remain largely unexplored. In this work, we probe LLMs from a human behavioral perspective, correlating values from LLMs with eye-tracking measures, which are widely recognized as meaningful indicators of human reading patterns. Our findings reveal that LLMs exhibit a prediction pattern similar to that of humans but distinct from that of Shallow Language Models (SLMs). Moreover, as depth increases from the middle layers upward, the correlation coefficients also increase in FFN and MHSA, indicating that the logits within FFN increasingly encapsulate word semantics suitable for predicting tokens from the vocabulary.
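To make the probing idea concrete, the following is a minimal sketch of correlating per-word model values with an eye-tracking measure, layer by layer. It assumes a Spearman rank correlation, which is a common choice for this kind of comparison but is not confirmed here as the paper's exact statistic. The layer names, the model values, and the fixation durations are hypothetical toy numbers, not data from ZuCo 2.0 or GPT-2.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-word values taken from two layers of a model (assumption).
layer_values = {
    "layer_2": [0.9, 0.1, 0.5, 0.3, 0.7],
    "layer_10": [0.2, 0.4, 0.3, 0.8, 0.9],
}
# Hypothetical per-word fixation durations in milliseconds (assumption).
fixation_ms = [110.0, 180.0, 150.0, 240.0, 300.0]

per_layer_rho = {
    name: spearman(values, fixation_ms) for name, values in layer_values.items()
}
for name, rho in per_layer_rho.items():
    print(f"{name}: rho = {rho:.2f}")
```

In this toy setup the deeper layer ranks the words in the same order as the fixation durations, so its coefficient is higher than the shallow layer's. A real analysis would also need to align subword tokens with words and test each coefficient for significance.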

Paper Structure

This paper contains 13 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Comparison of a human reading pattern and a transformer block. The left part shows the fixation patterns of a human reader over a given sentence, while the right part shows a transformer block, including FFN layers and multi-head self-attention. The blue dots mark fixations on the corresponding words above; a wider diameter represents a longer fixation duration.
  • Figure 2: FFN correlation values. FFN values across layers of GPT-2 base, correlated with five different eye-tracking features, shown in three groups: bottom, middle, and upper (significant at $p < 0.05$).
  • Figure 3: Correlation values between attention heads and eye-tracking measurements across layers. Lighter colors and larger values signify stronger correlations.