Hierarchical Resource Rationality Explains Human Reading Behavior

Yunpeng Bai, Xiaofu Jin, Shengdong Zhao, Antti Oulasvirta

Abstract

Reading is a pervasive and cognitively demanding activity that underpins modern human culture. It is a prime instance of a class of tasks where eye movements are coordinated for the purpose of comprehension. Existing theories explain either eye movements or comprehension during reading, but the critical link between the two remains unclear. Here, we propose resource-rational optimization as a unifying principle governing adaptive reading behavior. Eye movements are selected to maximize expected comprehension while minimizing cognitive and temporal costs, organized hierarchically across nested time scales: fixation decisions support word recognition; sentence-level integration guides skipping and regression; and text-level comprehension goals shape memory construction and rereading. A computational implementation successfully replicates an unprecedented range of findings in human reading, from lexical effects to comprehension outcomes. Together, these results suggest that resource rationality provides a general mechanism for coordinating perception, memory, and action in knowledge-intensive human behaviors, offering a principled account of how complex cognitive skills adapt to limited resources.

Paper Structure

This paper contains 4 sections, 2 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: A resource-rational mechanism for reading. The reader is simulated as a resource-rational agent [lieder2020resource; gershman2015computational] that maximizes text comprehension (goal) by optimizing fixations on a text display (external environment), under constraints of limited resources such as memory, visual perception, and time. The agent is uncertain about the true states because its observations are partial. To handle this uncertainty, it maintains probabilistic beliefs over the word, sentence, and text states through three memory stores: Lexical, Short-term, and Long-term. Decisions are made hierarchically: high-level text comprehension guides sentence-level reading, which in turn guides word recognition and eye-movement control.
  • Figure 2: Hierarchical resource-rational control of reading formalized as POMDPs. Eye-movement control is organized into three nested levels: Text, Sentence, and Word, each operating on its own temporal scale. Higher-level controllers set coarse reading targets, and lower-level controllers execute them at finer scales. For instance, the text-level controller chooses the sentence, the sentence-level controller chooses the word within that sentence, and the word-level controller recognizes that word by deciding how to sample its letters. At each level, the process is formalized as a POMDP. The state $S_N$ describes the reading dynamics; observations $O_N$ are noisy visual input sampled from the environment (text stimulus) under limited visual attention, so $S_N$ is only partially observable. Because the true state $S_N$ is not directly observable, the agent maintains a belief state $B_N$ in memory, based on $O_N$, to handle uncertainty; the belief encodes, for example, coherence appraisals across sentences, word predictions, or word activation distributions. Actions $A_N$ are selected based on $B_N$ to advance reading, which in turn updates the state $S_N$ and produces new observations $O_N$ for the next step. $S_F$ represents the final step at a given level. Decision-making is resource-rational: at each level, the agent learns to select the action that maximizes expected reward, defined in terms of comprehension utility $U$ and eye-movement costs $C_N$.
  • Figure 3: Model reproduces key empirical effects in eye movements and comprehension. a. Word-level. Both humans and the simulation show lexical influences on gaze duration: shorter, more frequent, and more predictable words receive shorter gaze durations. b. Sentence-level. Skips and regressions. Short, frequent, and predictable words are skipped more often, whereas lexically or contextually difficult words elicit more regressions. c. Text-level. Comprehension-driven control. The simulation reproduces targeted regressions to poorly understood passages (low initial appraisal) and captures how prior knowledge and text coherence improve recall. d. Reading under time pressure. Adaptive strategy shifts. Increasing time pressure leads to faster reading with more skips and fewer regressions, accompanied by reduced multiple-choice accuracy and free recall. Human data are shown in blue and model predictions in green; dashed lines indicate linear fits with 95% confidence intervals, and bars show means with standard deviations. Together, these results demonstrate reading as a resource-rational process: readers adapt their eye movements to linguistic features and available time, producing corresponding adaptations in comprehension. These adaptations reflect a systematic trade-off between accuracy and effort to maximize overall utility across word-, sentence-, and text-level processing.
  • Figure 4: Reading behavior under different time constraints (30 s, 60 s, 90 s). a. Heatmaps. Both humans and the simulation broaden the coverage of fixations and allocate more fixation time as available reading time increases. b. Scanpaths. Eye-movement sequences adapt to time pressure: limited time causes faster reading with more skips (blue) and fewer regressions (green), whereas abundant time allows more careful inspection and targeted regressions. Red dots and lines represent normal (first-pass) fixation points and saccades, respectively. Animated visualization examples of these scanpaths are provided in Supplementary Videos. c. Recall. Free-recall responses reflect the same trade-off: shorter durations yield coarse recall, whereas longer durations support more complete and detailed memory. Together, these patterns illustrate reading as a resource-rational control process that flexibly adapts to time resources: when time is limited, readers optimize overall comprehension by prioritizing coverage over detail, skipping more and regressing less; when time is abundant, they shift toward higher-accuracy comprehension by investing additional fixations and regressions.
  • Figure 5: Effects of memory limits, hierarchical integrity, and long-term comprehension utility. a-c. Eye movements (reading speed, skip rate, regression rate) and d-e. comprehension (MCQ accuracy, free recall) under three time limits (30 s, 60 s, 90 s). Human data are blue; the original model's simulations are green; the modified models' simulations are grey with green hatches. Bars show means; error bars denote standard deviation and are truncated to the feasible range $[0,1]$ for bounded measures. The unlimited-memory model achieves markedly super-human MCQ and free-recall scores while showing fewer regressions and faster reading. The myopic sentence reader ($\gamma=0.2, 0.6$) collapses into local behavior, repeatedly fixating early words and failing to advance through the text, producing near-zero comprehension. The myopic text reader ($\gamma=0.2$) preserves lower-level word and sentence processing but fails to advance from sentence to sentence, yielding markedly reduced comprehension; a milder myopia ($\gamma=0.6$) improves performance but still falls far below humans. In contrast, the full agent, combining bounded memory with non-myopic optimization of future comprehension, most closely matches human eye movements and comprehension. These results show that human-like reading requires realistic resource limits, sensitivity to long-term comprehension utility, and an intact hierarchical control structure.
  • ...and 5 more figures
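
The word-level loop described in the Figure 2 caption (sample a noisy observation, update a belief over candidate words, act under a cost) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the letter-confusion noise model, the `utility` and `cost` values, and the one-step stopping heuristic are all assumptions made for illustration, whereas the paper describes a learned policy.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def update_belief(belief, observed, position, candidates, noise=0.2):
    """Bayesian update of P(word) after one noisy letter observation.

    Assumed noise model (hypothetical): the true letter is seen with
    probability 1 - noise, otherwise a uniformly random wrong letter.
    """
    posterior = {}
    for word in candidates:
        if word[position] == observed:
            likelihood = 1.0 - noise
        else:
            likelihood = noise / (len(ALPHABET) - 1)
        posterior[word] = belief[word] * likelihood
    total = sum(posterior.values())
    return {word: p / total for word, p in posterior.items()}


def recognize_word(true_word, prior, utility=1.0, cost=0.05,
                   noise=0.2, max_samples=20, rng=None):
    """Sample letters until another sample is no longer worth its cost.

    Stopping heuristic (an assumption, not a learned policy): keep sampling
    while the utility still at risk, utility * (1 - max belief), exceeds
    the per-sample cost. All candidate words must share one length.
    """
    rng = rng or random.Random(0)
    candidates = list(prior)
    belief = dict(prior)
    samples = 0
    while samples < max_samples:
        if utility * (1.0 - max(belief.values())) <= cost:
            break  # confident enough: further sampling costs more than it gains
        position = rng.randrange(len(true_word))
        if rng.random() < noise:
            wrong = [c for c in ALPHABET if c != true_word[position]]
            observed = rng.choice(wrong)
        else:
            observed = true_word[position]
        belief = update_belief(belief, observed, position, candidates, noise)
        samples += 1
    guess = max(belief, key=belief.get)
    return guess, samples, belief


if __name__ == "__main__":
    # A highly predictable word versus an ambiguous one.
    print(recognize_word("cat", {"cat": 0.97, "car": 0.03})[:2])
    print(recognize_word("cat", {"cat": 0.50, "car": 0.50})[:2])
```

In this sketch, a word with a strong contextual prior needs no visual samples at all, while an ambiguous word draws at least one. That loosely mirrors the skipping and gaze-duration effects in the Figure 3 caption, although the sketch is far too simple to reproduce them quantitatively.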