A Resource-Rational Principle for Modeling Visual Attention Control

Yunpeng Bai

TL;DR

This dissertation develops a resource-rational, simulation-based framework for modeling visual attention as a sequential decision-making process under perceptual, memory, and time constraints, formalizing visual tasks, such as reading and multitasking, as bounded-optimal control problems using Partially Observable Markov Decision Processes.

Abstract

Understanding how people allocate visual attention is central to Human-Computer Interaction (HCI), yet existing computational models of attention are often descriptive, task-specific, or difficult to interpret. My dissertation develops a resource-rational, simulation-based framework for modeling visual attention as a sequential decision-making process under perceptual, memory, and time constraints. I formalize visual tasks, such as reading and multitasking, as bounded-optimal control problems using Partially Observable Markov Decision Processes, enabling eye-movement behaviors such as fixation and attention switching to emerge from rational adaptation rather than being hand-coded or purely data-driven. These models are instantiated in simulation environments spanning traditional text reading and reading-while-walking with smart glasses, where they reproduce classic empirical effects, explain observed trade-offs between comprehension and safety, and generate novel predictions under time pressure and interface variation. Collectively, this work contributes a unified computational account of visual attention, offering new tools for theory-driven and resource-efficient HCI design.
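
For readers less familiar with the formalism named in the abstract, the standard textbook definition of a Partially Observable Markov Decision Process can be sketched as follows. The notation is generic and is not taken from the dissertation. The last line, which splits the reward into a task utility U minus a weighted resource cost C with weight λ, is an illustrative assumption about how "bounded-optimal" rewards might be written, not the author's stated reward function.

```latex
% Generic POMDP notation (illustrative; not the dissertation's own symbols)
\begin{align}
  \mathcal{M} &= (S, A, T, R, \Omega, O, \gamma) \\
  % Belief update after taking action a and receiving observation o
  b'(s') &= \frac{O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s)}
                 {\sum_{s'' \in S} O(o \mid s'', a) \sum_{s \in S} T(s'' \mid s, a)\, b(s)} \\
  % Optimal policy over a horizon H (e.g. a reading-time limit)
  \pi^{*} &= \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{H} \gamma^{t} R(s_t, a_t) \right] \\
  % Assumed reward form: task utility minus weighted resource cost
  R(s, a) &= U(s, a) - \lambda\, C(a)
\end{align}
```

Here S is the set of hidden states (for example, the true identity of a word), A the set of actions (for example, where to fixate next), T the transition function, Ω the set of observations, O the observation function, and γ the discount factor. The agent never sees the state directly; it acts on the belief b, which is why noisy perception and limited time shape the resulting behavior.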

Paper Structure (4 sections, 3 figures)

Figures (3)

  • Figure 1: The Heads-Up Multitasker: a hierarchical, resource-rational model of attention allocation during reading while walking. The model enables a simulated agent to coordinate reading on optical head-mounted displays (OHMDs) with safe locomotion by allocating visual attention under competing task demands. The top panel illustrates the hierarchical reinforcement learning architecture, which decomposes attention control into supervisory, task, and motor levels, allowing interpretable trade-offs between reading and safety to emerge from bounded-optimal control. The bottom panels show the simulated multitasking scenario in the external environment: (a) a third-person view of the agent walking in an environment containing hazards; (b) a first-person view in which the agent reads text presented on OHMDs; and (c) a first-person view in which the agent attends to environmental signage to support safe navigation.
  • Figure 2: Architecture of the resource-rational reading model. The model represents a reader as a sequential decision-making agent that allocates visual attention to maximize text comprehension under constraints of limited memory, visual acuity, and time. Because observations of the text are partial and noisy, the agent maintains probabilistic beliefs at three representational levels: lexical, sentence, and text, supported by corresponding memory systems (lexical store, short-term memory, and long-term gist memory). Decisions are organized hierarchically: a text-level controller guides sentence selection, sentence-level control determines word-level priorities, and word-level control governs eye-movement decisions over visible letters. Rewards combine comprehension utility with resource costs, allowing fixation patterns, skipping, and regressions to emerge as bounded-optimal strategies. This architecture links eye-movement control with memory-driven comprehension, providing an interpretable computational account of human reading behavior relevant to predictive and simulation-based HCI.
  • Figure 3: Reading behavior under different time constraints (30 s, 60 s, 90 s). The model and human readers adapt their visual attention strategies as available reading time changes, reflecting resource-rational control of comprehension. (a) Heatmaps. With increasing time (from left to right), both humans and the simulation allocate attention more broadly across the text and invest more fixation time, whereas under tight deadlines attention is concentrated on high-coverage regions. (b) Recall. Free-recall responses exhibit the same trade-off, with longer reading time yielding richer gist-level recall and more complete and detailed memory.
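
The belief-maintenance idea described in the Figure 2 caption can be illustrated with a toy Python sketch. It is not the dissertation's model: the three-word lexicon, the `accuracy` noise parameter, and the fixed confidence `threshold` are all hypothetical, and the hand-set stopping rule stands in for a policy that the actual framework would obtain through bounded-optimal control. The sketch only shows how a Bayesian belief over word identity sharpens with each noisy fixation, and how a looser stopping criterion (a crude proxy for time pressure) ends fixation earlier.

```python
import random


def update_belief(belief, observed, accuracy):
    """Bayes rule: P(word | obs) is proportional to P(obs | word) * P(word).

    Assumes at least two candidate words; `accuracy` is the (hypothetical)
    probability that a fixation reports the true word.
    """
    n = len(belief)
    miss = (1.0 - accuracy) / (n - 1)  # noise spread evenly over other words
    posterior = {
        word: prior * (accuracy if word == observed else miss)
        for word, prior in belief.items()
    }
    total = sum(posterior.values())
    return {word: p / total for word, p in posterior.items()}


def read_word(true_word, lexicon, accuracy=0.7, threshold=0.95,
              max_fixations=20, rng=None):
    """Fixate until the belief in some word reaches `threshold`.

    Returns the final belief and the number of fixations used.
    The threshold rule is a hand-set stand-in for a learned policy.
    """
    rng = rng or random.Random(0)
    belief = {word: 1.0 / len(lexicon) for word in lexicon}  # uniform prior
    fixations = 0
    while max(belief.values()) < threshold and fixations < max_fixations:
        # Simulate a noisy percept of the fixated word.
        if rng.random() < accuracy:
            observed = true_word
        else:
            observed = rng.choice([w for w in lexicon if w != true_word])
        belief = update_belief(belief, observed, accuracy)
        fixations += 1
    return belief, fixations


if __name__ == "__main__":
    lexicon = ["cat", "car", "can"]  # hypothetical visually similar words
    for threshold in (0.6, 0.95):
        belief, n = read_word("cat", lexicon, threshold=threshold,
                              rng=random.Random(1))
        best = max(belief, key=belief.get)
        print(f"threshold={threshold}: {n} fixations, best guess={best!r}")
```

Because both runs share the same random seed, they see the same sequence of percepts, so the lower threshold can never require more fixations than the higher one. This mirrors, in a very simplified form, the pattern in Figure 3 where tighter time limits go with less fixation time.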