
CR-Eyes: A Computational Rational Model of Visual Sampling Behavior in Atari Games

Martin Lorenz, Niko Konzack, Alexander Lingler, Philipp Wintersberger, Patrick Ebel

Abstract

Designing mobile and interactive technologies requires understanding how users sample dynamic environments to acquire information and make decisions under time pressure. However, existing computational user models either rely on hand-crafted task representations or are limited to static or non-interactive visual inputs, restricting their applicability to realistic, pixel-based environments. We present CR-Eyes, a computationally rational model that simulates visual sampling and gameplay behavior in Atari games. Trained via reinforcement learning, CR-Eyes operates under perceptual and cognitive constraints and jointly learns where to look and how to act in a time-sensitive setting. By explicitly closing the perception-action loop, the model treats eye movements as goal-directed actions rather than as isolated saliency predictions. Our evaluation shows strong alignment with human data in task performance and aggregate saliency patterns, while also revealing systematic differences in scanpaths. CR-Eyes is a step toward scalable, theory-grounded user models that support design and evaluation of interactive systems.

Paper Structure

This paper contains 11 sections, 4 figures, and 1 table.

Figures (4)

  • Figure 1: An overview of the CR-Eyes architecture. Through the internal environment, the agent performs a motor action (game action) and a sensory action (position to look at) at every timestep $t$ in the external environment. The resulting observed patch is returned to the internal environment and integrated into the memory, which consists of the last $n$ stacked observations. The memory is given to the RL agent as its observation, on the basis of which the agent outputs the next motor and sensory action.
  • Figure 2: Distributions of frame display durations (in ms) for humans (orange) [zhangAtariHEADAtariHuman2019] and our agent (green). The agent was trained for three million training steps. The long tail of the distribution is summed in the last bin, and the y-axis is logarithmically scaled.
  • Figure 3: Agent and human saliency throughout an entire Seaquest episode. Human data is taken from Atari-HEAD. Subfigure c) shows saliency for the human episode truncated to the length of the agent's episode.
  • Figure 4: Resulting scanpath for Asterix. The green squares mark the fovea position at step $t_i$. The red squares represent the agent's sensory action, which will be executed in the next step. At $t_3$ the game is paused until $t_5$.
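The perception-action loop described in the Figure 1 caption can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: all names (`extract_patch`, `perception_action_loop`), the patch size, and the stack depth are assumptions chosen for clarity; the environment and policy are stand-ins for the actual Atari environment and RL agent.

```python
import numpy as np
from collections import deque

def extract_patch(frame, fovea_xy, size):
    """Crop a square patch centered on the fovea position (zero-padded at borders)."""
    h, w = frame.shape
    padded = np.zeros((h + size, w + size), dtype=frame.dtype)
    padded[size // 2 : size // 2 + h, size // 2 : size // 2 + w] = frame
    x, y = fovea_xy
    return padded[y : y + size, x : x + size]

def perception_action_loop(env_step, policy, n_stack=4, patch=20, steps=100):
    """One episode of the Fig. 1 loop: look -> observe patch -> remember -> act.

    env_step(motor_action) -> (frame, reward) stands in for the external
    environment; policy(obs) -> (motor_action, fovea_xy) stands in for the
    RL agent acting on the stacked-patch memory.
    """
    memory = deque([np.zeros((patch, patch))] * n_stack, maxlen=n_stack)
    fovea = (0, 0)
    total_reward = 0.0
    for _ in range(steps):
        obs = np.stack(memory)                 # memory of the last n patches
        motor_action, fovea = policy(obs)      # joint motor + sensory action
        frame, reward = env_step(motor_action) # step the external environment
        memory.append(extract_patch(frame, fovea, patch))  # integrate new patch
        total_reward += reward
    return total_reward
```

The key point the sketch makes concrete is that the gaze position (`fovea`) is itself an action output of the policy, so what the agent sees next depends on where it chose to look, closing the perception-action loop.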