
Estimating Central, Peripheral, and Temporal Visual Contributions to Human Decision Making in Atari Games

Henrik Krauss, Takehisa Yairi

Abstract

We study how different visual information sources contribute to human decision making in dynamic visual environments. Using Atari-HEAD, a large-scale Atari gameplay dataset with synchronized eye-tracking, we introduce a controlled ablation framework as a means to reverse-engineer the contribution of peripheral visual information, explicit gaze information in the form of gaze maps, and past-state information from human behavior. We train action-prediction networks under six settings that selectively include or exclude these information sources. Across 20 games, peripheral information shows by far the strongest contribution, with median prediction-accuracy drops in the range of 35.27-43.90% when removed. Gaze information yields smaller drops of 2.11-2.76%, while past-state information shows a broader range of 1.52-15.51%, with the upper end likely more informative due to reduced peripheral-information leakage. To complement aggregate accuracies, we cluster states by the true-action probabilities assigned by the different model configurations. This analysis identifies coarse behavioral regimes, including focus-dominated, periphery-dominated, and more contextual decision situations. These results suggest that human decision making in Atari depends strongly on information beyond the current focus of gaze, while the proposed framework provides a way to estimate such information-source contributions from behavior.
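The ablation idea in the abstract can be made concrete with a small sketch: each model configuration is a set of flags that selectively zeroes out an input source (periphery, gaze map, past states) before it reaches the action-prediction network. Note that this is a hypothetical illustration, not the paper's implementation; the configuration names, channel layout, and masking scheme are assumptions for the example.

```python
import numpy as np

# Hypothetical ablation settings: each flag controls whether an input
# source is fed to the action-prediction network. The paper's six
# configurations may differ; names and flags here are illustrative.
CONFIGS = {
    "A": dict(periphery=True,  gaze=True,  past=True),   # full input
    "B": dict(periphery=False, gaze=True,  past=True),   # periphery removed
    "C": dict(periphery=True,  gaze=False, past=True),   # gaze map removed
    "D": dict(periphery=True,  gaze=True,  past=False),  # past states removed
    "E": dict(periphery=False, gaze=False, past=True),
    "F": dict(periphery=True,  gaze=False, past=False),
}

def build_input(frame, gaze_map, past_frames, focus_mask, cfg):
    """Stack input channels, zeroing out the ablated sources."""
    focus = frame * focus_mask                              # central (foveated) region
    periphery = frame * (1 - focus_mask) if cfg["periphery"] else np.zeros_like(frame)
    gaze = gaze_map if cfg["gaze"] else np.zeros_like(gaze_map)
    past = past_frames if cfg["past"] else np.zeros_like(past_frames)
    return np.concatenate([focus[None], periphery[None], gaze[None], past], axis=0)

# Toy example: one 84x84 frame, one gaze map, three past frames.
frame = np.random.rand(84, 84)
gaze_map = np.random.rand(84, 84)
past = np.random.rand(3, 84, 84)
mask = np.zeros((84, 84))
mask[30:50, 30:50] = 1.0  # placeholder focus region around the gaze point
x = build_input(frame, gaze_map, past, mask, CONFIGS["B"])
print(x.shape)  # 6 channels: focus + periphery + gaze + 3 past frames
```

Training one network per configuration on human actions and comparing their prediction accuracies then yields the per-source contribution estimates described above.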

Paper Structure

This paper contains 11 sections, 3 equations, 8 figures, and 2 tables.

Figures (8)

  • Figure A1: The controlled ablation framework performed in this study: Human gameplay and eye-tracking data from the Atari-HEAD dataset [zhang2020atari] are used to identify focus and periphery regions, and to construct gaze maps and past states. These information sources are selectively included or excluded during the training of several action-prediction networks to estimate their respective contributions.
  • Figure B1: Architecture of the human action prediction network with three options of including (I) peripheral information, (II) gaze information, and (III) past-state information.
  • Figure C1: Validation action-prediction accuracies across games (left) and median relative performance drops with respect to model A, normalized by the A-common gap (right).
  • Figure C2: Cluster composition across games (upper), mean true-action-probability profiles (middle), and names (lower) for the five clusters.
  • Figure C3: Mean silhouette scores per game and cluster, with an additional overall column.
  • ...and 3 more figures