Chartist: Task-driven Eye Movement Control for Chart Reading

Danqing Shi, Yao Wang, Yunpeng Bai, Andreas Bulling, Antti Oulasvirta

TL;DR

Chartist predicts how people read charts under specific analytical tasks, using a two-level hierarchical model that combines a memory-rich, LLM-powered cognitive controller with reinforcement-learning-driven oculomotor controllers. By formulating the task as a bounded partially observable Markov decision process (POMDP) and training low-level gaze policies with proximal policy optimization (PPO), Chartist generates task-driven fixation sequences without using human eye-tracking data for training. The approach yields human-like scanpaths across retrieve-value (RV), filter (F), and find-extreme (FE) tasks and outperforms several baselines on task-specific metrics, while also matching key human statistical patterns. The work enables applications in visualization design evaluation and optimization, as well as explainable AI for chart question answering. It could be extended to more diverse chart types and more sophisticated QA tasks, although it remains limited in generalizability and spatial precision.

Abstract

To design data visualizations that are easy to comprehend, we need to understand how people with different interests read them. Computational models that predict scanpaths on charts could complement empirical studies by offering inexpensive estimates of user performance; however, previous models have been limited to gaze patterns and have overlooked the effects of tasks. Here, we contribute Chartist, a computational model that simulates how users move their eyes to extract information from a chart in order to perform analysis tasks, including value retrieval, filtering, and finding extremes. The novel contribution lies in a two-level hierarchical control architecture. At the high level, the model uses LLMs to comprehend the information gained so far and applies this representation to select a goal for the lower-level controllers, which, in turn, move the eyes in accordance with a sampling policy learned via reinforcement learning. The model is capable of predicting human-like task-driven scanpaths across various tasks. It can be applied in fields such as explainable AI, visualization design evaluation, and optimization. While it displays limitations in terms of generalizability and accuracy, it takes modeling in a promising direction, toward understanding human behaviors in interacting with charts.
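
The two-level control loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the LLM-based cognitive controller is replaced by a hand-written rule, the learned oculomotor policy by a fixed-amplitude saccade toward the target, and every name here (`Chart`, `Memory`, `cognitive_controller`, `oculomotor_controller`, `simulate`, the `step` and `fovea` parameters) is a hypothetical stand-in chosen for illustration. The example task is a simplified "find extreme" on a bar chart.

```python
from dataclasses import dataclass, field


@dataclass
class Chart:
    # Hypothetical labeled chart: label -> pixel position, label -> value.
    positions: dict
    values: dict


@dataclass
class Memory:
    # Information gathered so far: label -> value read at the fovea.
    seen: dict = field(default_factory=dict)


def cognitive_controller(memory, chart):
    """Stand-in for the LLM: pick the next subgoal or decide to answer."""
    for label in chart.positions:
        if label not in memory.seen:
            return ("look", label)  # subgoal: read an unread bar
    # All bars are in memory, so answer the find-extreme question.
    return ("answer", max(memory.seen, key=memory.seen.get))


def oculomotor_controller(gaze, target, step=40.0, fovea=10.0):
    """Stand-in for the RL policy: bounded saccades until the target is foveated."""
    fixations = []
    while True:
        dx, dy = target[0] - gaze[0], target[1] - gaze[1]
        dist = (dx * dx + dy * dy) ** 0.5
        if dist <= fovea:
            return gaze, fixations
        scale = min(1.0, step / dist)  # cap the saccade amplitude at `step`
        gaze = (gaze[0] + dx * scale, gaze[1] + dy * scale)
        fixations.append(gaze)


def simulate(chart, start=(0.0, 0.0), max_subtasks=20):
    """Alternate high-level goal selection and low-level gaze movement."""
    memory, gaze, scanpath = Memory(), start, [start]
    for _ in range(max_subtasks):
        action, arg = cognitive_controller(memory, chart)
        if action == "answer":
            return arg, scanpath
        gaze, fixations = oculomotor_controller(gaze, chart.positions[arg])
        scanpath.extend(fixations)
        memory.seen[arg] = chart.values[arg]  # store what was read
    return None, scanpath  # gave up before reaching an answer


if __name__ == "__main__":
    demo = Chart(
        positions={"A": (100.0, 0.0), "B": (200.0, 0.0), "C": (300.0, 0.0)},
        values={"A": 3, "B": 9, "C": 5},
    )
    answer, path = simulate(demo)
    print(answer, len(path))
```

The sketch only mirrors the division of labor: the high level reasons over memory and selects a goal, while the low level produces the pixel-level fixations. In the model described above, both stand-ins would be replaced by an LLM and a policy trained with reinforcement learning, respectively.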

Paper Structure

This paper contains 26 sections, 7 figures, 1 table.

Figures (7)

  • Figure 1: We present Chartist, a computational model that can predict task-driven human scanpaths on charts. The figure demonstrates the three analytical tasks involved in the study: retrieve value, filter, and find extreme. The visualization illustrates how the model's predictions vary across tasks and match the pattern of human scanpaths, with fixation density maps overlaid.
  • Figure 2: The figure illustrates the concept of the model for task-driven eye movement control. When given a task, the agent makes decisions about the next subtask, based on information gathered from observing the chart stored in its memory. Each subtask controls eye movements at pixel level and retrieves information from the foveal vision area of the gaze.
  • Figure 3: An overview of the hierarchical eye-movement control architecture. When presented with a chart and a task, a cognitive controller, powered by large language models, makes decisions on what to look at next and judges whether it is confident enough to provide an answer to the task's question. It relies on internal memory, which summarizes the information gathered from the chart through eye movements. Once cognitive control has determined the next action, the oculomotor controller is responsible for moving the gaze and observing the chart through a limited vision field. The model's objective is to accurately address the task as quickly as possible within set cognitive and physical constraints.
  • Figure 4: The figure gives examples of how the internal memory helps the cognitive controller to remember what has been read and then select actions for detailed gaze movement. A green box indicates the information held in memory, a red box represents the action selected by cognitive control, and the blue lines in the images reflect the eye movement scanpaths.
  • Figure 5: An overview of the training workflow: 1) chart collection and labeling, wherein diverse real-world and synthetic charts are gathered, involving manual and automatic annotation of AOIs; 2) task generation, utilizing a rule-based approach to create tasks based on labeled charts to construct a data collection for training; 3) policy training, in which policy models are trained via RL from chart images with tasks; and 4) scanpath prediction, wherein pre-trained LLMs and RL policies are coordinated hierarchically to predict task-driven gaze movements over charts.
  • ...and 2 more figures
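
The rule-based task generation step in Figure 5 can be pictured with a small Python sketch. The source does not give the actual rules, so the question templates, the `threshold` parameter, and the function name `generate_tasks` are assumptions made purely for illustration. The sketch derives one task of each kind (retrieve value, filter, find extreme) from the labels and values of an annotated chart.

```python
def generate_tasks(labels, values, threshold):
    """Build (task_type, question, answer) triples from a labeled chart.

    Hypothetical templates; the paper's actual rules are not shown here.
    """
    data = dict(zip(labels, values))
    tasks = []

    # Retrieve value: one question per labeled item.
    for label, value in data.items():
        tasks.append(("retrieve_value", f"What is the value of {label}?", value))

    # Filter: items whose value exceeds an assumed threshold.
    above = [label for label, value in data.items() if value > threshold]
    tasks.append(("filter", f"Which items have a value above {threshold}?", above))

    # Find extreme: the item with the highest value.
    tasks.append(
        ("find_extreme", "Which item has the highest value?", max(data, key=data.get))
    )
    return tasks


if __name__ == "__main__":
    for task in generate_tasks(["A", "B", "C"], [3, 9, 5], threshold=4):
        print(task)
```

Because the answers are computed from the chart labels, each generated task carries its own ground truth, which is what would allow a policy to be rewarded without human eye-tracking data.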