Visual Attention Graph

Kai-Fu Yang, Yong-Jie Li

TL;DR

This work introduces the Attention Graph (AG), a graph-based representation that jointly encodes visual saliency and semantic scanpaths by modeling objects as nodes and gaze-shift probabilities as directed edges. By defining semantic scanpaths (SemScan(obj) and SemScan(att)) and constructing an AG that captures the distribution of observers’ attention across a scene, the authors provide new metrics (ScoreGraph-based $S_{scan}$ and $S'_{scan}$) to evaluate semantic scanpath predictions. The approach reduces intra-observer variability, enables sampling of plausible semantic scanpaths, and yields competitive results on cognition-related tasks such as age classification and ASD screening without requiring extra feature learning. The AG framework promises a scalable, semantically grounded benchmark for attention modeling and offers potential for low-cost eye-tracking applications in real-world settings.
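The TL;DR notes that the AG makes it possible to sample plausible semantic scanpaths. The sketch below assumes that sampling is a first-order random walk that follows the outgoing edge probabilities of each node. The graph, its node names, and the helper names (`sample_scanpath`, `mean_log_prob`) are hypothetical. The plausibility score is a simple average log transition probability included only for illustration; it is not the paper's $S_{scan}$ or $S'_{scan}$.

```python
import math
import random

# Hypothetical attention graph: node -> {successor: transition probability}.
# Node names and probabilities are illustrative, not taken from the paper.
AG = {
    "face": {"text": 0.5, "hand": 0.3, "ball": 0.2},
    "text": {"face": 0.7, "ball": 0.3},
    "hand": {"face": 0.6, "ball": 0.4},
    "ball": {"face": 1.0},
}


def sample_scanpath(graph, start, length, rng):
    """Sample a semantic scanpath as a random walk along the graph's edges."""
    path = [start]
    node = start
    for _ in range(length - 1):
        successors = graph.get(node, {})
        if not successors:  # dead end: no outgoing attention shifts observed
            break
        nodes = list(successors)
        weights = [successors[n] for n in nodes]
        node = rng.choices(nodes, weights=weights, k=1)[0]
        path.append(node)
    return path


def mean_log_prob(graph, path, eps=1e-6):
    """Average log transition probability of a path under the graph.

    Hypothetical plausibility score for illustration only; NOT the
    paper's S_scan or S'_scan metric. Unseen edges get probability eps.
    """
    if len(path) < 2:
        return 0.0
    total = sum(math.log(graph.get(a, {}).get(b, 0.0) + eps)
                for a, b in zip(path, path[1:]))
    return total / (len(path) - 1)


rng = random.Random(0)  # fixed seed for reproducible samples
samples = [sample_scanpath(AG, "face", 5, rng) for _ in range(3)]
for s in samples:
    print(s, round(mean_log_prob(AG, s), 3))
```

Because each step depends only on the current node, this treats the AG as a Markov chain; a predicted scanpath that uses edges never observed in human data is heavily penalized through the small `eps` floor.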

Abstract

Visual attention plays a critical role when our visual system performs active visual tasks by interacting with the physical scene. However, how the relationships between visual objects are encoded in the psychological world of the brain remains to be explored. In computer vision, predicting visual fixations or scanpaths is a common way to study the visual attention and behavior of human observers viewing a scene. Most existing methods encode visual attention using individual fixations or scanpaths derived from the raw gaze-shift data collected from human observers. Such representations may not capture common attention patterns well because, without the semantic information of the viewed scene, raw gaze-shift data alone exhibit high inter- and intra-observer variability. To address this issue, we propose a new attention representation, called the Attention Graph, that simultaneously encodes visual saliency and scanpaths in a graph-based representation and better reveals the common attention behavior of human observers. In the attention graph, a semantic scanpath is defined as a path on the graph, while the saliency of each object is obtained by computing the fixation density on its node. Systematic experiments demonstrate that the proposed attention graph, combined with our new evaluation metrics, provides a better benchmark for evaluating attention prediction methods. Additional experiments demonstrate the promising potential of the proposed attention graph for assessing human cognitive states, such as in autism spectrum disorder screening and age classification.
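The abstract states that object saliency is obtained by computing the fixation density on each node. A minimal sketch, assuming that density means each object's share of all fixations landing on objects (not weighted by fixation duration or object area); the input data and the name `node_saliency` are hypothetical.

```python
from collections import Counter

# Hypothetical input: for each observer, the object label hit by each raw
# fixation (fixations already mapped to objects via object masks;
# None marks a fixation on the background).
fixation_labels = [
    ["face", "face", "text", None, "face", "ball"],
    ["text", "face", "face", "hand", "ball"],
]


def node_saliency(per_observer_labels):
    """Saliency of each node as its share of all object fixations, pooled over observers."""
    counts = Counter(label
                     for labels in per_observer_labels
                     for label in labels
                     if label is not None)  # background fixations are ignored
    total = sum(counts.values())
    return {node: n / total for node, n in counts.items()}


sal = node_saliency(fixation_labels)
print(sal)
```

Normalizing by object area or weighting by fixation duration would be equally plausible readings of "fixation density"; the count-based version is simply the easiest to check by hand.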

Paper Structure

This paper contains 17 sections, 2 equations, 14 figures, 3 tables.

Figures (14)

  • Figure 1: Examples showing the high intra-observer variability of the scanpaths of two different human observers viewing the same scene. Note that the data are from the OSIE dataset xu2014predicting.
  • Figure 2: Comparison of different attention-related representations. (a)-(b) illustrate how the fixation prediction task is evaluated; (c)-(d) illustrate that, by analogy with fixation prediction evaluation, evaluating scanpath prediction models requires the distribution of human scanpaths.
  • Figure 3: Examples demonstrating how semantic scanpaths are generated from the raw fixations of two observers, using the object annotations and attributes provided by the OSIE dataset xu2014predicting. Specifically, the object masks are used to group raw fixations into SemScan(obj), while the 12 semantic-level attributes (Smell, Touch, Watchability, etc.) provided in the OSIE dataset are used to further group SemScan(obj) into SemScan(att). Note that an object with multiple attributes (e.g., Touch & Watchability in this figure) is treated as a new semantic attribute, and None denotes objects without labeled attributes.
  • Figure 4: Statistical analysis of fixations located inside objects in the OSIE dataset.
  • Figure 5: Constructing the attention graph from the semantic scanpaths of multiple observers. For example, the weight of the edge from object I to object B is $4/9$, which indicates that, pooled over all observers, 4 of the 9 attention shifts leaving object I go to object B. Note that edge weights are shown as fractions so that the raw counts of attention shifts remain visible.
  • ...and 9 more figures
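Figures 3 and 5 describe two steps: grouping raw fixations into SemScan(obj) (and then SemScan(att)), and pooling the observers' semantic scanpaths into an attention graph whose edge weights are transition fractions. The sketch below reproduces a $4/9$ edge like the one in Figure 5 on toy data. It assumes that background fixations are dropped and that consecutive fixations (or attributes) on the same node are merged into a single visit; the function names, the `&` naming of combined attributes, and the data are hypothetical, not the authors' code.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def sem_scan_obj(fixation_labels):
    """Group raw fixations into an object-level semantic scanpath.

    Assumption: background fixations (None) are dropped and consecutive
    fixations on the same object are merged into one visit.
    """
    path = []
    for label in fixation_labels:
        if label is None:
            continue
        if not path or path[-1] != label:
            path.append(label)
    return path


def sem_scan_att(obj_path, attributes):
    """Map objects to attributes. An object with several attributes becomes one
    combined attribute (e.g. 'Touch&Watchability'); unlabeled objects map to 'None'.
    Consecutive identical attributes are merged (assumption)."""
    att_path = []
    for obj in obj_path:
        atts = attributes.get(obj, [])
        att = "&".join(sorted(atts)) if atts else "None"
        if not att_path or att_path[-1] != att:
            att_path.append(att)
    return att_path


def attention_graph(scanpaths):
    """Edge weight i->j = (# shifts i->j) / (# shifts leaving i), pooled over observers."""
    shifts = Counter()
    for path in scanpaths:
        shifts.update(zip(path, path[1:]))  # count each consecutive pair
    out_totals = Counter()
    for (src, _dst), n in shifts.items():
        out_totals[src] += n
    graph = defaultdict(dict)
    for (src, dst), n in shifts.items():
        graph[src][dst] = Fraction(n, out_totals[src])
    return dict(graph)


# Toy raw fixations for three observers (object labels per fixation).
raw = [
    ["I", "I", "B", None, "I", "A", "A", "I", "B"],
    ["I", "B", "B", "I", "C"],
    ["A", "I", "B", "I", "C", "I", "A", "I", "D"],
]
obj_paths = [sem_scan_obj(r) for r in raw]
ag = attention_graph(obj_paths)
print(ag["I"]["B"])
```

Keeping the weights as `Fraction` objects mirrors the figure's convention: the numerator and denominator remain the raw shift counts, while each node's outgoing weights still sum to 1.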