Beyond Average: Individualized Visual Scanpath Prediction

Xianyu Chen, Ming Jiang, Qi Zhao

TL;DR

This work addresses the problem of inter-observer variability in visual attention by introducing individualized scanpath prediction (ISP). It proposes three novel components—an observer encoder, an observer-centric feature integration module, and an adaptive fixation prioritization mechanism—that jointly tailor scanpath predictions to each observer within existing encoder–decoder frameworks. Across four diverse eye-tracking datasets and multiple architectures, ISP consistently outperforms observer-agnostic models and fine-tuned baselines on value-based and ranking-based metrics, while also enabling population- and ASD-specific analyses. The results demonstrate both improved prediction accuracy and practical potential for observer-aware applications, such as personalized interfaces and ASD-related gaze analysis. Altogether, the paper advances personalized attention modeling by tightly integrating observer traits into the predictive process and validating its generalizability and utility.

Abstract

Understanding how attention varies across individuals has significant scientific and societal impacts. However, existing visual scanpath models treat attention uniformly, neglecting individual differences. To bridge this gap, this paper focuses on individualized scanpath prediction (ISP), a new attention modeling task that aims to accurately predict how different individuals shift their attention in diverse visual tasks. It proposes an ISP method featuring three novel technical components: (1) an observer encoder to characterize and integrate an observer's unique attention traits, (2) an observer-centric feature integration approach that holistically combines visual features, task guidance, and observer-specific characteristics, and (3) an adaptive fixation prioritization mechanism that refines scanpath predictions by dynamically prioritizing semantic feature maps based on individual observers' attention traits. These components allow scanpath models to account for attention variations across observers. The method is applicable to different datasets, model architectures, and visual tasks, offering a general tool for transforming observer-agnostic scanpath models into individualized ones. Comprehensive evaluations using value-based and ranking-based metrics verify the method's effectiveness and generalizability.
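To make the idea of observer-dependent prioritization of semantic feature maps more concrete, the following is a minimal sketch in plain Python. It is not the paper's architecture: the lookup-table observer encoder, the linear projection, the softmax weighting, the three channel names, and all numbers are assumptions chosen only for illustration.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def prioritize(feature_maps, observer_vec, projection):
    """Weight semantic feature maps by observer-specific channel weights.

    feature_maps: list of C maps, each a flat list of activations per location.
    observer_vec: observer embedding (hypothetical output of an observer encoder).
    projection:   C x D matrix mapping the embedding to one logit per channel.
    Returns (fixation distribution over locations, channel weights).
    """
    logits = [sum(w * o for w, o in zip(row, observer_vec)) for row in projection]
    channel_weights = softmax(logits)
    num_locations = len(feature_maps[0])
    combined = [
        sum(channel_weights[c] * feature_maps[c][i] for c in range(len(feature_maps)))
        for i in range(num_locations)
    ]
    return softmax(combined), channel_weights

# Hypothetical observer encoder: a lookup table from observer ID to embedding.
observer_table = {
    "observer_a": [2.0, 0.0, 0.0],  # assumed preference for faces
    "observer_b": [0.0, 0.0, 2.0],  # assumed preference for background
}

# Three illustrative semantic channels over four image locations.
feature_maps = [
    [1.0, 0.0, 0.0, 0.0],  # faces
    [0.0, 1.0, 0.0, 0.0],  # objects
    [0.0, 0.0, 1.0, 1.0],  # background
]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

fix_a, weights_a = prioritize(feature_maps, observer_table["observer_a"], identity)
fix_b, weights_b = prioritize(feature_maps, observer_table["observer_b"], identity)
```

With the same image features, the two hypothetical observers end up with different most-likely fixation locations, which is the behavior an individualized model is meant to capture. In a trained model the embedding and projection would be learned from each observer's eye-tracking data rather than set by hand.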

Paper Structure (29 sections, 9 equations, 10 figures, 11 tables)

Figures (10)

  • Figure 1: Understanding and predicting the distinct eye movements of each observer is the key objective of individualized scanpath prediction. These examples reveal the variations in the scanpaths of different observers, showing their distinct attention preferences in (a) faces, (b) objects, and (c) background. Each dot represents a fixation, with the number and radius indicating its order and duration, respectively. The blue and red dots indicate the beginning and the end of the scanpath, respectively.
  • Figure 2: Our proposed method incorporates an observer encoder for characterizing individualized attention traits, followed by observer-centric feature integration for holistic processing, and adaptive fixation prioritization for refined predictions.
  • Figure 3: Qualitative examples of scanpaths predicted by ChenLSTM-FT, ChenLSTM-ISP, and ground truth. Each row compares the model predictions and the ground truth scanpath of one observer. These observers show different gaze patterns, including (a) focusing on the image center, (b) exploring different people and objects, (c) exploring broadly in the scene, and (d) focusing on a particular region. The blue and red dots indicate the beginning and the end of the scanpath, respectively.
  • Figure 4: Saliency evaluation results of the baselines, fine-tuned (FT) models, and ISP models. Error bars indicate the standard error of the mean.
  • Figure 5: Statistical comparison between the predicted fixations for the ASD and Control groups [shuo:2015:austim]. Error bars indicate the standard error of the mean. Asterisks indicate significant differences (unpaired t-test, $p < 0.05$).
  • ...and 5 more figures
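
The captions of Figures 4 and 5 refer to the standard error of the mean and to an unpaired t-test. As a rough illustration of those two quantities, here is a short sketch using only the Python standard library. The group values are made up, and the pooled-variance (Student) form of the test is an assumption, since the captions do not say which variant was used.

```python
import math
import statistics

def sem(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def unpaired_t(a, b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees of freedom). The p-value is not computed here;
    it would come from the t distribution with the returned degrees of freedom.
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical per-observer values for two groups (not data from the paper).
asd = [0.30, 0.35, 0.28, 0.33, 0.31]
ctrl = [0.42, 0.40, 0.45, 0.38, 0.41]

t_stat, dof = unpaired_t(asd, ctrl)
# For 8 degrees of freedom, |t| > 2.306 corresponds to p < 0.05 (two-sided).
significant = abs(t_stat) > 2.306
```

The error bar for each group would be drawn as the group mean plus or minus `sem(group)`.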