Beyond Average: Individualized Visual Scanpath Prediction
Xianyu Chen, Ming Jiang, Qi Zhao
TL;DR
This work addresses inter-observer variability in visual attention by introducing individualized scanpath prediction (ISP). It proposes three components (an observer encoder, an observer-centric feature integration module, and an adaptive fixation prioritization mechanism) that jointly tailor scanpath predictions to each observer within existing encoder–decoder frameworks. Across four eye-tracking datasets and multiple architectures, ISP consistently outperforms observer-agnostic models and fine-tuned baselines on value-based and ranking-based metrics, and it also enables population-level and autism spectrum disorder (ASD)-specific analyses. The results indicate improved prediction accuracy and practical potential for observer-aware applications such as personalized interfaces and ASD-related gaze analysis. Overall, the paper advances personalized attention modeling by integrating observer traits directly into the prediction process and by validating the approach across datasets and architectures.
Abstract
Understanding how attention varies across individuals has significant scientific and societal impact. However, existing visual scanpath models treat attention uniformly, neglecting individual differences. To bridge this gap, this paper focuses on individualized scanpath prediction (ISP), a new attention modeling task that aims to accurately predict how different individuals shift their attention in diverse visual tasks. It proposes an ISP method featuring three technical components: (1) an observer encoder to characterize and integrate an observer's unique attention traits, (2) an observer-centric feature integration approach that holistically combines visual features, task guidance, and observer-specific characteristics, and (3) an adaptive fixation prioritization mechanism that refines scanpath predictions by dynamically prioritizing semantic feature maps based on individual observers' attention traits. Together, these components allow scanpath models to account for attention variations across observers. The method is applicable to different datasets, model architectures, and visual tasks, offering a general tool for transforming observer-agnostic scanpath models into individualized ones. Comprehensive evaluations using value-based and ranking-based metrics verify the method's effectiveness and generalizability.
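To make components (1) and (3) more concrete, here is a minimal illustrative sketch of how an observer embedding might reweight semantic feature maps into a per-observer fixation priority map. It is not the authors' architecture: the lookup-table encoder, the linear projection `proj`, the softmax weighting, all dimensions, and the function names `channel_priorities` and `priority_map` are assumptions chosen for illustration, and the parameters are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper).
NUM_OBSERVERS, EMBED_DIM, CHANNELS, H, W = 4, 8, 6, 5, 5

# Hypothetical parameters; in a real model these would be learned.
observer_table = rng.normal(size=(NUM_OBSERVERS, EMBED_DIM))  # one embedding per observer
proj = rng.normal(size=(EMBED_DIM, CHANNELS))                 # embedding -> channel logits


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - x.max())
    return z / z.sum()


def channel_priorities(observer_id):
    """Map an observer's embedding to weights over semantic feature channels."""
    return softmax(observer_table[observer_id] @ proj)


def priority_map(feature_maps, observer_id):
    """Fuse (C, H, W) feature maps into an (H, W) distribution for one observer."""
    weights = channel_priorities(observer_id)              # shape (C,)
    fused = np.tensordot(weights, feature_maps, axes=1)    # weighted sum over channels
    return softmax(fused.ravel()).reshape(fused.shape)     # normalize over locations


# Stand-in for semantic feature maps produced by a visual encoder.
feature_maps = rng.normal(size=(CHANNELS, H, W))
map_a = priority_map(feature_maps, observer_id=0)
map_b = priority_map(feature_maps, observer_id=1)
```

With the same image features, `map_a` and `map_b` differ only because the two observers have different embeddings, which is the basic idea behind conditioning scanpath prediction on the observer.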
