Trends, Applications, and Challenges in Human Attention Modelling

Giuseppe Cartella; Marcella Cornia; Vittorio Cuculo; Alessandro D'Amelio; Dario Zanca; Giuseppe Boccignone; Rita Cucchiara

TL;DR

The paper surveys how human attention modelling, in the form of saliency maps and scanpaths, can guide deep learning across image and video processing, vision-and-language tasks, and language modelling. It provides a taxonomy of modelling approaches and applications, complemented by benchmarks, datasets, and evaluation metrics (e.g., AUC, NSS, string-edit distance). Key themes include integrating gaze data to improve object recognition, captioning, VQA, and reading comprehension, as well as domain-specific applications in robotics, autonomous driving, and medicine. The paper also discusses open challenges: data scarcity, privacy concerns, the need for synthetic gaze data, and real-time multimodal gaze modelling. It recommends privacy-aware data collection and scalable gaze generation to advance human-AI interaction.
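As an illustration of two of the metrics named above, the following is a minimal sketch, not taken from the paper, of NSS (Normalized Scanpath Saliency) for saliency maps and of string-edit (Levenshtein) distance for scanpaths. The function names `nss` and `edit_distance`, the nested-list map representation, and the convention of returning 0 for a constant saliency map are assumptions made for this example. It also assumes scanpaths have already been encoded as strings, for instance one letter per fixated grid cell.

```python
from statistics import mean, pstdev


def nss(saliency, fixations):
    """Mean z-scored saliency value at the fixated (row, col) locations."""
    values = [v for row in saliency for v in row]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Assumed convention: a constant map carries no information.
        return 0.0
    return mean((saliency[r][c] - mu) / sigma for r, c in fixations)


def edit_distance(a, b):
    """Levenshtein distance between two scanpath strings."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Toy 2x2 saliency map with a single human fixation on the brightest cell.
toy_map = [[0.0, 0.0], [0.0, 1.0]]
score = nss(toy_map, [(1, 1)])

# Two toy scanpaths encoded as grid-cell letters.
distance = edit_distance("ABCD", "ABDC")
```

A higher NSS means the predicted map assigns above-average saliency to the locations people actually fixated, while a lower edit distance means two scanpaths visit similar regions in a similar order.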

Abstract

Human attention modelling has proven, in recent years, to be particularly useful not only for understanding the cognitive processes underlying visual exploration, but also for supporting artificial intelligence models that aim to solve problems in various domains, including image and video processing, vision-and-language applications, and language modelling. This survey offers a reasoned overview of recent efforts to integrate human attention mechanisms into contemporary deep learning models and discusses future research directions and challenges. For a comprehensive overview of ongoing research, refer to our dedicated repository, available at https://github.com/aimagelab/awesome-human-visual-attention.

Paper Structure (10 sections, 1 figure, 1 table)

Introduction
Human Attention Modelling
Saliency Prediction
Scanpath Prediction
Integrating Human Attention in AI Models
Image and Video Processing
Vision-and-Language Applications
Language Modelling
Domain-Specific Applications
Open Challenges and Future Directions

Figures (1)

Figure 1: An overview of sample architectures integrating human visual attention with different input and output modalities. Human visual attention has been employed to solve tasks in diverse domains, including image and video processing, automatic captioning, visual question answering, and language understanding, as well as robotics, autonomous driving, and medicine.
