A Deep Learning Framework for Visual Attention Prediction and Analysis of News Interfaces

Matthew Kenely; Dylan Seychell; Carl James Debono; Chris Porter

TL;DR

UI saliency research often omits demographic representation, which limits generalizability. The authors integrate DeepGaze IIE as the saliency backbone of the SaRa framework, refine grid-based scoring with an added pixel-value term, and apply a normalization pipeline to produce robust saliency maps. Two experiments, eye-tracking with $n=30$ participants and mouse-tracking with $n=375$ participants aged $13$–$70$, reveal age-based differences (older participants focus more on headings, younger participants on images) and show that mouse-tracking closely approximates eye-tracking signals ($\mathrm{sAUC} \approx 0.86$), enabling large-scale analysis. Applying all optimizations improves Salient Object Ranking (SOR) by $10.7\%$ over the baseline. The work underscores the need for large, demographically representative datasets and explicit demographic reporting to ensure fair and generalizable UI saliency tools.

Abstract

News outlets' competition for attention in news interfaces has highlighted the need for demographically-aware saliency prediction models. Despite recent advancements in saliency detection applied to user interfaces (UI), existing datasets are limited in size and demographic representation. We present a deep learning framework that enhances the SaRa (Saliency Ranking) model with DeepGaze IIE, improving Salient Object Ranking (SOR) performance by 10.7%. Our framework optimizes three key components: saliency map generation, grid segment scoring, and map normalization. Through a two-fold experiment using eye-tracking (30 participants) and mouse-tracking (375 participants aged 13–70), we analyze attention patterns across demographic groups. Statistical analysis reveals significant age-based variations ($p < 0.05$, $\varepsilon^2 = 0.042$), with older users (36–70) engaging more with textual content and younger users (13–35) interacting more with images. Mouse-tracking data closely approximates eye-tracking behavior (sAUC = 0.86) and identifies UI elements that immediately stand out, validating its use in large-scale studies. We conclude that saliency studies should prioritize gathering data from a larger, demographically representative sample and report exact demographic distributions.
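
To make the sAUC figure more concrete, the following is a minimal sketch of a shuffled AUC computation, not the authors' implementation. It assumes the mouse-tracking heatmap plays the role of the predicted map, that eye-tracking fixations on the same interface are the positives, and that fixations taken from other interfaces are the negatives. The function name `shuffled_auc` and its arguments are hypothetical.

```python
import numpy as np

def shuffled_auc(saliency, fixations, other_fixations):
    """Rank-based AUC: chance that a fixated pixel outscores a pixel
    sampled at fixation locations borrowed from other images.

    saliency        -- 2D array (assumed here: a mouse-tracking heatmap)
    fixations       -- (row, col) pairs fixated on this image (positives)
    other_fixations -- (row, col) pairs from other images (negatives)
    """
    pos = np.array([saliency[r, c] for r, c in fixations], dtype=float)
    neg = np.array([saliency[r, c] for r, c in other_fixations], dtype=float)
    # Compare every positive with every negative; ties count as half a win.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)

# Toy example: the map is high exactly where this image was fixated.
toy_map = np.array([[0.9, 0.1],
                    [0.2, 0.8]])
score = shuffled_auc(toy_map, [(0, 0), (1, 1)], [(0, 1), (1, 0)])
```

Drawing negatives from other images' fixations, rather than uniformly at random, discounts biases shared across images (such as a tendency to look at the center), so a score near 0.5 indicates chance-level agreement and a score near 1 indicates strong agreement.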
