A Deep Learning Framework for Visual Attention Prediction and Analysis of News Interfaces

Matthew Kenely, Dylan Seychell, Carl James Debono, Chris Porter

TL;DR

Demographic representation is often missing from UI saliency research, which limits how well its findings generalize. The authors integrate DeepGaze IIE as the saliency backbone of the SaRa framework, refine grid-based scoring with an added pixel-value term, and apply a normalization pipeline to the resulting saliency maps. Two experiments, eye-tracking with $n=30$ participants and mouse-tracking with $n=375$ participants aged $13$–$70$, reveal age-based differences (older participants focus more on headings, younger participants on images) and show that mouse-tracking closely approximates eye-tracking ($\mathrm{sAUC} \approx 0.86$), which supports its use in large-scale analysis. Applying all optimizations improves Salient Object Ranking (SOR) performance by $10.7\%$ over the baseline. The work underscores the need for large, demographically representative datasets and explicit demographic reporting to ensure fair and generalizable UI saliency tools.

Abstract

News outlets' competition for attention in news interfaces has highlighted the need for demographically-aware saliency prediction models. Despite recent advancements in saliency detection applied to user interfaces (UI), existing datasets are limited in size and demographic representation. We present a deep learning framework that enhances the SaRa (Saliency Ranking) model with DeepGaze IIE, improving Salient Object Ranking (SOR) performance by 10.7%. Our framework optimizes three key components: saliency map generation, grid segment scoring, and map normalization. Through a two-fold experiment using eye-tracking (30 participants) and mouse-tracking (375 participants aged 13–70), we analyze attention patterns across demographic groups. Statistical analysis reveals significant age-based variations ($p < 0.05$, $\varepsilon^2 = 0.042$), with older users (36–70) engaging more with textual content and younger users (13–35) interacting more with images. Mouse-tracking data closely approximates eye-tracking behavior (sAUC = 0.86) and identifies UI elements that immediately stand out, validating its use in large-scale studies. We conclude that saliency studies should prioritize gathering data from a larger, demographically representative sample and report exact demographic distributions.

Paper Structure

This paper contains 27 sections, 3 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: From left to right: Original image, saliency map from DeepGaze IIE, and map after $31\times31$ Gaussian filter and bit depth normalization.
  • Figure 2: Age distribution in the mouse-tracking experiment, binned into 4 groups.
  • Figure 3: Responses to "Which type of element do you feel stood out the most?" from the control group (blue) and the experimental group (orange).
  • Figure 4: Gaze location results for the interface "The Shift" shown to the experimental group. Left: average fixation location per second in the eye-tracking experiment, right: heatmap from the mouse-tracking experiment.
  • Figure 5: Custom (MOBILE) interface heatmaps. Control group on the left, experimental group on the right. From left to right for each group: heatmaps from the eye-tracking experiment, heatmaps from the mouse-tracking experiment, saliency maps generated by DeepGaze IIE and the corresponding SaRa ranks.
  • ...and 1 more figure
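
The Figure 1 caption names two post-processing steps applied to the DeepGaze IIE output: a $31\times31$ Gaussian filter and bit depth normalization. The sketch below illustrates one way such a step could look in plain Python. It is not the authors' implementation. The function names, the value `sigma=5.0`, the edge-replicating border handling, and the reading of "bit depth normalization" as a min-max rescale to the 8-bit range 0–255 are all assumptions made for illustration; only the kernel size comes from the caption.

```python
import math


def gaussian_kernel_1d(size=31, sigma=5.0):
    """Return a normalized 1-D Gaussian kernel (sigma is an assumed value)."""
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, replicating edge pixels."""
    half = len(kernel) // 2
    blurred = []
    for row in image:
        width = len(row)
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                # Clamp the source index so borders reuse the edge pixel.
                xi = min(max(x + k - half, 0), width - 1)
                acc += w * row[xi]
            new_row.append(acc)
        blurred.append(new_row)
    return blurred


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, size=31, sigma=5.0):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel_1d(size, sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def normalize_to_8bit(image):
    """Min-max rescale to integers in 0..255 (assumed meaning of the step)."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi - lo < 1e-12:
        # A flat map carries no ranking information; return all zeros.
        return [[0 for _ in row] for row in image]
    scale = 255.0 / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in image]


def normalize_saliency_map(raw_map, size=31, sigma=5.0):
    """Smooth a raw saliency map, then rescale it to 8-bit values."""
    return normalize_to_8bit(gaussian_blur(raw_map, size, sigma))
```

For a map stored as a list of rows of floats, `normalize_saliency_map(raw_map)` returns a list of the same shape with integer values. In practice an array library would be much faster on full-resolution screenshots; the pure-Python form is used here only to keep the sketch self-contained.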