TPP-Gaze: Modelling Gaze Dynamics in Space and Time with Neural Temporal Point Processes

Alessandro D'Amelio, Giuseppe Cartella, Vittorio Cuculo, Manuele Lucchi, Marcella Cornia, Rita Cucchiara, Giuseppe Boccignone

TL;DR

TPP-Gaze is a novel, principled approach to modelling scanpath dynamics based on Neural Temporal Point Processes (TPPs). It jointly learns the temporal dynamics of fixation positions and durations, integrating deep learning methodologies with point process theory.

Abstract

Attention guides our gaze to fixate the proper location of the scene and holds it there for the appropriate amount of time given current processing demands, before shifting to the next one. As such, gaze deployment is crucially a temporal process. Existing computational models have made significant strides in predicting the spatial aspects of observers' visual scanpaths (where to look), while often relegating the temporal facet of attention dynamics (when) to the background. In this paper we present TPP-Gaze, a novel and principled approach to modelling scanpath dynamics based on Neural Temporal Point Processes (TPPs) that jointly learns the temporal dynamics of fixation position and duration, integrating deep learning methodologies with point process theory. We conduct extensive experiments across five publicly available datasets. Our results show the overall superior performance of the proposed model compared to state-of-the-art approaches. Source code and trained models are publicly available at: https://github.com/phuselab/tppgaze.
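To make the marked-TPP view concrete, the sketch below scores a scanpath as a sequence of events, each carrying a fixation duration and a fixation position. It is a minimal illustration under stated assumptions, not the TPP-Gaze implementation: durations are assumed to follow a log-normal mixture, positions a 2-D diagonal Gaussian mixture, and the two are assumed conditionally independent given the history. The per-fixation mixture parameters are passed in directly; in a neural TPP they would be produced from the history embedding and image features. All function names and the parameter layout are hypothetical.

```python
import numpy as np

def lognormal_mixture_logpdf(tau, weights, mu, sigma):
    """Log-density of a log-normal mixture at duration tau > 0.

    weights, mu, sigma: arrays of shape (K,); weights sum to 1.
    """
    log_tau = np.log(tau)
    # Per-component log-density of a log-normal distribution.
    comp = (-log_tau - np.log(sigma) - 0.5 * np.log(2 * np.pi)
            - 0.5 * ((log_tau - mu) / sigma) ** 2)
    # Log-sum-exp over components, weighted by the mixture weights.
    return np.logaddexp.reduce(np.log(weights) + comp)

def gaussian_mixture_logpdf(xy, weights, means, stds):
    """Log-density of a 2-D diagonal Gaussian mixture at point xy.

    weights: (K,); means, stds: (K, 2).
    """
    z = (xy - means) / stds  # standardised offsets, shape (K, 2)
    comp = (-np.log(2 * np.pi) - np.log(stds).sum(axis=1)
            - 0.5 * (z ** 2).sum(axis=1))
    return np.logaddexp.reduce(np.log(weights) + comp)

def scanpath_loglik(durations, positions, dur_params, pos_params):
    """Sum of per-fixation log-densities of durations and positions.

    dur_params[n] = (weights, mu, sigma) and pos_params[n] = (weights, means, stds)
    stand in for the (hypothetical) history-conditioned parameters of fixation n.
    """
    total = 0.0
    for tau, xy, dp, pp in zip(durations, positions, dur_params, pos_params):
        total += lognormal_mixture_logpdf(tau, *dp)
        total += gaussian_mixture_logpdf(np.asarray(xy, dtype=float), *pp)
    return total
```

Under maximum-likelihood training, the negative of this quantity averaged over human scanpaths would serve as the loss; the mixture families above are one common choice for intensity-free TPPs, assumed here for illustration.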

Paper Structure

This paper contains 15 sections, 11 equations, 21 figures, 5 tables.

Figures (21)

  • Figure 1: Scanpath dynamics as a marked TPP. Time is represented on the horizontal axis, and different scanpath fixations occur at times $t_1, t_2, t_3$ and $t_4$.
  • Figure 2: Overview of TPP-Gaze model architecture. Given a semantic representation of the image ($z_j$) and the history of past events ($h_n$), the next fixation position and duration are simulated.
  • Figure 3: Comparison of simulated and human scanpaths. Each circle represents a fixation point, with its diameter proportional to the fixation duration. For methods that do not model fixation duration, circles are shown with a uniform size.
  • Figure 4: Statistical properties exhibited by TPP-Gaze and other methods relative to those of human observers, in terms of empirical fixation durations and saccade amplitudes on the COCO-FreeView (top row) and OSIE (bottom row) datasets.
  • Figure 5: Return fixations analysis comparing TPP-Gaze with other methods and human observers. Results are shown on COCO-FreeView (left plot) and OSIE (right plot) datasets.
  • ...and 16 more figures
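Figure 2 describes an autoregressive generation loop: given a semantic image representation $z_j$ and a history embedding $h_n$, the model produces the next fixation position and duration. The sketch below mimics that loop with random, untrained weights so the data flow is visible. Everything specific in it is an assumption made for illustration, not the published architecture: the tanh recurrent update, the linear heads, the log-normal mixture for durations, the Gaussian mixture for positions in normalised image coordinates, the dimensions, and all names. It also computes saccade amplitudes as distances between consecutive fixations, one of the statistics compared in Figure 4.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D_H, D_Z = 3, 16, 8  # mixture components, history size, image-feature size (hypothetical)
D_IN = D_H + D_Z

# Random, untrained weights standing in for learned parameters (hypothetical).
W_rec = rng.normal(scale=0.1, size=(D_H, D_H + 3))  # history update from (h, x, y, log tau)
W_dur = rng.normal(scale=0.1, size=(3 * K, D_IN))   # duration head: logits, mu, log sigma
W_pos = rng.normal(scale=0.1, size=(5 * K, D_IN))   # position head: logits, means, log stds

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def sample_next(h, z):
    """Sample the next fixation (x, y) and duration tau given history h and image features z."""
    u = np.concatenate([h, z])
    # Duration: log-normal mixture; log-means centred near 0.25 s (illustrative choice).
    d = W_dur @ u
    w_d = softmax(d[:K])
    mu = d[K:2 * K] + np.log(0.25)
    sigma = 0.5 * np.exp(d[2 * K:])
    k = rng.choice(K, p=w_d)
    tau = rng.lognormal(mean=mu[k], sigma=sigma[k])
    # Position: diagonal Gaussian mixture in normalised image coordinates [0, 1]^2.
    p = W_pos @ u
    w_p = softmax(p[:K])
    means = 0.5 + p[K:3 * K].reshape(K, 2)
    stds = 0.15 * np.exp(p[3 * K:].reshape(K, 2))
    j = rng.choice(K, p=w_p)
    xy = np.clip(rng.normal(means[j], stds[j]), 0.0, 1.0)  # clip to image bounds
    return xy, tau

def simulate_scanpath(z, n_fix):
    """Autoregressively simulate n_fix fixations; returns rows of (x, y, tau)."""
    h = np.zeros(D_H)
    rows = []
    for _ in range(n_fix):
        xy, tau = sample_next(h, z)
        rows.append([xy[0], xy[1], tau])
        # Fold the event just generated into the history embedding.
        h = np.tanh(W_rec @ np.concatenate([h, xy, [np.log(tau)]]))
    return np.array(rows)

z = rng.normal(size=D_Z)  # stand-in for the semantic image representation z_j
path = simulate_scanpath(z, n_fix=8)
# Saccade amplitudes: Euclidean distance between consecutive fixations (normalised units).
amplitudes = np.linalg.norm(np.diff(path[:, :2], axis=0), axis=1)
```

With untrained weights the resulting durations and amplitudes carry no meaning; the point is the structure, in which each sampled event is fed back into the history before the next one is drawn.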