Foraging with the Eyes: Dynamics in Human Visual Gaze and Deep Predictive Modeling

Tejaswi V. Panchagnula

TL;DR

This study trained a convolutional neural network to predict fixation heatmaps from image input alone. The model accurately reproduced salient fixation regions across novel images, demonstrating that key components of gaze behavior are learnable from visual structure alone.

Abstract

Animals often forage via Lévy walks: stochastic trajectories with heavy-tailed step lengths that are optimized for sparse resource environments. We show that human visual gaze follows similar dynamics when scanning images. While traditional models emphasize image-based saliency, the underlying spatiotemporal statistics of eye movements remain underexplored. Understanding these dynamics has broad applications in attention modeling and vision-based interfaces. In this study, we conducted a large-scale human-subject experiment in which 40 participants viewed 50 diverse images under unconstrained conditions, recording over 4 million gaze points with a high-speed eye tracker. Analysis of these data shows that the human gaze trajectory also follows a Lévy walk akin to animal foraging, which suggests that the human eye forages for visual information in an optimally efficient manner. Further, we trained a convolutional neural network (CNN) to predict fixation heatmaps from image input alone. The model accurately reproduced salient fixation regions across novel images, demonstrating that key components of gaze behavior are learnable from visual structure alone. Our findings present new evidence that human visual exploration obeys statistical laws analogous to natural foraging, and they open avenues for modeling gaze through generative and predictive frameworks.
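To make the notion of heavy-tailed step lengths concrete, the following is a minimal sketch, not the authors' pipeline. It assumes gaze samples arrive as (x, y) pixel pairs, draws synthetic step lengths from a power law p(x) ∝ x^(-μ) by inverse-transform sampling, and recovers μ with the standard maximum-likelihood estimator for a continuous power-law tail. The figure captions describe log-log slope fits, so the estimator here is a swapped-in technique. The function names, the cutoff `x_min = 8.0`, and the exponent 2.4 are illustrative assumptions.

```python
import math
import random


def step_lengths(points):
    """Euclidean distance between consecutive (x, y) gaze samples."""
    return [math.dist(p, q) for p, q in zip(points, points[1:])]


def sample_power_law(n, mu, x_min, rng):
    """Draw n step lengths from p(x) ~ x**(-mu), x >= x_min, by inverse transform."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (mu - 1.0)) for _ in range(n)]


def estimate_mu(steps, x_min):
    """Maximum-likelihood exponent of a continuous power-law tail above x_min."""
    tail = [s for s in steps if s >= x_min]
    return 1.0 + len(tail) / sum(math.log(s / x_min) for s in tail)


# Synthetic data only: mu = 2.4 and x_min = 8.0 are assumed values for illustration.
rng = random.Random(0)
synthetic = sample_power_law(20000, mu=2.4, x_min=8.0, rng=rng)
mu_hat = estimate_mu(synthetic, x_min=8.0)
print(f"estimated mu = {mu_hat:.2f}")
```

On real recordings, `step_lengths` would be applied to each trajectory first, and the choice of `x_min` would need care because the estimate depends on where the power-law tail is taken to begin.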

Paper Structure

This paper contains 16 sections, 5 equations, 9 figures.

Figures (9)

  • Figure 1: Gaze trajectory of a subject over an image. The fixation points are clearly visible and cover all the regions of interest in the image that contain relevant information.
  • Figure 2: Comparison of gaze trajectories over two images: (a) low entropy and (b) high entropy. Despite differing entropy levels, the gaze patterns show no significant qualitative difference across subjects.
  • Figure 3: Cumulative Step Length Distribution of all subjects on all images. The mode of the distribution occurs at the 8-10 pixel mark. The inset figure in the top right shows the step length distribution of the combined dataset on a log-log scale, where the linear slope of the tail is approximately -3.49, indicating a heavy-tailed distribution.
  • Figure 4: Step length distributions: (a) all subjects conditioned on one image, with an approximate slope of -2.38, and (b) one subject conditioned on all images, with an approximate slope of -2.41. Both slopes lie within the limits for a Lévy walk distribution.
  • Figure 5: Entropy of the image versus the Lévy walk parameter $\mu$. There is a weak positive correlation between the two.
  • ...and 4 more figures
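
The slopes quoted in the captions for Figures 3 and 4 come from the tail of a step length distribution on a log-log scale. As a hedged illustration of how such a slope can be read off, the sketch below bins step lengths logarithmically, converts counts to densities, and fits a least-squares line in log-log space. The binning scheme, the `min_count` filter, the synthetic data, and the function names are assumptions and not the paper's procedure. The range 1 < μ ≤ 3 is the commonly quoted Lévy walk regime.

```python
import math
import random


def loglog_tail_slope(steps, x_min, bins_per_decade=5, min_count=10):
    """Least-squares slope of log10(density) against log10(step length)."""
    tail = [s for s in steps if s >= x_min]
    counts = {}
    for s in tail:
        k = int(math.log10(s / x_min) * bins_per_decade)  # logarithmic bin index
        counts[k] = counts.get(k, 0) + 1
    xs, ys = [], []
    for k, c in sorted(counts.items()):
        if c < min_count:  # skip sparse bins, which are dominated by noise
            continue
        lo = x_min * 10 ** (k / bins_per_decade)
        hi = x_min * 10 ** ((k + 1) / bins_per_decade)
        xs.append(math.log10(math.sqrt(lo * hi)))  # geometric bin center
        ys.append(math.log10(c / (len(tail) * (hi - lo))))  # normalized density
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def in_levy_range(slope):
    """True if mu = -slope lies in the commonly quoted Levy regime 1 < mu <= 3."""
    mu = -slope
    return 1.0 < mu <= 3.0


# Synthetic power-law steps with an assumed exponent mu = 2.4 and x_min = 1.0.
rng = random.Random(1)
steps = [(1.0 - rng.random()) ** (-1.0 / 1.4) for _ in range(50000)]
slope = loglog_tail_slope(steps, x_min=1.0)
print(f"fitted slope = {slope:.2f}, in Levy range: {in_levy_range(slope)}")
```

Logarithmic bins are used here because linear bins leave the far tail nearly empty, which makes a straight-line fit unstable. Slope fits of this kind are still sensitive to binning choices, so the result is best read as approximate.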