
TurkerGaze: Crowdsourcing Saliency with Webcam based Eye Tracking

Pingmei Xu, Krista A Ehinger, Yinda Zhang, Adam Finkelstein, Sanjeev R. Kulkarni, Jianxiong Xiao

TL;DR

This work tackles the scarcity of large-scale gaze data for saliency by introducing a browser-based webcam eye-tracking system that recruits subjects through Amazon Mechanical Turk. By combining a lightweight appearance-based gaze model with browser-compatible facial landmark tracking, calibration-driven online learning, and engaging game interfaces, the approach collects data comparable to lab-collected data at lower cost. The authors validate the method against lab-based eye trackers and demonstrate its scalability by building the iSUN dataset of 20,608 natural-scene images with crowdsourced gaze data. The tools and dataset let researchers train data-hungry saliency models across diverse stimuli and tasks, with open-source access to the platform.

Abstract

Traditional eye tracking requires specialized hardware, which means collecting gaze data from many observers is expensive, tedious and slow. Therefore, existing saliency prediction datasets are orders of magnitude smaller than typical datasets for other vision recognition tasks. The small size of these datasets limits the potential for training data-intensive algorithms, and causes overfitting in benchmark evaluation. To address this deficiency, this paper introduces a webcam-based gaze tracking system that supports large-scale, crowdsourced eye tracking deployed on Amazon Mechanical Turk (AMTurk). Through a combination of careful algorithm and gaming protocol design, our system obtains eye tracking data for saliency prediction comparable to data gathered in a traditional lab setting, at relatively lower cost and with less effort on the part of the researchers. Using this tool, we build a saliency dataset for a large number of natural images. We will open-source our tool and provide a web server where researchers can upload their images to get eye tracking results from AMTurk.

Paper Structure

This paper contains 19 sections, 14 figures, 1 table.

Figures (14)

  • Figure 1: We propose a webcam-based eye tracking system to collect saliency data at a large scale. By packaging the eye tracking experiment into a carefully designed web game, we are able to collect good-quality gaze data on crowdsourcing platforms such as AMTurk.
  • Figure 2: An example of the saliency data obtained by our system. In a free-viewing task, users were shown selected images from the SUN database with fully annotated objects. From the raw eye tracking data, we used the proposed algorithm to estimate fixations and then computed the saliency map. This map could be used to evaluate the saliency of individual objects in the image.
  • Figure 3: The procedure for an eye tracking experiment. 13-point calibration and validation were performed at the start of each session. During the calibration, we trained a user-task specific model for gaze prediction. For validation, we displayed the online gaze prediction results in the context of a game to make sure that the tracking was functioning properly. Our system supports various stimuli (such as image or video) and various tasks (such as free viewing, image memory, or object search).
  • Figure 4: An example of the dynamic saliency maps for video clips collected by our system. Saliency in video is highly dependent on the context of the previous frames, so dynamic saliency maps look very different from the static saliency maps obtained for the same frames shown in isolation as part of a free-viewing task.
  • Figure 5: Statistics of basic information for workers from AMTurk. The data were obtained from 200 randomly selected workers who participated in our eye tracking experiment.
  • ...and 9 more figures
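
The Figure 2 caption describes turning estimated fixations into a saliency map and then using that map to score individual annotated objects. The sketch below illustrates one common way to do this and is not taken from the paper: it assumes the map is built by summing an isotropic Gaussian at each fixation, and that an object's saliency is the mean map value inside its bounding box. The function names `saliency_map` and `object_saliency`, the `sigma` value, and the box format are all hypothetical choices made for illustration.

```python
import math


def saliency_map(fixations, width, height, sigma=2.0):
    """Sum an isotropic Gaussian at each fixation, then scale the peak to 1.

    Assumption: Gaussian blurring of fixation points; the paper's exact
    procedure and kernel width are not given in this summary.
    """
    grid = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                grid[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    peak = max(max(row) for row in grid)
    if peak > 0:
        grid = [[v / peak for v in row] for row in grid]
    return grid


def object_saliency(grid, box):
    """Mean map value inside a bounding box (x0, y0, x1, y1), end-exclusive.

    Assumption: a box stands in for the object annotation; the SUN database
    annotations mentioned in the caption may use a different representation.
    """
    x0, y0, x1, y1 = box
    values = [grid[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values)


# Toy example: two fixations close together and one elsewhere, on a 20x16 grid.
fixations = [(5, 5), (6, 5), (15, 12)]
grid = saliency_map(fixations, width=20, height=16)
near = object_saliency(grid, (3, 3, 8, 8))    # box around the fixation cluster
far = object_saliency(grid, (0, 10, 4, 16))   # box away from all fixations
print(near > far)
```

In this toy setup the box covering the fixation cluster scores higher than the box far from any fixation, which is the kind of per-object comparison the caption refers to.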