Tri-Cam: Practical Eye Gaze Tracking via Camera Network

Sikai Yang, Wan Du

TL;DR

Tri-Cam presents a practical gaze-tracking system that uses three inexpensive RGB webcams arranged on a monitor to support free movement. It introduces a split neural network separating camera-eye geometry and eye-screen geometry, augmented with intra-validation via auxiliary multitasking and a joint discriminator for weighted fusion of eye crops. An implicit calibration module leverages mouse-click opportunities to reduce explicit data collection, enabling low-effort redeployment. Experimental results show near-Tobii accuracy at 50 cm with a wider movement range, along with favorable training efficiency and robustness across unseen users when combined with implicit calibration.

Abstract

Human eyes are conduits of rich information, revealing emotions, intentions, and even aspects of an individual's health and overall well-being. Gaze tracking therefore enables various human-computer interaction applications and offers insights for psychological and medical research. However, existing gaze tracking solutions fall short in handling free user movement and require laborious user effort for system calibration. We introduce Tri-Cam, a practical deep learning-based gaze tracking system using three affordable RGB webcams. It features a split network structure for efficient training, with dedicated network designs for each of the separated gaze tracking tasks. Tri-Cam is also equipped with an implicit calibration module, which uses mouse-click opportunities to reduce calibration overhead for the user. We evaluate Tri-Cam against Tobii, the state-of-the-art commercial eye tracker, and achieve comparable accuracy while supporting a wider free movement area. In conclusion, Tri-Cam provides a user-friendly, affordable, and robust gaze tracking solution that could practically enable various applications.
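The abstract does not spell out how mouse clicks feed the implicit calibration module, so the following is only an illustrative sketch, not Tri-Cam's method. It assumes that the user looks at the cursor at the moment of a click, so each click yields a (predicted gaze point, click position) pair in screen pixels, and that the residual error can be approximated by an independent scale and offset per axis. The names `fit_axis`, `fit_click_correction`, and `apply_correction` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
AxisFit = Tuple[float, float]  # (scale, offset)


def fit_axis(pred: List[float], target: List[float]) -> AxisFit:
    """Ordinary least squares for target ~ scale * pred + offset on one axis."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0.0:
        # All predictions are identical, so only an offset can be estimated.
        return 1.0, mean_t - mean_p
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    scale = cov / var_p
    return scale, mean_t - scale * mean_p


def fit_click_correction(
    predicted: List[Point], clicks: List[Point]
) -> Tuple[AxisFit, AxisFit]:
    """Fit separate x and y corrections from (prediction, click) pairs."""
    fit_x = fit_axis([p[0] for p in predicted], [c[0] for c in clicks])
    fit_y = fit_axis([p[1] for p in predicted], [c[1] for c in clicks])
    return fit_x, fit_y


def apply_correction(correction: Tuple[AxisFit, AxisFit], point: Point) -> Point:
    """Map a raw gaze prediction to its corrected screen position."""
    (scale_x, offset_x), (scale_y, offset_y) = correction
    return scale_x * point[0] + offset_x, scale_y * point[1] + offset_y
```

In this sketch, the correction could be refit whenever new clicks arrive, which is one way explicit calibration sessions might be avoided; a real system would also need to discard clicks where the user was not looking at the cursor.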

Paper Structure

This paper contains 45 sections, 1 equation, and 16 figures.

Figures (16)

  • Figure 1: Eye detection via face detection
  • Figure 2: Tri-Cam Neural Network Architecture
  • Figure 3: Two cameras are not enough to support the intra-validation mechanism
  • Figure 4: Cropped eye image quality fluctuations
  • Figure 8: All users' eye image
  • ...and 11 more figures