Using Deep Learning to Increase Eye-Tracking Robustness, Accuracy, and Precision in Virtual Reality
Kevin Barkevich, Reynold Bailey, Gabriel J. Diaz
TL;DR
This paper evaluates how contemporary ML-based eye feature segmentation networks influence gaze estimation quality in VR. It compares RITnet, EllSegGen, and ESFnet, used both as preprocessing steps and as direct detectors, against a native Pupil Labs detector, with gaze estimated through both feature-based and 3D model-based mappings. Using an open-source evaluation pipeline applied to VR eye-tracking data, it quantifies dropout rate, accuracy, and precision across two image resolutions and multiple gaze-estimation strategies. The findings show that well-performing segmentation models can reduce data dropouts and improve precision without sacrificing accuracy, with EllSegGen and ESFnet often delivering the strongest benefits, especially at 400×400 pixel resolution. The work provides practical guidelines for selecting pupil-detection networks in mobile VR and establishes an open framework for future, potentially real-time, ML-based eye-tracking improvements.
Abstract
Algorithms for the estimation of gaze direction from mobile and video-based eye trackers typically involve tracking a feature of the eye that moves through the eye camera image in a way that covaries with the shifting gaze direction, such as the center or boundaries of the pupil. Tracking these features using traditional computer vision techniques can be difficult due to partial occlusion and environmental reflections. Although recent efforts to use machine learning (ML) for pupil tracking have demonstrated superior results when evaluated using standard measures of segmentation performance, little is known about how these networks may affect the quality of the final gaze estimate. This work provides an objective assessment of the impact of several contemporary ML-based methods for eye feature tracking when the subsequent gaze estimate is produced using either feature-based or model-based methods. Metrics include the accuracy and precision of the gaze estimate, as well as the dropout rate.
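
To make the three metrics concrete, the following is a minimal sketch of how they are commonly computed from a sequence of gaze samples. It is not the authors' pipeline, and the definitions are assumptions, since the abstract does not state them: accuracy is taken as the mean angular offset between gaze and target, precision as the root-mean-square of sample-to-sample angular differences, and dropout rate as the fraction of samples with no gaze estimate. The function names `angle_deg` and `gaze_quality`, and the use of `None` to mark a dropped sample, are hypothetical.

```python
import math


def angle_deg(a, b):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def gaze_quality(gaze, targets):
    """Return (dropout_rate, accuracy_deg, precision_deg).

    gaze    -- list of 3D gaze vectors, with None marking a dropped sample
               (assumed convention, not taken from the paper)
    targets -- list of 3D target direction vectors, same length as gaze
    """
    n = len(gaze)
    if n == 0 or n != len(targets):
        raise ValueError("gaze and targets must be non-empty and equal length")

    valid = [g is not None for g in gaze]

    # Dropout rate: fraction of samples without a gaze estimate.
    dropout_rate = 1.0 - sum(valid) / n

    # Accuracy (assumed definition): mean angular offset from the target.
    errors = [angle_deg(g, t) for g, t in zip(gaze, targets) if g is not None]
    accuracy = sum(errors) / len(errors) if errors else float("nan")

    # Precision (assumed definition): RMS of angular distances between
    # temporally adjacent samples, skipping pairs that span a dropout.
    s2s = [
        angle_deg(gaze[i], gaze[i + 1])
        for i in range(n - 1)
        if valid[i] and valid[i + 1]
    ]
    precision = (
        math.sqrt(sum(d * d for d in s2s) / len(s2s)) if s2s else float("nan")
    )

    return dropout_rate, accuracy, precision
```

For example, calling `gaze_quality` on four samples of which one is `None` yields a dropout rate of 0.25, with accuracy and precision computed over the remaining valid samples only. Under these definitions a detector can improve precision and dropout rate while leaving accuracy unchanged, which is the kind of trade-off the evaluation is designed to expose.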
