FACET: Fast and Accurate Event-Based Eye Tracking Using Ellipse Modeling for Extended Reality
Junyuan Ding, Ziteng Wang, Chang Gao, Min Liu, Qinyu Chen
TL;DR
FACET addresses the challenge of high-precision, low-latency eye tracking in XR by using an end-to-end network to predict pupil ellipse parameters directly from event camera data. It introduces a fast causal event volume representation, fixed-count event binning, and a trigonometric loss for robustly learning ellipse orientation, built on a MobileNetV3 backbone with a DSC-based FPN and four heads. The authors augment the EV-Eye dataset with ellipse-based annotations via semi-supervised labeling and achieve a 0.2030-pixel pupil center error and a 0.530 ms inference time, outperforming prior methods while using far fewer parameters and operations. This work demonstrates that purely event-based, ellipse-focused tracking can meet XR requirements, enabling efficient, high-frequency gaze estimation for next-generation head-mounted displays.
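To make the idea of fixed-count event binning concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' fast causal event volume: the function name `fixed_count_event_volume`, the signed-polarity accumulation, and the choice to keep only the most recent events are all hypothetical. The point it shows is that each bin holds the same number of events rather than covering the same time span, and that only already-observed events are used.

```python
import numpy as np


def fixed_count_event_volume(x, y, p, height, width, events_per_bin, num_bins):
    """Build a (num_bins, height, width) volume with a fixed event count per bin.

    Hypothetical helper for illustration. Events are assumed to be sorted by
    timestamp, with pixel coordinates x, y and polarity p (p > 0 means ON).
    """
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    p = np.asarray(p)

    needed = events_per_bin * num_bins
    if x.shape[0] < needed:
        raise ValueError("not enough events to fill every bin")

    # Causal selection (assumption): use only the latest `needed` events,
    # all of which have already been observed.
    x, y, p = x[-needed:], y[-needed:], p[-needed:]

    # Event i goes to bin i // events_per_bin, so every bin gets the same count.
    bin_idx = np.repeat(np.arange(num_bins), events_per_bin)

    # Signed polarity accumulation (assumption): +1 for ON, -1 for OFF.
    signed = np.where(p > 0, 1.0, -1.0).astype(np.float32)

    volume = np.zeros((num_bins, height, width), dtype=np.float32)
    # np.add.at accumulates correctly when several events hit the same pixel.
    np.add.at(volume, (bin_idx, y, x), signed)
    return volume
```

With fixed-count bins, the time span covered by each bin shrinks when the eye moves quickly and grows when it is nearly still, so the amount of information per bin stays roughly constant.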
Abstract
Eye tracking is a key technology for gaze-based interactions in Extended Reality (XR), but traditional frame-based systems struggle to meet XR's demands for high accuracy, low latency, and power efficiency. Event cameras offer a promising alternative due to their high temporal resolution and low power consumption. In this paper, we present FACET (Fast and Accurate Event-based Eye Tracking), an end-to-end neural network that directly outputs pupil ellipse parameters from event data and is optimized for real-time XR applications. The ellipse output can be used directly by subsequent ellipse-based pupil trackers. To train the model, we enhance the EV-Eye dataset by expanding the annotated data and converting the original mask labels to ellipse-based annotations. We also adopt a novel trigonometric loss to address angle discontinuities and propose a fast causal event volume representation. On the enhanced EV-Eye test set, FACET achieves an average pupil center error of 0.20 pixels and an inference time of 0.53 ms, reducing pixel error and inference time by 1.6$\times$ and 1.8$\times$, respectively, compared to the prior art, EV-Eye, with 4.4$\times$ fewer parameters and 11.7$\times$ fewer arithmetic operations. The code is available at https://github.com/DeanJY/FACET.
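The angle discontinuity that the trigonometric loss targets can be illustrated with a short sketch. An ellipse rotated by $\theta$ and by $\theta + \pi$ is the same ellipse, so a plain difference of angles penalizes predictions that are nearly correct but sit on the other side of the wrap-around. One common remedy, assumed here for illustration and not necessarily the exact form used in FACET, is to compare $\sin 2\theta$ and $\cos 2\theta$ instead of $\theta$ itself. The function name `trig_angle_loss` is hypothetical.

```python
import numpy as np


def trig_angle_loss(theta_pred, theta_true):
    """Orientation loss that is continuous across the pi wrap-around.

    Illustrative assumption: angles are in radians and the ellipse
    orientation has period pi, so it is encoded as (sin 2θ, cos 2θ).
    """
    theta_pred = np.asarray(theta_pred, dtype=np.float64)
    theta_true = np.asarray(theta_true, dtype=np.float64)

    # Doubling the angle maps the period-pi orientation onto the full circle.
    d_sin = np.sin(2.0 * theta_pred) - np.sin(2.0 * theta_true)
    d_cos = np.cos(2.0 * theta_pred) - np.cos(2.0 * theta_true)

    # Squared chord length on the unit circle, equal to 4 * sin^2(Δθ).
    return float(np.mean(d_sin ** 2 + d_cos ** 2))


# Two nearly identical ellipses whose raw angles differ by almost pi.
near_wrap = trig_angle_loss(0.01, np.pi - 0.01)
# Two ellipses that are truly perpendicular to each other.
perpendicular = trig_angle_loss(0.0, np.pi / 2)
```

In this sketch `near_wrap` is close to zero even though the raw angle difference is almost $\pi$, while `perpendicular` takes the maximum value of the loss, which matches the geometric intuition.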
