Event-Based Eye Tracking. AIS 2024 Challenge Survey

Zuowen Wang, Chang Gao, Zongwei Wu, Marcos V. Conde, Radu Timofte, Shih-Chii Liu, Qinyu Chen, Zheng-jun Zha, Wei Zhai, Han Han, Bohao Liao, Yuliang Wu, Zengyu Wan, Zhong Wang, Yang Cao, Ganchao Tan, Jinze Chen, Yan Ru Pei, Sasskia Brüers, Sébastien Crouzet, Douglas McLelland, Oliver Coenen, Baoheng Zhang, Yizhao Gao, Jingyuan Li, Hayden Kwok-Hay So, Philippe Bich, Chiara Boretti, Luciano Prono, Mircea Lică, David Dinucu-Jianu, Cătălin Grîu, Xiaopeng Lin, Hongwei Ren, Bojun Cheng, Xinan Zhang, Valentin Vial, Anthony Yezzi, James Tsai

TL;DR

This survey analyzes the AIS 2024 Event-Based Eye Tracking Challenge, focusing on efficient pupil center localization from event-camera data. It introduces the 3ET+ dataset, defines the $p_{10}$-based metric along with per-class accuracies and distance measures, and describes a provided data-loading/training pipeline to standardize comparisons. The results reveal a broad spectrum of approaches—from stateful recurrent architectures and memory-channel representations to hardware-aware, sparse-convolution and point-based networks—highlighting that there is no single dominant method yet; instead, performance and efficiency trade-offs are explored across methods and hardware platforms. The findings underscore the viability of event cameras for eye tracking, emphasize the importance of hardware-software co-design, and point to future directions in real-time, low-power eye-tracking systems for AR/VR and wearable healthcare applications.
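
As a rough illustration of the kind of metric mentioned above, the sketch below computes a pixel-tolerance accuracy and a mean Euclidean distance for predicted pupil centers. It assumes that $p_{10}$ means the fraction of predictions lying within 10 pixels (Euclidean distance, inclusive) of the ground truth. That definition, along with the function names `p_accuracy` and `mean_distance`, is an assumption made for illustration and is not stated in this summary.

```python
import math

def p_accuracy(pred, target, tolerance=10.0):
    """Fraction of predictions within `tolerance` pixels of the ground truth.

    Assumption: "within" is an inclusive Euclidean-distance threshold.
    `pred` and `target` are equal-length sequences of (x, y) pixel coordinates.
    """
    hits = 0
    for (px, py), (tx, ty) in zip(pred, target):
        if math.hypot(px - tx, py - ty) <= tolerance:
            hits += 1
    return hits / len(pred)

def mean_distance(pred, target):
    """Mean Euclidean distance in pixels between predictions and ground truth."""
    total = sum(math.hypot(px - tx, py - ty)
                for (px, py), (tx, ty) in zip(pred, target))
    return total / len(pred)

# Hypothetical toy data: three predicted pupil centers vs. ground truth.
predictions = [(0.0, 0.0), (3.0, 4.0), (20.0, 0.0)]
ground_truth = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]

p10 = p_accuracy(predictions, ground_truth, tolerance=10.0)
p1 = p_accuracy(predictions, ground_truth, tolerance=1.0)
avg = mean_distance(predictions, ground_truth)
```

Tighter tolerances (for example 5 pixels or 1 pixel) can be obtained by changing `tolerance`, which is one way the same predictions could be scored at several strictness levels.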

Abstract

This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge. The challenge task is to process eye movements recorded with event cameras and predict the pupil center of the eye. The challenge emphasizes efficient eye tracking with event cameras, aiming for a good trade-off between task accuracy and efficiency. During the challenge period, 38 participants registered for the Kaggle competition, and 8 teams submitted a challenge factsheet. The novel and diverse methods from the submitted factsheets are reviewed and analyzed in this survey to advance future event-based eye tracking research.

Paper Structure

This paper contains 44 sections, 7 equations, 12 figures, and 8 tables.

Figures (12)

  • Figure 1: MambaPupil network by Team USTCEventGroup.
  • Figure 2: Schematic of the Consistent Eye Tracking Model, including the event representation, preprocessing and the tracking predictor.
  • Figure 3: A. A lightweight spatiotemporal architecture for efficient eye tracking. The backbone consists of 5 successive spatiotemporal blocks, each made up of a temporal convolution followed by a spatial convolution. B. The model can be configured to run in streaming inference mode by using an input FIFO buffer for each temporal layer. The FIFO buffer's sliding-window mechanism acts as the convolution sliding window, and the convolution operation itself is replaced by a dot product between the elements in the FIFO buffer and the kernel weights. C. Comparison of direct binning, event volume binning, and causal event volume binning. The last method retains temporal information while still being fully causal.
  • Figure 4: Hardware design of SEE
  • Figure 5: Software architecture of SEE
  • ...and 7 more figures
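
The streaming inference mode described in the Figure 3 caption can be sketched roughly as follows. This is a minimal single-channel illustration, not the authors' implementation: the class `StreamingTemporalConv`, the helper `causal_conv1d`, and the zero-initialized buffer are all assumptions introduced here. It shows that a FIFO buffer plus a dot product reproduces a causal temporal convolution one time step at a time.

```python
from collections import deque

class StreamingTemporalConv:
    """Causal temporal convolution evaluated one sample at a time.

    Hypothetical sketch: scalar inputs stand in for per-time-step features.
    """

    def __init__(self, kernel):
        # kernel[-1] multiplies the newest sample, kernel[0] the oldest.
        self.kernel = list(kernel)
        # A zero-filled FIFO plays the role of causal zero padding.
        self.fifo = deque([0.0] * len(self.kernel), maxlen=len(self.kernel))

    def step(self, x):
        # Appending to a full deque drops the oldest element automatically,
        # which is the sliding-window behavior of the buffer.
        self.fifo.append(x)
        # The convolution reduces to a dot product with the kernel weights.
        return sum(w * v for w, v in zip(self.kernel, self.fifo))

def causal_conv1d(signal, kernel):
    """Offline reference: causal convolution over the whole sequence."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(signal)
    return [sum(kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(signal))]

signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.25, 1.0]

layer = StreamingTemporalConv(kernel)
streamed = [layer.step(x) for x in signal]
offline = causal_conv1d(signal, kernel)
```

In this sketch `streamed` and `offline` agree sample by sample, so each new input costs only one buffer update and one dot product per temporal layer instead of a recomputation over the full window history.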