Event-Based Eye Tracking. 2025 Event-based Vision Workshop

Qinyu Chen, Chang Gao, Min Liu, Daniele Perrone, Yan Ru Pei, Zuowen Wang, Zhuo Zou, Shihang Tan, Tao Han, Guorui Lu, Zhen Xu, Junyuan Ding, Ziteng Wang, Zongwei Wu, Han Han, Yuliang Wu, Jinze Chen, Wei Zhai, Yang Cao, Zheng-jun Zha, Nuwan Bandara, Thivya Kandappu, Archan Misra, Xiaopeng Lin, Hongxiang Huang, Hongwei Ren, Bojun Cheng, Hoang M. Truong, Vinh-Thuan Ly, Huy G. Tran, Thuan-Phat Nguyen, Tram T. Doan

TL;DR

This survey analyzes the 2025 Event-Based Eye Tracking Challenge, focusing on predicting pupil center from asynchronous event streams produced by DVS sensors for high-speed gaze interaction in AR/VR and healthcare. It highlights the 3ET+ dataset with ground-truth pupil centers annotated at $100\,\mathrm{Hz}$ across $13$ participants and five eye-movement tasks, and adopts pixel error $E = \sqrt{(x_{pred}-x_{gt})^2+(y_{pred}-y_{gt})^2}$ as the primary metric. The top methods blend short- and long-term temporal modeling (e.g., BRAT, 3D-CNN+GRU+Mamba), pragmatic data augmentation, and model-agnostic inference-time post-processing to improve accuracy while preserving efficiency. The survey discusses hardware considerations, such as power-efficient event-driven processing and in-sensor preprocessing, and outlines open directions for integrating computational workload, sparsity, and edge hardware design. Overall, the results show strong progress over two editions, with four teams achieving pixel error below $1.7$ pixels and a clear path toward practical, low-power edge deployments.
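
The pixel error metric above is simple enough to sketch directly. The snippet below is a minimal illustration, not the challenge's official evaluation code: the function names `pixel_error` and `mean_pixel_error` are hypothetical, and averaging the per-frame error over a sequence is an assumption about how per-frame values might be aggregated.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pupil center in pixel coordinates


def pixel_error(pred: Point, gt: Point) -> float:
    """Euclidean distance in pixels between a predicted and a
    ground-truth pupil center: sqrt((x_pred-x_gt)^2 + (y_pred-y_gt)^2)."""
    return math.hypot(pred[0] - gt[0], pred[1] - gt[1])


def mean_pixel_error(preds: Sequence[Point], gts: Sequence[Point]) -> float:
    """Average per-frame pixel error over a sequence.

    Assumption: frames are weighted equally; the official aggregation
    used by the challenge may differ.
    """
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        raise ValueError("at least one prediction is required")
    total = sum(pixel_error(p, g) for p, g in zip(preds, gts))
    return total / len(preds)
```

For example, a prediction of `(3.0, 4.0)` against a ground truth of `(0.0, 0.0)` is off by 5 pixels, so a reported error below $1.7$ pixels corresponds to predictions that land within about two pixels of the annotated center.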

Abstract

This survey reviews the 2025 Event-Based Eye Tracking Challenge, organized as part of the 2025 CVPR Event-based Vision Workshop. The challenge focuses on predicting the pupil center from eye movements recorded by an event camera. We review and summarize the innovative methods of the top-ranked teams to advance future event-based eye tracking research. For each method, accuracy, model size, and number of operations are reported. We also discuss event-based eye tracking from the perspective of hardware design.

Paper Structure

This paper contains 31 sections, 4 equations, 8 figures, 8 tables, 2 algorithms.

Figures (8)

  • Figure 1: Comparison of the processing flow and estimation patterns between frame-based and event-based systems for eye tracking. Adapted from tan2025etprocessor.
  • Figure 2: BRAT network by Team USTCEventGroup.
  • Figure 3: Bidirectional Relative Positional Attention.
  • Figure 4: The architecture of TDTracker. TDTracker primarily comprises two components, Implicit Temporal Dynamic (ITD) and Explicit Temporal Dynamic (ETD), with a structure featuring three ITD components to ensure effective feature abstraction. It employs a cascaded architecture of three distinct time series models to capture temporal information comprehensively.
  • Figure 5: The visualization heatmap generated by the TDTracker.
  • ...and 3 more figures