Exploring Temporal Dynamics in Event-based Eye Tracker
Hongwei Ren, Xiaopeng Lin, Hongxiang Huang, Yue Zhou, Bojun Cheng
TL;DR
TDTracker addresses high-speed eye tracking with event cameras by modeling temporal dynamics from two perspectives. Implicit temporal dynamics (ITD), which are short-term, are captured by 3D convolutions over a Binary Map representation. Explicit temporal dynamics (ETD), which are long-term, are captured by a cascade of FFT-based frequency processing, a GRU, and Mamba. A heatmap-based coordinate regression stage, trained with KL divergence, then produces the eye coordinates. The method achieves state-of-the-art performance on SEET and placed third in the CVPR 2025 Event-Based Eye Tracking Challenge, while maintaining favorable computational efficiency. The work highlights the importance of accurate temporal representation and heatmap regression for robust, low-latency gaze estimation in dynamic, low-power wearable scenarios.
Abstract
Eye-tracking is a vital technology for human-computer interaction, especially in wearable devices for AR, VR, and XR. High-speed, high-precision eye-tracking with frame-based image sensors is constrained by their limited temporal resolution, which impairs the accurate capture of rapid ocular dynamics such as saccades and blinks. Event cameras, inspired by biological vision systems, can perceive eye movements with extremely low power consumption and ultra-high temporal resolution. This makes them a promising solution for achieving high-speed, high-precision tracking with rich temporal dynamics. In this paper, we propose TDTracker, an effective eye-tracking framework that captures rapid eye movements by thoroughly modeling temporal dynamics from both implicit and explicit perspectives. TDTracker utilizes 3D convolutional neural networks to capture implicit short-term temporal dynamics and employs a cascaded structure consisting of a Frequency-aware Module, GRU, and Mamba to extract explicit long-term temporal dynamics. Finally, a predicted heatmap is used for eye coordinate regression. Experimental results demonstrate that TDTracker achieves state-of-the-art (SOTA) performance on the synthetic SEET dataset and secured third place in the CVPR 2025 Event-Based Eye Tracking Challenge. Our code is available at https://github.com/rhwxmx/TDTracker.
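To make the heatmap-based coordinate regression concrete, the following is a minimal NumPy sketch of one common way such a stage can be set up. It is not taken from the TDTracker code. The Gaussian target, the value of `sigma`, the softmax normalization, the expectation-based decoding, and all function names are assumptions made for illustration only.

```python
import numpy as np

def gaussian_heatmap(cx, cy, height, width, sigma=2.0):
    """Target heatmap (assumed form): 2D Gaussian centred on (cx, cy), summing to 1."""
    ys, xs = np.mgrid[0:height, 0:width]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def softmax2d(logits):
    """Turn raw network outputs into a probability map over all pixels."""
    e = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return e / e.sum()

def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred) between two heatmaps that each sum to 1."""
    return float(np.sum(target * (np.log(target + eps) - np.log(pred + eps))))

def decode_expectation(prob):
    """Decode (x, y) as the probability-weighted mean pixel position."""
    h, w = prob.shape
    ys, xs = np.mgrid[0:h, 0:w]
    return float((prob * xs).sum()), float((prob * ys).sum())

# Hypothetical example: a 32x48 heatmap with the pupil centre at x=20, y=12.
target = gaussian_heatmap(20, 12, height=32, width=48)
uniform_pred = softmax2d(np.zeros((32, 48)))   # an uninformative prediction
loss_uninformed = kl_divergence(target, uniform_pred)
loss_perfect = kl_divergence(target, target)
x_hat, y_hat = decode_expectation(target)
```

In this sketch the loss is zero when the predicted map matches the target and grows as the prediction spreads away from the true location, which is the property that makes a divergence between heatmaps usable as a training signal for coordinate regression.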
