Dual-Path Enhancements in Event-Based Eye Tracking: Augmented Robustness and Adaptive Temporal Modeling

Hoang M. Truong; Vinh-Thuan Ly; Huy G. Tran; Thuan-Phat Nguyen; Tram T. Doan

TL;DR

This paper tackles robust, real-time gaze estimation from event-based eye-tracking data under real-world noise and rapid eye movements. It introduces two complementary strategies: augmented robustness for a lightweight spatiotemporal baseline and KnightPupil, a hybrid architecture that fuses EfficientNet-B3 spatial encoding with Bi-GRU temporal modeling and a dynamic Linear Time-Varying State-Space Model for adaptive temporal transitions. On the 3ET+ benchmark, augmentation improves robustness while KnightPupil delivers strong edge-deployable performance, achieving competitive Euclidean error and p10 metrics. The proposed dual-path framework balances deployable efficiency with adaptive temporal modeling, offering a solid foundation for future neuromorphic-vision developments in AR/VR and neuro-oculomotor analysis.
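The two metrics named above can be illustrated with a minimal sketch. It assumes that predictions and labels are pupil-center coordinates in pixels, and that "p10" means the fraction of predictions falling within 10 pixels of the ground truth. Both points are assumptions made for illustration, and the function names are hypothetical rather than taken from the paper or the benchmark code.

```python
import math

def euclidean_error(pred, target):
    """Mean Euclidean distance (pixels) between predicted and true centers."""
    dists = [math.dist(p, t) for p, t in zip(pred, target)]
    return sum(dists) / len(dists)

def p_accuracy(pred, target, tol=10.0):
    """Fraction of predictions within `tol` pixels of the ground truth.

    Assumption: tol=10.0 corresponds to the "p10" metric.
    """
    hits = sum(math.dist(p, t) <= tol for p, t in zip(pred, target))
    return hits / len(pred)
```

Under these assumptions, a lower Euclidean error and a higher p10 both indicate better tracking.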

Abstract

Event-based eye tracking has become a pivotal technology for augmented reality and human-computer interaction. Yet existing methods struggle with real-world challenges such as abrupt eye movements and environmental noise. Building on the efficiency of the Lightweight Spatiotemporal Network, a causal architecture optimized for edge devices, we introduce two key advancements. First, a robust data augmentation pipeline incorporating temporal shift, spatial flip, and event deletion improves model resilience, reducing Euclidean distance error by 12% (1.61 vs. 1.70 baseline) on challenging samples. Second, we propose KnightPupil, a hybrid architecture combining an EfficientNet-B3 backbone for spatial feature extraction, a bidirectional GRU for contextual temporal modeling, and a Linear Time-Varying State-Space Module that adapts dynamically to sparse inputs and noise. Evaluated on the 3ET+ benchmark, our framework achieved a Euclidean distance of 1.61 on the private test set of the Event-based Eye Tracking Challenge at CVPR 2025, demonstrating its effectiveness for practical deployment in AR/VR systems while providing a foundation for future innovations in neuromorphic vision.
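The three augmentations named in the abstract (temporal shift, spatial flip, event deletion) could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' pipeline: events are taken to be `(x, y, t, polarity)` tuples, the label is a pupil center `(x, y)`, and the function name, parameter names, and default values are all hypothetical.

```python
import random

def augment_events(events, label, width, rng,
                   max_shift=1000, flip_prob=0.5, drop_prob=0.1):
    """Apply temporal shift, horizontal flip, and random event deletion.

    events: list of (x, y, t, polarity) tuples (assumed layout)
    label:  (x, y) pupil center in pixels (assumed layout)
    """
    # Temporal shift: move every timestamp by one random offset.
    shift = rng.randint(-max_shift, max_shift)
    out = [(x, y, t + shift, p) for x, y, t, p in events]

    # Spatial flip: mirror x coordinates and mirror the label to match.
    if rng.random() < flip_prob:
        out = [(width - 1 - x, y, t, p) for x, y, t, p in out]
        label = (width - 1 - label[0], label[1])

    # Event deletion: drop each event independently with prob. drop_prob.
    out = [e for e in out if rng.random() >= drop_prob]
    return out, label
```

Passing a seeded `random.Random` instance as `rng` keeps the augmentation reproducible. Note that the label is mirrored together with the events so the supervision stays consistent after a flip.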

