3ET: Efficient Event-based Eye Tracking using a Change-Based ConvLSTM Network
Qinyu Chen, Zuowen Wang, Shih-Chii Liu, Chang Gao
TL;DR
This work tackles real-time pupil tracking from sparse event streams, targeting efficient event-based eye tracking on wearables. It introduces the Change-Based ConvLSTM (CB-ConvLSTM), which injects temporal sparsity by using the thresholded hidden-state change ΔH_{t-1} in the gate computations, and it gives a formal definition of ΔH_{t-1}. On a synthetic DVS pupil dataset derived from LPW, the method reaches roughly 85.3% temporal sparsity and a 4.7× reduction in arithmetic operations while maintaining accuracy, and it outperforms CNN baselines by over 30%. The approach is well suited to low-power AR/VR headsets and can benefit from hardware that exploits spatio-temporal sparsity. Code and data are publicly available.
Abstract
This paper presents a sparse Change-Based Convolutional Long Short-Term Memory (CB-ConvLSTM) model for event-based eye tracking, a key capability for next-generation wearable healthcare technology such as AR/VR headsets. We leverage the benefits of retina-inspired event cameras over traditional frame-based cameras, namely their low-latency response and sparse output event stream. Our CB-ConvLSTM architecture efficiently extracts spatio-temporal features for pupil tracking from the event stream, outperforming conventional CNN structures. By using a delta-encoded recurrent path that enhances activation sparsity, CB-ConvLSTM reduces arithmetic operations by approximately 4.7× without losing accuracy when tested on a v2e-generated event dataset of labeled pupils. This gain in efficiency makes it well suited to real-time eye tracking on resource-constrained devices. The project code and dataset are openly available at https://github.com/qinche106/cb-convlstm-eyetracking.
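The delta-encoded recurrent path can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the strict `>` comparison, the rule for updating the reference state, and the use of a matrix product in place of a convolution are all assumptions made for illustration. The idea shown is that only hidden-state changes larger than a threshold are propagated, so unchanged positions contribute zeros that sparsity-aware hardware could skip.

```python
import numpy as np


def thresholded_delta(h_new, h_ref, theta):
    """Return the thresholded change and the updated reference state.

    h_ref is the last value that was actually propagated (an assumption
    of this sketch, following the usual delta-network formulation).
    """
    raw = h_new - h_ref
    keep = np.abs(raw) > theta            # positions whose change is large enough
    delta = np.where(keep, raw, 0.0)      # sub-threshold changes become exact zeros
    # Advance the reference only where a change was sent, so small drifts
    # accumulate until they cross the threshold instead of being lost.
    h_ref_next = np.where(keep, h_new, h_ref)
    return delta, h_ref_next


def temporal_sparsity(delta):
    """Fraction of entries in the delta that are exactly zero."""
    return float(np.mean(delta == 0.0))


def accumulate_preactivation(pre_prev, weight, delta):
    """Update a linear pre-activation using only the change.

    A matrix product stands in for the recurrent convolution here; both
    are linear, so W h_t can be rebuilt as W h_ref + W delta.
    """
    return pre_prev + weight @ delta


# Hypothetical toy values, chosen only to show the mechanics.
h_ref = np.zeros(4)
h_new = np.array([0.05, -0.3, 0.0, 0.2])
delta, h_ref = thresholded_delta(h_new, h_ref, theta=0.1)
print("delta:", delta)
print("temporal sparsity:", temporal_sparsity(delta))
```

In this sketch a larger `theta` yields more zeros in `delta` at the cost of a coarser approximation of the hidden state, which is the trade-off between sparsity and accuracy that the reported figures reflect.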
