JaneEye: A 12-nm 2K-FPS 18.9-μJ/Frame Event-based Eye Tracking Accelerator
Tao Han, Ang Li, Qinyu Chen, Chang Gao
TL;DR
JaneEye tackles energy-efficient, high-speed eye tracking for XR wearables by converting asynchronous event streams into dense frames and processing them with an ultra-light ConvJANET-based network on a 12-nm ASIC. The method combines time-based and count-based event aggregation, a compact neural architecture built from ConvJANET, GMLP, and a pupil localization head, and hardware-aware optimizations including activation approximations and mixed-precision quantization, coupled with progressive retraining. The resulting system achieves a pupil localization error of 2.45 pixels on the 3ET+ dataset with 17.6K parameters, an event frame rate of up to 1250 Hz, an end-to-end latency of 0.5 ms (2000 FPS), an energy cost of 18.9 μJ/frame, and an efficiency of 567 GOP/s/W on a 12-nm ASIC with a 64-PE array. This places JaneEye ahead of state-of-the-art eye trackers in energy-delay product while maintaining competitive accuracy, making real-time eye tracking viable for wearable XR devices. The work demonstrates software-hardware co-design for sparse, event-based perception in resource-constrained environments.
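The two aggregation modes named above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the event layout `(t, x, y)` with integer microsecond timestamps, the function names, and the per-pixel count encoding are all assumptions made for illustration, and polarity is ignored.

```python
def events_to_frame(events, height, width):
    """Accumulate (t, x, y) events into a dense per-pixel count frame."""
    frame = [[0] * width for _ in range(height)]
    for _, x, y in events:
        frame[y][x] += 1
    return frame


def aggregate_time_based(events, height, width, window_us):
    """Emit one frame per fixed time window (events sorted by timestamp)."""
    if not events:
        return []
    t0 = events[0][0]
    n_frames = (events[-1][0] - t0) // window_us + 1
    buckets = [[] for _ in range(n_frames)]
    for ev in events:
        buckets[(ev[0] - t0) // window_us].append(ev)
    return [events_to_frame(b, height, width) for b in buckets]


def aggregate_count_based(events, height, width, events_per_frame):
    """Emit one frame per fixed number of events; a trailing partial chunk is dropped."""
    n_frames = len(events) // events_per_frame
    return [
        events_to_frame(
            events[k * events_per_frame:(k + 1) * events_per_frame], height, width
        )
        for k in range(n_frames)
    ]


# Hypothetical toy stream on a 2x2 sensor: (timestamp_us, x, y)
events = [(0, 0, 0), (100, 1, 0), (250, 1, 1), (900, 0, 1), (1000, 1, 1), (1700, 0, 0)]
time_frames = aggregate_time_based(events, 2, 2, window_us=800)
count_frames = aggregate_count_based(events, 2, 2, events_per_frame=4)
```

An 800 μs window yields 1250 frames per second, which matches the quoted event frame rate numerically; whether the reported figure is obtained with exactly this windowing is not stated in this summary. Time-based windows give a fixed output rate regardless of activity, while count-based windows give frames with a fixed event budget.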
Abstract
Eye tracking has become a key technology for gaze-based interactions in Extended Reality (XR). However, conventional frame-based eye-tracking systems often fall short of XR's stringent requirements for high accuracy, low latency, and energy efficiency. Event cameras present a compelling alternative, offering ultra-high temporal resolution and low power consumption. In this paper, we present JaneEye, an energy-efficient event-based eye-tracking hardware accelerator designed specifically for wearable devices, leveraging sparse, high-temporal-resolution event data. We introduce an ultra-lightweight neural network architecture featuring a novel ConvJANET layer, which simplifies the traditional ConvLSTM by retaining only the forget gate, thereby halving computational complexity without sacrificing temporal modeling capability. Our proposed model achieves a pixel error of 2.45 on the 3ET+ dataset using only 17.6K parameters, with an event frame rate of up to 1250 Hz. To further enhance hardware efficiency, we employ custom linear approximations of activation functions (hardsigmoid and hardtanh) and fixed-point quantization. Through software-hardware co-design, our 12-nm ASIC implementation operates at 400 MHz, delivering an end-to-end latency of 0.5 ms (equivalent to 2000 frames per second (FPS)) at an energy cost of 18.9 μJ/frame. JaneEye sets a new benchmark in low-power, high-performance eye-tracking solutions suitable for integration into next-generation XR wearables.
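To make the forget-gate-only recurrence concrete, the following is a minimal single-channel sketch in plain Python. It follows the generic JANET update (a forget gate and a candidate state, with the state blended as `f * h_prev + (1 - f) * g`) and is not the paper's exact layer: the function names, the parameter dictionary, the single-channel restriction, and the hardsigmoid constants (slope 1/6, offset 0.5, as in the common PyTorch-style definition) are assumptions. A standard ConvLSTM evaluates four gated convolution pairs per step, while this update evaluates two, which is consistent with the halving of computation described above.

```python
def hardsigmoid(v):
    # Piecewise-linear sigmoid. Slope 1/6 and offset 0.5 are an assumption,
    # not necessarily the constants used in the paper.
    return min(1.0, max(0.0, v / 6.0 + 0.5))


def hardtanh(v):
    # Piecewise-linear tanh: clamp to [-1, 1].
    return min(1.0, max(-1.0, v))


def conv2d_same(img, kernel):
    """Single-channel 2D cross-correlation with zero padding (odd kernel size)."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    ii, jj = i + a - ph, j + b - pw
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += img[ii][jj] * kernel[a][b]
            out[i][j] = acc
    return out


def convjanet_step(x, h_prev, params):
    """One forget-gate-only recurrent step on single-channel maps.

    f = hardsigmoid(Wf * x + Uf * h_prev + bf)   # forget gate
    g = hardtanh(Wg * x + Ug * h_prev + bg)      # candidate state
    h = f * h_prev + (1 - f) * g                 # blended new state
    """
    fx = conv2d_same(x, params["Wf"])
    fh = conv2d_same(h_prev, params["Uf"])
    gx = conv2d_same(x, params["Wg"])
    gh = conv2d_same(h_prev, params["Ug"])
    h_new = []
    for i in range(len(x)):
        row = []
        for j in range(len(x[0])):
            f = hardsigmoid(fx[i][j] + fh[i][j] + params["bf"])
            g = hardtanh(gx[i][j] + gh[i][j] + params["bg"])
            row.append(f * h_prev[i][j] + (1.0 - f) * g)
        h_new.append(row)
    return h_new


# Hypothetical toy parameters: identity kernel on the candidate path, zeros elsewhere.
zero = [[0.0] * 3 for _ in range(3)]
ident = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
params = {"Wf": zero, "Uf": zero, "bf": 0.0, "Wg": ident, "Ug": zero, "bg": 0.0}
x = [[0.0, 2.0, 0.0], [0.0, 0.5, 0.0], [-3.0, 0.0, 0.0]]
h0 = [[0.0] * 3 for _ in range(3)]
h1 = convjanet_step(x, h0, params)
h2 = convjanet_step(x, h1, params)
```

With these toy parameters the forget gate is 0.5 everywhere, so each step moves the state halfway toward the clamped input. Both activations reduce to comparisons, a shift-friendly scale, and an add, which is the kind of simplification that suits fixed-point hardware.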
