EETnet: a CNN for Gaze Detection and Tracking for Smart-Eyewear
Andrea Aspesi, Andrea Simpsi, Aaron Tognoli, Simone Mentasti, Luca Merigo, Matteo Matteucci
TL;DR
EETnet addresses the challenge of real-time eye tracking on embedded devices by using a compact CNN that processes purely event-based eye data. It provides two output modes: regression for pupil coordinates and grid-based classification for pupil position within a frame. Both are trained on 200 Hz event frames with careful ROI alignment and semi-automatic ground-truth annotation. Through quantization-aware training, the model is deployed on diverse microcontrollers; on the MAX78000, inference takes under 3 ms and consumes under 1 mJ, demonstrating practical viability for battery-powered smart eyewear. The approach combines dataset preprocessing, architecture optimization, and hardware-aware deployment to enable low-latency, energy-efficient gaze tracking in wearables.
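To make the notion of 200 Hz event frames concrete, the sketch below bins a stream of events into fixed 5 ms windows (1 s / 200) and counts events per pixel. It is only an illustration under stated assumptions: the function name `events_to_frames`, the `(t_us, x, y)` event layout, the microsecond timestamps, and the choice to ignore polarity are all hypothetical and not taken from the EETnet pipeline.

```python
def events_to_frames(events, height, width, rate_hz=200):
    """Accumulate (t_us, x, y) events into per-window count frames.

    Assumptions (not from the paper): timestamps are in microseconds,
    polarity is ignored, and each frame stores raw event counts.
    """
    window_us = 1_000_000 // rate_hz  # 5000 us per frame at 200 Hz
    if not events:
        return []
    t0 = min(t for t, _, _ in events)
    n_frames = max((t - t0) // window_us for t, _, _ in events) + 1
    # One height x width grid of counters per time window.
    frames = [[[0] * width for _ in range(height)] for _ in range(n_frames)]
    for t, x, y in events:
        frames[(t - t0) // window_us][y][x] += 1
    return frames


# Five synthetic events spread over three 5 ms windows on a 4x4 sensor.
demo_events = [(0, 1, 0), (1000, 1, 0), (4999, 2, 1), (5000, 0, 2), (12000, 3, 3)]
demo_frames = events_to_frames(demo_events, height=4, width=4)
```

Here the first three events fall into the first window, so `demo_frames[0][0][1]` holds a count of 2, while the later events land in the second and third frames.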
Abstract
Event-based cameras are becoming a popular solution for efficient, low-power eye tracking. Due to the sparse and asynchronous nature of event data, they require less processing power and offer latencies in the microsecond range. However, many existing solutions are limited to validation on powerful GPUs, with no deployment on real embedded devices. In this paper, we present EETnet, a convolutional neural network designed for eye tracking using purely event-based data, capable of running on microcontrollers with limited resources. Additionally, we outline a methodology to train, evaluate, and quantize the network using a public dataset. Finally, we propose two versions of the architecture: a classification model that detects the pupil on a grid superimposed on the original image, and a regression model that operates at the pixel level.
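The difference between the two output modes can be illustrated with a small sketch of how a pixel-level pupil position might map to a class label on a superimposed grid, and back to an approximate position. This is a hedged illustration only: the helper names `pixel_to_cell` and `cell_to_pixel`, the 64x64 frame size, the 8x8 grid, and the row-major cell numbering are assumptions chosen for the example, not values reported for EETnet.

```python
def pixel_to_cell(x, y, width, height, cols, rows):
    """Map a pixel coordinate to a row-major grid cell index (classification target)."""
    # Clamp so that points on the right or bottom border stay in the last cell.
    col = min(int(x * cols / width), cols - 1)
    row = min(int(y * rows / height), rows - 1)
    return row * cols + col


def cell_to_pixel(cell, width, height, cols, rows):
    """Map a grid cell index back to the pixel coordinates of the cell centre."""
    row, col = divmod(cell, cols)
    return ((col + 0.5) * width / cols, (row + 0.5) * height / rows)


# Hypothetical 64x64 frame with an 8x8 grid: each cell covers 8x8 pixels.
label = pixel_to_cell(10, 20, width=64, height=64, cols=8, rows=8)
centre = cell_to_pixel(label, width=64, height=64, cols=8, rows=8)
```

Under these assumptions, the regression model would predict something like `(10, 20)` directly, while the classification model would predict `label`, which can only be decoded to the cell centre. The grid therefore trades spatial resolution for a simpler output.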
