Retina : Low-Power Eye Tracking with Event Camera and Spiking Hardware

Pietro Bonazzi; Sizhen Bian; Giovanni Lippolis; Yawei Li; Sadique Sheik; Michele Magno

TL;DR

This paper presents Retina, a neuromorphic eye-tracking solution that processes pure event data from a Dynamic Vision Sensor using a directly trained Spiking Neural Network deployed on the Speck edge processor. It introduces Ini-30, an event-based pupil dataset captured with glass-mounted DVS cameras from 30 volunteers, and demonstrates end-to-end performance with low power (approximately $2.89$ to $4.8$ mW) and low latency (approximately $5.57$ to $8.01$ ms). Retina achieves a pupil-centroid error of about $3.24$ px on a $64\times64$ DVS input while requiring far fewer MACs ($3.03$M) and parameters ($63$k) than prior methods such as 3ET. The combination of a lightweight SNN, a temporal weighted-sum regression, and end-to-end neuromorphic deployment yields a competitive, energy-efficient, event-based eye-tracking pipeline suitable for real-world use on wearable edge devices.
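MAC and parameter figures like the ones above are usually obtained by summing per-layer costs. The following sketch applies the standard counting rule to a 2D convolution and a fully connected layer. The helper names and the layer shape in the usage line (2 polarity channels, 16 filters, 3x3 kernel, 64x64 input) are hypothetical illustrations and do not describe Retina's actual architecture.

```python
from typing import Tuple


def conv2d_cost(c_in: int, c_out: int, k: int, h_in: int, w_in: int,
                stride: int = 1, padding: int = 0,
                bias: bool = True) -> Tuple[int, int, Tuple[int, int, int]]:
    """Return (parameters, MACs, output shape) for a square-kernel 2D convolution."""
    h_out = (h_in + 2 * padding - k) // stride + 1
    w_out = (w_in + 2 * padding - k) // stride + 1
    # One weight per (output channel, input channel, kernel row, kernel column).
    params = c_out * c_in * k * k + (c_out if bias else 0)
    # Every output element needs c_in * k * k multiply-accumulates.
    macs = c_out * h_out * w_out * c_in * k * k
    return params, macs, (c_out, h_out, w_out)


def linear_cost(n_in: int, n_out: int, bias: bool = True) -> Tuple[int, int]:
    """Return (parameters, MACs) for a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0), n_in * n_out


# Hypothetical first layer: 2 polarity channels in, 16 filters, 3x3 kernel, "same" padding.
params, macs, shape = conv2d_cost(2, 16, 3, 64, 64, padding=1)
print(params, macs, shape)  # 304 1179648 (16, 64, 64)
```

This counts dense-equivalent MACs. Event-driven hardware such as Speck only performs work when spikes arrive, so the operations it actually executes also depend on firing rates (compare Figure 3).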

Abstract

This paper introduces a neuromorphic methodology for eye tracking that harnesses pure event data captured by a Dynamic Vision Sensor (DVS) camera. The framework integrates a directly trained Spiking Neural Network (SNN) regression model and leverages Speck, a state-of-the-art low-power edge neuromorphic processor, with the aim of advancing the precision and efficiency of eye-tracking systems. First, we introduce "Ini-30", a representative event-based eye-tracking dataset collected with two glass-mounted DVS cameras from thirty volunteers. Then, we describe "Retina", an SNN model based on Integrate-and-Fire (IAF) neurons that has only 64k parameters (6.63x fewer than the latest prior model) and achieves a pupil tracking error of only 3.24 pixels on a 64x64 DVS input. The continuous regression output is obtained by convolving the output spiking layer with a non-spiking temporal 1D filter slid along the time axis. Finally, we evaluate Retina on the neuromorphic processor, measuring an end-to-end power of 2.89-4.8 mW and a latency of 5.57-8.01 ms, depending on the time window. We also benchmark our model against "3ET", the latest event-based eye-tracking method, which is built on event frames. Results show that Retina achieves higher precision, with 1.24 px lower pupil centroid error, and lower computational complexity, with 35 times fewer MAC operations. We hope this work will open avenues for further investigation of closed-loop neuromorphic solutions and true event-based training aimed at edge performance.
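The following minimal pure-Python sketch makes the two mechanisms named in the abstract concrete. It simulates a layer of non-leaky integrate-and-fire neurons driven by input currents, then slides a non-spiking 1D kernel along the time axis of the output spike trains to produce continuous values. Several details are assumptions rather than the paper's specification: reset by subtraction, a threshold of 1.0, and a hand-picked uniform kernel (a trained model would presumably learn these weights). The function names are hypothetical.

```python
from typing import List


def iaf_layer(inputs: List[List[float]], threshold: float = 1.0) -> List[List[int]]:
    """Simulate non-leaky integrate-and-fire neurons over discrete time steps.

    inputs[t][i] is the input current to neuron i at step t.
    Returns spikes[t][i] in {0, 1}. Reset by subtraction is an assumption.
    """
    n = len(inputs[0])
    v = [0.0] * n  # membrane potentials, one per neuron
    spikes = []
    for x_t in inputs:
        s_t = []
        for i in range(n):
            v[i] += x_t[i]  # integrate the input; no leak term
            if v[i] >= threshold:
                s_t.append(1)
                v[i] -= threshold  # reset by subtracting the threshold
            else:
                s_t.append(0)
        spikes.append(s_t)
    return spikes


def temporal_weighted_sum(spikes: List[List[int]], kernel: List[float]) -> List[List[float]]:
    """Slide a non-spiking 1D kernel along time over each neuron's spike train.

    For every step t >= K - 1 the output is sum_k kernel[k] * spikes[t - K + 1 + k][i],
    i.e. a "valid" 1D correlation along the time axis.
    """
    k_len = len(kernel)
    n = len(spikes[0])
    out = []
    for t in range(k_len - 1, len(spikes)):
        window = spikes[t - k_len + 1 : t + 1]
        out.append([sum(kernel[k] * window[k][i] for k in range(k_len)) for i in range(n)])
    return out


# Two output neurons driven by constant currents over 8 time steps.
spikes = iaf_layer([[0.5, 0.25]] * 8)
values = temporal_weighted_sum(spikes, [0.25] * 4)
print(values[0])  # [0.5, 0.25]
```

With a uniform kernel the filter simply recovers each neuron's firing rate. A learned, non-uniform kernel can weight recent spikes differently and produce a continuous regression target, such as a pupil coordinate, instead.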

Paper Structure

This paper contains 25 sections, 9 equations, 3 figures, and 11 tables.

Introduction
Related Work
Non-Event-Based Eye Tracking
Event-Based Eye Tracking
Dataset
Dataset Collection
Dataset Comparison
Methodology
Data Preparation
Network Architecture
Neuron Model
Temporal Weighted-Sum Filter
Loss Function
Experiments
Setup
...and 10 more sections

Figures (3)

Figure 1: A picture of the hardware for data collection (left) and an example of the video recordings (right) with ground truth (green) and prediction (yellow).
Figure 2: An example illustrating two techniques for slicing event video recordings (red, blue) in time: A) by fixed time window, $dt = 800\,\mu s$; B) by fixed event count, $2$ events per slice.
Figure 3: The firing rates of the trained SNN at different network depths for different slicing methods and time windows.
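Figure 2 contrasts two ways of cutting an event stream into slices: fixed-duration windows and fixed event counts. The sketch below implements both on a time-sorted list of events and accumulates a slice into a two-channel count frame. Several choices are assumptions made for illustration and are not taken from the paper: the `Event` tuple layout (timestamp in microseconds, x, y, polarity), aligning windows to the first event, keeping empty windows, and the per-polarity frame format. All names are hypothetical.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    t_us: int  # timestamp in microseconds
    x: int     # column
    y: int     # row
    p: int     # polarity: 1 = ON, 0 = OFF


def slice_by_time(events: List[Event], dt_us: int) -> List[List[Event]]:
    """Split time-sorted events into consecutive windows of fixed duration dt_us.

    Windows are aligned to the first event's timestamp; empty windows are kept.
    """
    if not events:
        return []
    t0 = events[0].t_us
    n_windows = (events[-1].t_us - t0) // dt_us + 1
    slices: List[List[Event]] = [[] for _ in range(n_windows)]
    for ev in events:
        slices[(ev.t_us - t0) // dt_us].append(ev)
    return slices


def slice_by_count(events: List[Event], n: int) -> List[List[Event]]:
    """Split time-sorted events into consecutive slices of n events (the last may be shorter)."""
    return [events[i:i + n] for i in range(0, len(events), n)]


def to_frame(events: List[Event], height: int, width: int) -> List[List[List[int]]]:
    """Accumulate a slice into a [polarity][row][column] event-count frame."""
    frame = [[[0] * width for _ in range(height)] for _ in range(2)]
    for ev in events:
        frame[ev.p][ev.y][ev.x] += 1
    return frame


evs = [Event(0, 1, 1, 1), Event(100, 2, 2, 0), Event(900, 3, 3, 1),
       Event(1000, 4, 4, 1), Event(2500, 5, 5, 0)]
print([len(s) for s in slice_by_time(evs, 800)])  # [2, 2, 0, 1]
print([len(s) for s in slice_by_count(evs, 2)])   # [2, 2, 1]
```

Fixed time windows give slices of constant duration but a variable number of events, so a window can be empty when the eye is still. Fixed event counts give constant input density but a variable duration per slice.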
