A Framework for Pupil Tracking with Event Cameras

Khadija Iddrisu, Waseem Shariff, Suzanne Little

TL;DR

This work tackles the challenge of high-speed pupil tracking necessary for saccade analysis by leveraging event cameras, which offer high temporal resolution and low latency. The authors convert asynchronous events into frame-like representations using fixed 10 ms windows and a 2000-event threshold, then apply YOLOv8 to detect the pupil on these frames. They train and evaluate four YOLOv8 variants on 3200 labeled frames derived from the EV-Eye dataset, showing that YOLOv8n achieves the best balance of accuracy and efficiency (mAP up to 0.981, precision ~0.965) with minimal parameters. The approach demonstrates strong potential for neuroscience, ophthalmology, and human-computer interaction, while acknowledging limitations in remote-eye generalization and occlusion robustness, with future work aimed at ROI enhancements and broader dataset coverage.

Abstract

Saccades are extremely rapid movements of both eyes that occur simultaneously, typically observed when an individual shifts their focus from one object to another. These movements are among the swiftest produced by humans and can reach velocities greater than those of blinks. The peak angular speed of the eye during a saccade can reach as high as 700°/s in humans, especially during larger saccades that cover a visual angle of 25°. Previous research has demonstrated encouraging outcomes in comprehending neurological conditions through the study of saccades. A necessary step in saccade detection involves accurately identifying the precise location of the pupil within the eye, from which additional information such as gaze angles can be inferred. Conventional frame-based cameras often struggle with the high temporal precision necessary for tracking very fast movements, resulting in motion blur and latency issues. Event cameras, on the other hand, offer a promising alternative by recording changes in the visual scene asynchronously and providing high temporal resolution and low latency. By bridging the gap between traditional computer vision and event-based vision, we present events as frames that can be readily utilized by standard deep learning algorithms. This approach harnesses YOLOv8, a state-of-the-art object detection technology, to process these frames for pupil tracking using the publicly accessible EV-Eye dataset. Experimental results demonstrate the framework's effectiveness, highlighting its potential applications in neuroscience, ophthalmology, and human-computer interaction.
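
The event-to-frame step described in the TL;DR (fixed 10 ms windows and a 2000-event threshold) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `events_to_frames`, the grey/white/black pixel encoding, the microsecond timestamps, and the way the two limits are combined (a frame closes when the window elapses or the event count reaches the cap, whichever comes first) are all assumptions made for illustration.

```python
import numpy as np

def events_to_frames(t, x, y, p, height, width,
                     window_us=10_000, max_events=2000):
    """Accumulate a time-sorted event stream into frame-like images.

    Hypothetical helper: t is in microseconds, x/y are pixel coordinates,
    p is polarity (1 = brightness increase, 0 = decrease).
    """
    frames = []
    start, n = 0, len(t)
    while start < n:
        # Index of the first event outside the window opened at t[start].
        end = int(np.searchsorted(t, t[start] + window_us, side="left"))
        # Assumed rule: also close the frame once max_events are collected.
        end = min(end, start + max_events)
        # Assumed encoding: grey background, white/black for the two polarities.
        frame = np.full((height, width), 128, dtype=np.uint8)
        values = np.where(p[start:end] > 0, 255, 0).astype(np.uint8)
        frame[y[start:end], x[start:end]] = values
        frames.append(frame)
        start = end
    return frames

# Tiny synthetic stream: five events on a 4x4 sensor.
t = np.array([0, 1_000, 5_000, 12_000, 13_000])
x = np.array([1, 2, 3, 0, 1])
y = np.array([0, 1, 2, 3, 0])
p = np.array([1, 0, 1, 1, 0])
frames = events_to_frames(t, x, y, p, height=4, width=4)
```

Frames produced this way are ordinary 2D arrays, so they can be saved as images and labelled or passed to a standard detector in the same way as conventional camera frames.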

Paper Structure (9 sections, 3 figures, 1 table)

Figures (3)

  • Figure 1: Labelling process of event frames with the LabelImg tool.
  • Figure 2: Training and Validation Losses for YOLOv8 on pupil tracking. The horizontal axes represent the number of epochs, while the vertical axes represent the value of each metric during training.
  • Figure 3: Qualitative results of the best-performing proposed model (YOLOv8-n). The image on the left shows ground-truth labels, while the image on the right shows predicted labels along with confidence scores.