Benchmarking Microsaccade Recognition with Event Cameras: A Novel Dataset and Evaluation
Waseem Shariff, Timothy Hanley, Maciej Stec, Hossein Javidnia, Peter Corcoran
TL;DR
This work tackles microsaccade recognition with event cameras by introducing a Blender-generated synthetic dataset (175,000 sequences across seven angular classes from $0.5^\circ$ to $2.0^\circ$, with durations of $0.25$–$2.25$ ms) converted to event streams via v2e. It benchmarks Spiking-VGG11/13/16 and a motion-regularized Spiking-VGG16Flow, which learns to predict dense optical flow targets (computed with the Farneback method) alongside classification to encourage motion-aware representations. The models reach about 90% average accuracy across classes. Flow supervision acts as a regularizer rather than delivering a large accuracy gain, and the models show some generalization to real EV-Eye data despite being trained only on synthetic data. The work establishes a public benchmark, demonstrates the viability of synthetic data for neuromorphic vision tasks, and points toward future work on real-data collection under cognitive load and on trajectory-focused microsaccade objectives.
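The joint objective described for Spiking-VGG16Flow (classification plus flow prediction) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the mean-squared-error flow term, and the `flow_weight` value of 0.1 are all assumptions made for illustration, and plain Python lists stand in for network outputs.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def flow_mse(flow_pred, flow_target):
    """Mean squared error between flattened predicted and target flow components."""
    return sum((p - t) ** 2 for p, t in zip(flow_pred, flow_target)) / len(flow_pred)

def joint_loss(logits, label, flow_pred, flow_target, flow_weight=0.1):
    """Classification loss plus a weighted flow term.

    flow_weight is a hypothetical hyperparameter; the source does not state
    how the two terms are balanced.
    """
    return cross_entropy(logits, label) + flow_weight * flow_mse(flow_pred, flow_target)

# Seven class logits (one per angular class) and a tiny two-component flow vector.
loss = joint_loss([0.0] * 7, 3, [1.0, 0.0], [0.0, 0.0])
```

With a small `flow_weight`, the flow term nudges the shared features toward motion cues without dominating the classification loss, which matches the description of flow supervision as a regularizer.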
Abstract
Microsaccades are small, involuntary eye movements vital for visual perception and neural processing. Traditional microsaccade studies typically use eye trackers or frame-based analysis, which, while precise, are costly and limited in scalability and temporal resolution. Event-based sensing offers a high-speed, low-latency alternative by capturing fine-grained spatiotemporal changes efficiently. This work introduces an event-based microsaccade dataset to support research on small eye movement dynamics in cognitive computing. Using Blender, we render high-fidelity eye movement scenarios and simulate microsaccades with angular displacements from 0.5 to 2.0 degrees, divided into seven distinct classes. These are converted to event streams using v2e, preserving the natural temporal dynamics of microsaccades, with durations ranging from 0.25 ms to 2.25 ms. We evaluate the dataset using Spiking-VGG11, Spiking-VGG13, and Spiking-VGG16, and propose Spiking-VGG16Flow, an optical-flow-enhanced variant implemented in SpikingJelly. The models achieve around 90 percent average accuracy, classifying microsaccades by angular displacement independently of event count or duration. These results demonstrate the potential of spiking neural networks for fine motion recognition and establish a benchmark for event-based vision research. The dataset, code, and trained models will be publicly available at https://waseemshariff126.github.io/microsaccades/.
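As a rough illustration of the labelling scheme, the sketch below maps an angular displacement to one of the seven classes. It assumes the classes are evenly spaced between 0.5 and 2.0 degrees (a step of 0.25 degrees); the abstract gives only the range and the class count, so the spacing and the helper names are hypothetical.

```python
NUM_CLASSES = 7
MIN_DEG, MAX_DEG = 0.5, 2.0
# Assumption: evenly spaced class centres, giving a 0.25 degree step.
STEP_DEG = (MAX_DEG - MIN_DEG) / (NUM_CLASSES - 1)

def class_angles():
    """Return the assumed angular displacement (in degrees) of each class."""
    return [MIN_DEG + i * STEP_DEG for i in range(NUM_CLASSES)]

def angle_to_class(angle_deg):
    """Map a displacement in degrees to the index of the nearest assumed class."""
    idx = round((angle_deg - MIN_DEG) / STEP_DEG)
    if not 0 <= idx < NUM_CLASSES:
        raise ValueError(f"{angle_deg} degrees is outside the dataset range")
    return idx

label = angle_to_class(1.25)
```

Under this assumption, a classifier with seven output logits would be trained against the index returned by `angle_to_class`.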
