Establishing a Baseline for Gaze-driven Authentication Performance in VR: A Breadth-First Investigation on a Very Large Dataset
Dillon Lohr, Michael J. Proulx, Oleg Komogortsev
TL;DR
This study establishes a baseline for gaze-driven authentication in VR by evaluating a state-of-the-art embedding model on the GazePro dataset, which contains gaze recordings from $9202$ participants sampled at $72$ Hz. It systematically compares monocular vs binocular data and visual vs optical axes, and measures the impact of training size, data length, and signal quality on verification and identification. Key findings show that binocular gaze and dual-axis inputs drastically improve verification, and that longer training and larger training sets yield better performance. Verification remains stable with larger galleries, whereas identification degrades as the gallery grows and is estimated to fall to chance level at around $1.48\times10^{5}$ identities. The results demonstrate that gaze authentication can potentially reach FIDO-level accuracy at the signal quality of realistic consumer hardware, given sufficient data, computation, and careful handling of signal quality and enrollment/verification durations. The study also notes limitations in long-term and real-world deployment scenarios.
Abstract
This paper establishes a baseline for gaze-driven authentication performance and begins to answer fundamental research questions, using a very large dataset of gaze recordings from 9202 people with a level of eye tracking (ET) signal quality equivalent to that of modern consumer-facing virtual reality (VR) platforms. The employed dataset is at least an order of magnitude larger than any dataset used in previous related work. Our model requires binocular estimates of the optical and visual axes of the eyes, together with a minimum duration for enrollment and verification, to achieve a false rejection rate (FRR) below 3% at a false acceptance rate (FAR) of 1 in 50,000. Identification accuracy decreases with gallery size, and we estimate that our model would fall below chance-level accuracy for gallery sizes of 148,000 or more. Our major findings indicate that gaze authentication can be as accurate as required by the FIDO standard when driven by a state-of-the-art machine learning architecture and a sufficiently large training dataset.
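To make the headline metric concrete, the following is a minimal sketch of how an FRR at a fixed FAR can be computed from similarity scores. It is not the authors' evaluation code. The function name `frr_at_far`, the convention that a higher score means a better match, the "accept if score > threshold" rule, and the synthetic score distributions are all assumptions made for illustration. Note that resolving a FAR of 1 in 50,000 requires at least 50,000 impostor comparisons.

```python
import numpy as np


def frr_at_far(genuine, impostor, target_far):
    """Hypothetical helper: FRR at the strictest threshold whose FAR <= target_far.

    Assumes higher scores mean "more similar" and that an attempt is
    accepted when its score is strictly greater than the threshold.
    Returns (threshold, frr, achieved_far).
    """
    if not 0.0 <= target_far < 1.0:
        raise ValueError("target_far must be in [0, 1)")
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.sort(np.asarray(impostor, dtype=float))
    n_imp = impostor.size

    # Largest number of impostor attempts that may be accepted.
    k = int(np.floor(target_far * n_imp))

    # Use the (k+1)-th largest impostor score as the threshold, so that
    # at most k impostor scores lie strictly above it.
    threshold = impostor[n_imp - k - 1]

    far = np.mean(impostor > threshold)   # impostors wrongly accepted
    frr = np.mean(genuine <= threshold)   # genuine users wrongly rejected
    return threshold, frr, far


# Synthetic scores, for illustration only (not data from the paper).
rng = np.random.default_rng(0)
genuine_scores = rng.normal(0.8, 0.1, size=2_000)
impostor_scores = rng.normal(0.2, 0.15, size=200_000)

thr, frr, far = frr_at_far(genuine_scores, impostor_scores, 1 / 50_000)
print(f"threshold={thr:.3f}  FRR={frr:.4f}  FAR={far:.6f}")
```

Choosing the threshold from the sorted impostor scores keeps the achieved FAR at or below the target even when scores are tied, at the cost of a slightly conservative operating point.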
