Digitally Prototype Your Eye Tracker: Simulating Hardware Performance using 3D Synthetic Data
Esther Y. H. Lin, Yimin Ding, Jogendra Kundu, Yatong An, Mohamed T. El-Haddad, Alexander Fix
TL;DR
The paper addresses rapid prototyping of eye tracking (ET) hardware for AR/VR by replacing costly real-data collection with a synthetic data generator built on a mesh-NeRF hybrid eye model and an optical-effects simulator. It introduces a digital-twin framework that renders eye images across diverse camera placements and lighting conditions, then evaluates end-to-end gaze estimation using a fixed Project Aria model to predict hardware-induced performance changes. Key contributions include a light-dome capture setup for 3D eye reconstruction, a synthetic dataset of 195 identities with 114 gaze targets per identity, and empirical evidence that models trained on synthetic data track the relative performance changes observed with real data, including demonstrations at novel viewpoints. The approach enables scalable, end-to-end hardware prototyping, while acknowledging domain gaps and practical limitations such as gaze range, wavelength constraints, and monocular data.
Abstract
Eye tracking (ET) is a key enabler for Augmented and Virtual Reality (AR/VR). Prototyping new ET hardware requires assessing the impact of hardware choices on eye tracking performance. This task is compounded by the high cost of obtaining data from sufficiently many variations of real hardware, especially for machine learning, which requires large training datasets. We propose a method for end-to-end evaluation of how hardware changes impact machine learning-based ET performance using only synthetic data. We utilize a dataset of real 3D eyes, reconstructed from light dome data using neural radiance fields (NeRF), to synthesize captured eyes from novel viewpoints and camera parameters. Using this framework, we demonstrate that we can predict the relative performance across various hardware configurations, accounting for variations in sensor noise, illumination brightness, and optical blur. We also compare our simulator with the publicly available eye tracking dataset from the Project Aria glasses, demonstrating a strong correlation with real-world performance. Finally, we present a first-of-its-kind analysis in which we vary ET camera positions, evaluating ET performance ranging from on-axis direct views of the eye to peripheral views on the frame. Such an analysis would have previously required manufacturing physical devices to capture evaluation data. In short, our method enables faster prototyping of ET hardware.
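To make the hardware variations named in the abstract concrete, the following is a minimal sketch of how a clean rendered eye image might be degraded by optical blur, illumination brightness, and sensor noise. It is not the authors' simulator: the function names, the parameters (`blur_sigma`, `brightness`, `full_well`, `read_noise`), the Gaussian blur model, and the Gaussian approximation to shot noise are all assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel with radius ceil(3 * sigma)."""
    if sigma <= 0:
        return [1.0]
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(k * row[min(max(x + j - radius, 0), width - 1)]
             for j, k in enumerate(kernel))
         for x in range(width)]
        for row in image
    ]


def transpose(image):
    """Swap rows and columns of a list-of-lists image."""
    return [list(col) for col in zip(*image)]


def degrade(image, blur_sigma=1.0, brightness=1.0,
            full_well=1000.0, read_noise=2.0, seed=0):
    """Apply assumed blur, brightness, and noise models to an image in [0, 1].

    All parameters are hypothetical stand-ins for hardware properties:
    blur_sigma (pixels) for optical blur, brightness for illumination power,
    full_well (electrons) and read_noise (electrons) for the sensor.
    """
    rng = random.Random(seed)
    kernel = gaussian_kernel(blur_sigma)
    # Separable Gaussian blur: filter rows, then columns.
    blurred = transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))
    degraded = []
    for row in blurred:
        new_row = []
        for value in row:
            electrons = max(value, 0.0) * brightness * full_well
            # Shot noise approximated as Gaussian with variance = signal,
            # plus signal-independent read noise.
            noisy = (electrons
                     + rng.gauss(0.0, math.sqrt(electrons))
                     + rng.gauss(0.0, read_noise))
            new_row.append(min(max(noisy / full_well, 0.0), 1.0))
        degraded.append(new_row)
    return degraded


if __name__ == "__main__":
    # A flat gray test image standing in for a rendered eye image.
    clean = [[0.5] * 16 for _ in range(16)]
    dim_and_blurry = degrade(clean, blur_sigma=2.0, brightness=0.3)
    print(len(dim_and_blurry), len(dim_and_blurry[0]))
```

In a study of the kind described above, one would presumably sweep such parameters over a grid, regenerate the evaluation images for each setting, and compare the resulting gaze errors across configurations; the sketch covers only the image-degradation step.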
