Digitally Prototype Your Eye Tracker: Simulating Hardware Performance using 3D Synthetic Data

Esther Y. H. Lin, Yimin Ding, Jogendra Kundu, Yatong An, Mohamed T. El-Haddad, Alexander Fix

TL;DR

The paper addresses rapid prototyping of eye tracking (ET) hardware for AR/VR by replacing costly real-data collection with a synthetic data generator built on a hybrid mesh-NeRF eye model and an optical-effects simulator. It introduces a digital-twin framework that renders eye images across diverse camera placements and lighting conditions, then evaluates end-to-end gaze estimation using a fixed Project Aria model to predict hardware-induced performance changes. Key contributions include a light-dome capture setup for 3D eye reconstruction, a synthetic dataset of 195 identities with 114 gaze targets each, and empirical evidence that models trained on synthetic data track the relative performance changes observed with real data, along with demonstrations at novel viewpoints. The approach enables scalable, end-to-end hardware prototyping, while acknowledging domain gaps and practical limitations such as a limited gaze range, wavelength constraints, and monocular data.

Abstract

Eye tracking (ET) is a key enabler for Augmented and Virtual Reality (AR/VR). Prototyping new ET hardware requires assessing the impact of hardware choices on eye tracking performance. This task is compounded by the high cost of obtaining data from sufficiently many variations of real hardware, especially for machine learning, which requires large training datasets. We propose a method for end-to-end evaluation of how hardware changes impact machine learning-based ET performance using only synthetic data. We utilize a dataset of real 3D eyes, reconstructed from light dome data using neural radiance fields (NeRF), to synthesize captured eyes from novel viewpoints and camera parameters. Using this framework, we demonstrate that we can predict the relative performance across various hardware configurations, accounting for variations in sensor noise, illumination brightness, and optical blur. We also compare our simulator with the publicly available eye tracking dataset from the Project Aria glasses, demonstrating a strong correlation with real-world performance. Finally, we present a first-of-its-kind analysis in which we vary ET camera positions, evaluating ET performance ranging from on-axis direct views of the eye to peripheral views on the frame. Such an analysis would have previously required manufacturing physical devices to capture evaluation data. In short, our method enables faster prototyping of ET hardware.

Paper Structure

This paper contains 28 sections, 3 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Overview. The Digital Eye Tracker Prototyper receives (A) hardware design considerations—both camera configurations and optical effects—as input. It employs a collection of (B) 3D hybrid representations (NeRFs and meshes) of the eye and periocular region to render images for each gaze. There are 114 gaze representations per identity across 195 identities in the digital eye tracker prototyper. The renders are fed to (C) an evaluator that trains a deep gaze estimator and evaluates performance for the given hardware design considerations.
  • Figure 2: Light dome capture setup with cameras, NIR LEDs and a chin rest rigidly mounted to a frame. Captures are used to construct 3D models of the eye and periocular region.
  • Figure 3: Order of operations in optical simulator. Input is the linear rendered image from the hybrid reconstruction. The output is an image with modified brightness, aperture blur, and noise.
  • Figure 4: Synthetic and Real Images. We compare synthetic (Synth.) images simulated using the configurations of the ET camera on Aria glasses with real ET images captured by the glasses. The synthetic images are able to capture details such as eyelashes and skin texture.
  • Figure 5: Optical trends. Model performance under changing blur, brightness, and noise is plotted separately for each percentile metric. Correlation R-scores between real-trained and synthetic-trained models are given above each graph, showing a high degree of correlation.
  • ...and 7 more figures
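
The order of operations named in the Figure 3 caption (brightness, then aperture blur, then noise, applied to a linear rendered image) can be illustrated with a minimal sketch. This is not the paper's simulator: the function names, the Gaussian kernel standing in for aperture blur, the Poisson-plus-Gaussian noise model, and all parameter values (`gain`, `blur_sigma`, `full_well`, `read_noise`) are assumptions chosen only for illustration.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """1D Gaussian kernel normalized to sum to 1 (assumed stand-in for aperture blur)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2D image with edge padding."""
    if sigma <= 0:
        return image
    kernel = gaussian_kernel(sigma)
    pad = len(kernel) // 2
    padded = np.pad(image, pad, mode="edge")
    # Convolve along rows, then along columns; "valid" restores the input size.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def simulate_optics(linear_image, gain=1.0, blur_sigma=1.0,
                    full_well=1000.0, read_noise=2.0, rng=None):
    """Hypothetical pipeline: brightness -> blur -> noise on a linear image in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    # 1. Brightness: scale the linear signal (assumed to model illumination power).
    img = np.maximum(linear_image, 0.0) * gain
    # 2. Blur: applied before noise, since the optics act before the sensor.
    img = blur(img, blur_sigma)
    # 3. Noise: signal-dependent shot noise plus additive read noise, in electrons.
    electrons = rng.poisson(img * full_well).astype(np.float64)
    electrons += rng.normal(0.0, read_noise, size=img.shape)
    # Normalize back to [0, 1]; clipping models sensor saturation.
    return np.clip(electrons / full_well, 0.0, 1.0)


if __name__ == "__main__":
    rendered = np.full((16, 20), 0.5)  # placeholder for a rendered eye image
    degraded = simulate_optics(rendered, gain=0.8, blur_sigma=1.5,
                               rng=np.random.default_rng(0))
    print(degraded.shape, float(degraded.mean()))
```

Applying noise last reflects the physical ordering in which the sensor samples light that has already been attenuated and blurred. In this sketch, lowering `gain` reduces the signal faster than the shot noise, so darker configurations yield a lower signal-to-noise ratio.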