Fixation-based Self-calibration for Eye Tracking in VR Headsets

Ryusei Uramune, Sei Ikeda, Hiroki Ishizuka, Osamu Oshiro

TL;DR

This work tackles the challenge of calibrating eye-tracking in VR without explicit user calibration by proposing a fixation-based self-calibration method. It leverages 3D fixation behavior and a scene-camera model to estimate the horizontal and vertical offset between the optical and visual axes, optimizing calibration parameters via reprojection-error minimization using a differential-evolution search. The method demonstrates feasibility by achieving an average accuracy around $2^{\circ}$ in dynamic, occlusion-rich 3D VR environments with walking tasks, outperforming traditional optical-axis baselines and many self-calibration approaches. The study also analyzes the influence of fixation-detection algorithms and initial calibration guesses on accuracy, identifies convergence behavior with walking distance, and discusses limitations and future extensions to moving objects, physiological factors, and AR contexts.

Abstract

This study proposes a novel self-calibration method for eye tracking in a virtual reality (VR) headset. The proposed method is based on the assumptions that the user's viewpoint can move freely and that the points of regard (PoRs) from different viewpoints are distributed within a small area on an object surface during visual fixation. In the method, fixations are first detected from the time-series data of uncalibrated gaze directions using an extension of the I-VDT (velocity and dispersion threshold identification) algorithm to a three-dimensional (3D) scene. Then, the calibration parameters are optimized by minimizing the sum of a dispersion metric of the PoRs over the detected fixations. The proposed method can potentially identify the optimal calibration parameters representing the user-dependent offset from the optical axis to the visual axis without explicit user calibration, image processing, or marker-substitute objects. For the gaze data of 18 participants walking in two VR environments with many occlusions, the proposed method achieved an accuracy of 2.1$^\circ$, an error significantly lower than the average offset. Our method is the first self-calibration method with an average error lower than 3$^\circ$ in 3D environments. Further, the accuracy of the proposed method can be improved by up to 1.2$^\circ$ by refining the fixation detection or optimization algorithm.
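
The optimization step described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a single flat wall as the scene, a head that translates without rotating, fixations that are already detected, a sum of squared distances to the centroid as the dispersion metric, and a coarse-to-fine grid search standing in for the optimizer. All function names, the simulated walk, and the numeric values are hypothetical.

```python
import math

def direction(yaw_deg, pitch_deg):
    """Unit gaze vector for a horizontal (yaw) and vertical (pitch) angle in degrees."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.sin(yaw) * math.cos(pitch), math.sin(pitch), math.cos(yaw) * math.cos(pitch))

def angles(vec):
    """Inverse of direction(): yaw and pitch in degrees of a 3D vector."""
    x, y, z = vec
    norm = math.sqrt(x * x + y * y + z * z)
    return (math.degrees(math.atan2(x, z)), math.degrees(math.asin(y / norm)))

def point_of_regard(eye, gaze, wall_z):
    """Intersect the gaze ray with the plane z = wall_z (toy scene assumption)."""
    t = (wall_z - eye[2]) / gaze[2]
    return (eye[0] + t * gaze[0], eye[1] + t * gaze[1])

def dispersion(points):
    """Sum of squared distances to the centroid (assumed dispersion metric)."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points)

def cost(offset, fixations, wall_z):
    """Total PoR dispersion over all fixations for a candidate (yaw, pitch) offset."""
    a, b = offset
    total = 0.0
    for samples in fixations:
        pors = [point_of_regard(eye, direction(yaw + a, pitch + b), wall_z)
                for eye, (yaw, pitch) in samples]
        total += dispersion(pors)
    return total

def grid_search(fixations, wall_z, span=10.0, levels=3):
    """Coarse-to-fine grid search over the offset; a stand-in for the real optimizer."""
    best, step = (0.0, 0.0), span / 10
    for _ in range(levels):
        candidates = [(best[0] + i * step, best[1] + j * step)
                      for i in range(-10, 11) for j in range(-10, 11)]
        best = min(candidates, key=lambda c: cost(c, fixations, wall_z))
        step /= 10
    return best

def simulate(targets, true_offset, wall_z):
    """Uncalibrated (optical-axis) gaze samples of a viewer walking toward the wall."""
    fixations = []
    for n, target in enumerate(targets):
        samples = []
        for k in range(10):
            eye = (0.1 * math.sin(k + n), 1.6, 0.3 * k)  # hypothetical walk with sway
            yaw, pitch = angles(tuple(t - e for t, e in zip(target, eye)))
            # Optical axis = visual axis minus the user-dependent offset.
            samples.append((eye, (yaw - true_offset[0], pitch - true_offset[1])))
        fixations.append(samples)
    return fixations

WALL_Z = 5.0
TRUE_OFFSET = (4.3, -1.7)  # degrees, arbitrary toy values
fixations = simulate([(0.5, 1.8, WALL_Z), (-0.8, 1.2, WALL_Z)], TRUE_OFFSET, WALL_Z)
estimate = grid_search(fixations, WALL_Z)
print("estimated offset (deg):", estimate)
```

In this toy setting the viewer's distance to the wall changes during each fixation, so a wrong offset shifts the PoRs by a distance-dependent amount and spreads them out, while the correct offset collapses them onto the fixated point. A viewer who only moved parallel to the wall at a fixed distance would produce nearly the same PoR shift in every sample, and the dispersion would carry little information about the offset.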

Paper Structure (56 sections, 15 equations, 10 figures, 2 tables)

Figures (10)

  • Figure 1: Principle of the proposed fixation-based calibration method. During a fixation, human eyeballs move so that the PoRs are concentrated around a single point, even if the head moves. We applied this characteristic and the corresponding multi-view geometry to the self-calibration of an eye tracker.
  • Figure 2: Reprojection errors. In the proposed method, the calibration parameters are estimated by optimizing a cost function based on the reprojection errors.
  • Figure 3: VR environments. Each participant was asked to walk along the white outline, counterclockwise and clockwise, twice each, for a total of four consecutive laps, starting from the upper right corner.
  • Figure 4: Absolute errors of the proposed method. "opt" indicates that the initial calibration parameters corresponding to the optical axis were used in the fixation detection. Similarly, "vis" refers to fixation detection using the parameters of the visual axis. "Visual axis" represents the accuracy of the control condition with 16 of the 25 markers. "Optical axis" shows the accuracy of the optical axis generated by adding the average offset to the control condition. All the errors were evaluated with the same gaze data of the other nine markers. The error bars indicate standard errors for all participants.
  • Figure 5: Absolute error and number of fixations. The cumulative distance is the sum of the translational distances of the scene cameras between adjacent frames.
  • ...and 5 more figures
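
The cumulative distance defined in the Figure 5 caption (the sum of the translational distances of the scene cameras between adjacent frames) can be computed as in the following minimal sketch. The function name and the sample camera positions are hypothetical, and positions are assumed to be given in meters, one per frame.

```python
import math

def cumulative_distance(positions):
    """Running sum of Euclidean distances between scene-camera positions of adjacent frames."""
    if not positions:
        return []
    distances = [0.0]
    total = 0.0
    for prev, curr in zip(positions, positions[1:]):
        total += math.dist(prev, curr)  # translational distance between two frames
        distances.append(total)
    return distances

# Hypothetical per-frame camera positions (x, y, z) in meters.
camera_positions = [(0.0, 1.6, 0.0), (0.3, 1.6, 0.0), (0.3, 1.6, 0.4), (0.0, 1.6, 0.0)]
path = cumulative_distance(camera_positions)
print(path)
```

Only translation contributes to this quantity; head rotation without displacement leaves it unchanged.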