Polarization-Based Eye Tracking with Personalized Siamese Architectures

Beyza Kalkanli, Tom Bu, Mahsa Shakeri, Alexander Fix, Dave Stronks, Dmitri Model, Mantas Žurauskas

Abstract

Head-mounted devices integrated with eye tracking promise a solution for natural human-computer interaction. However, they typically require per-user calibration for optimal performance due to inter-person variability. A differential personalization approach using Siamese architectures learns relative gaze displacements and reconstructs absolute gaze from a small set of calibration frames. In this paper, we benchmark Siamese personalization on polarization-enabled eye tracking. For benchmarking, we use a 338-subject dataset captured with a polarization-sensitive camera and 850 nm illumination. We achieve performance comparable to linear calibration with 10-fold fewer samples. Using polarization inputs for Siamese personalization reduces gaze error by up to 12% compared to near-infrared (NIR)-based inputs. Combining Siamese personalization with linear calibration yields further improvements of up to 13% over a linearly calibrated baseline. These results establish Siamese personalization as a practical approach enabling accurate eye tracking.

Paper Structure

This paper contains 14 sections, 3 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Intensity, DoLP, and AoLP channels derived from polarization-sensitive camera data.
  • Figure 2: The polarization-based dataset is split into training and testing subjects. Both training and testing employ a Siamese network architecture. During training, the network learns to predict gaze displacement between pairs of images from the same eye of the same subject. During testing, the predicted gaze for an input image is computed by estimating gaze displacements between the input and each image in the subject's calibration set, then aggregating the calibration gaze labels based on the predicted displacements. While our approach processes binocular images as input to each Siamese branch, we illustrate with single-eye images for clarity.
  • Figure 3: P50 (left) and P95 (right) gaze error as a function of the number of anchor images used during Siamese model inference. With 9 anchors, performance matches the Baseline model calibrated on $\sim$100 images.
  • Figure 4: Input sampling strategies for training. Random sampling generates random input pairs for each subject, whereas calibration sampling pairs each input with calibration images from a fixed anchor set selected for that subject.
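The inference procedure summarized in the Figure 2 caption — estimating a gaze displacement from the input image to each calibration anchor, then aggregating the anchors' gaze labels — can be sketched as below. This is a minimal illustration, not the paper's implementation: the function and variable names are our own, and a simple mean is assumed for the aggregation step, whereas the actual model may use a weighted or learned combination.

```python
import numpy as np

def reconstruct_gaze(displacements: np.ndarray, anchor_gazes: np.ndarray) -> np.ndarray:
    """Reconstruct an absolute gaze estimate from relative displacements.

    displacements: (K, 2) array; predicted gaze displacement from each of the
        K calibration anchors to the input frame (output of the Siamese network).
    anchor_gazes: (K, 2) array; known gaze labels of the K calibration anchors.
    Returns a (2,) absolute gaze estimate for the input frame.
    """
    # Each anchor votes: its known gaze plus the predicted displacement
    # from that anchor to the input frame.
    per_anchor_estimates = anchor_gazes + displacements
    # Aggregate the votes; a plain mean is assumed here for simplicity.
    return per_anchor_estimates.mean(axis=0)

# Example with two hypothetical anchors (gaze angles in degrees):
anchors = np.array([[0.0, 0.0], [2.0, 2.0]])
disps = np.array([[1.0, 1.0], [-1.0, -1.0]])
gaze = reconstruct_gaze(disps, anchors)
```

With these toy values, both anchors agree on the same absolute gaze, so the mean simply returns it; in practice the per-anchor estimates disagree slightly, and averaging over more anchors reduces that noise, which is consistent with Figure 3's finding that accuracy improves with the number of anchors.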