Suppressing Uncertainty in Gaze Estimation

Shijing Wang, Yaping Huang

TL;DR

We address the problem of data uncertainty in gaze estimation arising from low-quality images and mislabeled gaze points. Our approach, SUGE, introduces a triplet-label consistency framework built on neighboring labeling, uncertainty metrics, and Gaussian Mixture Model confidences to drive label correction and sample weighting, with a co-training setup to curb self-training bias. The method achieves state-of-the-art performance on EyeDiap, MPIIFaceGaze, Gaze360, and ETH-XGaze-driven tasks by suppressing unreliable data during training. This work highlights the importance of attending to data quality in gaze systems and provides a practical framework for improving robustness on real-world datasets.
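The TL;DR mentions a co-training setup to curb self-training bias. The toy sketch below illustrates only the cross-update idea, under heavy assumptions: two 1-D linear models y = w * x stand in for the two gaze networks, and a simple median-residual filter stands in for SUGE's GMM-based confidences. Each model is refit only on samples selected by its peer, so neither model filters its own training data. The names `fit_slope`, `peer_weights`, and `co_train` are hypothetical and not taken from the paper.

```python
import statistics


def fit_slope(xs, ys, weights):
    """Weighted least-squares slope for the toy model y = w * x (no intercept)."""
    num = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    den = sum(w * x * x for w, x in zip(weights, xs))
    return num / den


def peer_weights(peer_slope, xs, ys, factor=2.0, eps=1e-9):
    """Hypothetical filter: keep samples whose residual under the PEER model
    is at most `factor` times the median residual (1.0 = keep, 0.0 = drop)."""
    res = [abs(y - peer_slope * x) for x, y in zip(xs, ys)]
    cut = factor * statistics.median(res) + eps
    return [1.0 if r <= cut else 0.0 for r in res]


def co_train(xs, ys, slope_a, slope_b, rounds=3):
    """Alternate refits where each model's sample weights come from the other model."""
    for _ in range(rounds):
        w_for_a = peer_weights(slope_b, xs, ys)  # B decides what A trains on
        w_for_b = peer_weights(slope_a, xs, ys)  # A decides what B trains on
        slope_a, slope_b = fit_slope(xs, ys, w_for_a), fit_slope(xs, ys, w_for_b)
    return slope_a, slope_b


xs = [float(x) for x in range(1, 11)]
ys = [2.0 * x for x in xs]
ys[2] += 15.0  # two corrupted labels
ys[7] += 15.0
print(fit_slope(xs, ys, [1.0] * len(xs)), co_train(xs, ys, 1.0, 3.0))
```

In this toy, fitting all ten samples gives a biased slope of about 2.43 (935/385), while the co-trained pair settles on 2.0 because each model's training subset is chosen by the other.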

Abstract

Uncertainty in gaze estimation manifests in two aspects: 1) low-quality images caused by occlusion, blurriness, inconsistent eye movements, or even non-face images; 2) incorrect labels resulting from misalignment between the labeled and actual gaze points during annotation. Allowing these uncertain samples to participate in training hinders the improvement of gaze estimation. To tackle these challenges, we propose Suppressing Uncertainty in Gaze Estimation (SUGE), which introduces a novel triplet-label consistency measurement to estimate and reduce these uncertainties. Specifically, for each training sample, we estimate a novel "neighboring label", computed as a linearly weighted projection from the sample's neighbors, to capture the similarity relationship between image features and their corresponding labels; the neighboring label is then combined with the predicted pseudo label and the ground-truth label for uncertainty estimation. By modeling this triplet-label consistency, we can measure the quality of both images and labels, and substantially reduce the negative effects of unqualified images and wrong labels through our sample weighting and label correction strategies. Experimental results on gaze estimation benchmarks indicate that SUGE achieves state-of-the-art performance.
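The abstract describes the neighboring label as a linearly weighted projection from a sample's feature-space neighbors. A minimal sketch, assuming Euclidean k-nearest neighbors and locally linear embedding (LLE) style reconstruction weights (this sketch's choice, not necessarily the paper's exact weighting): solve for weights that best reconstruct the sample's feature from its neighbors' features, then apply the same weights to the neighbors' ground-truth labels. The helpers `k_nearest`, `solve`, and `neighboring_label`, and the parameters `k` and `reg`, are hypothetical.

```python
import math


def k_nearest(features, i, k):
    """Indices of the k nearest neighbors of sample i (Euclidean), excluding i itself."""
    ranked = sorted((math.dist(features[i], f), j) for j, f in enumerate(features) if j != i)
    return [j for _, j in ranked[:k]]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def neighboring_label(features, labels, i, k=3, reg=1e-3):
    """Hypothetical neighboring label: LLE-style weights applied to neighbors' GT labels."""
    nbrs = k_nearest(features, i, k)
    # Offsets of each neighbor's feature from the query sample's feature.
    z = [[a - b for a, b in zip(features[j], features[i])] for j in nbrs]
    gram = [[sum(a * b for a, b in zip(zp, zq)) for zq in z] for zp in z]
    # Standard LLE regularization keeps the local Gram matrix invertible.
    trace = sum(gram[p][p] for p in range(k))
    for p in range(k):
        gram[p][p] += reg * (trace if trace > 0 else 1.0)
    w = solve(gram, [1.0] * k)
    total = sum(w)
    w = [wi / total for wi in w]  # weights sum to one (may be negative)
    dims = range(len(labels[0]))
    label = tuple(sum(wp * labels[j][d] for wp, j in zip(w, nbrs)) for d in dims)
    return label, w
```

For example, a sample whose feature sits midway between its two nearest neighbors receives weights of 0.5 each, so its neighboring label is the midpoint of their (pitch, yaw) labels. A plain similarity-weighted average of neighbor labels would be an equally plausible reading of the "weighted averaging" step.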

Paper Structure

This paper contains 24 sections, 16 equations, 4 figures, 4 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Illustration of uncertainties in gaze estimation, using the EyeDiap dataset as an example. The upper half shows unqualified images; those on the right side are extremely difficult for both machines and humans and are better suppressed during training. The lower half shows wrongly annotated labels, with the common data annotation process depicted on the left. Because the actual gaze points are difficult to align perfectly with the given gaze points, the datasets contain inaccurate and incorrect labels, which should be rectified.
  • Figure 2: Visualization of samples with varying image and label confidences, using folds 0, 1, and 3 of the EyeDiap dataset as the training set, in the first epoch after the warm-up phase. These results show the effectiveness of the two uncertainty metrics we designed.
  • Figure 3: The pipeline of the SUGE method. Input images first pass through the encoder for feature extraction, and a fully connected layer generates pseudo labels. The neighboring labeling module then uses a nearest-neighbor algorithm to find feature neighbors for each image and computes the neighboring label as a weighted average of the neighbors' ground-truth labels. Next, the Uncertainty Metrics module computes the Tuple Minimum Discrepancy and the Triple Minimum Discrepancy by measuring the consistency among pseudo labels, ground-truth labels, and neighboring labels. These uncertainty metrics are then fed into a Gaussian Mixture Model, which yields two confidence scores: label confidence and image confidence. In the Label Correction and Sample Weighting module, the label confidence is used to weight the ground-truth, pseudo, and neighboring labels, producing corrected labels. In addition, a sample weight is derived from the image confidence and used to guide the training process.
  • Figure 4: Samples with a label confidence of 0 (left) and an image confidence of 0 (right). These results come from the EyeDiap dataset (folds 0, 1, and 3 as the training set), the Gaze360 dataset, and the MPIIFaceGaze dataset (users 1-14 as the training data), in the first epoch after warm-up.
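The Figure 3 caption names the Tuple and Triple Minimum Discrepancy metrics but does not give their formulas, so the sketch below uses hypothetical stand-ins chosen only to illustrate the data flow from discrepancies, to a Gaussian Mixture Model, to label correction and sample weighting. Assumptions: labels are (pitch, yaw) pairs compared by Euclidean distance rather than angular error; the tuple metric is the smaller of the ground truth's distances to the pseudo and neighboring labels, and the triple metric is the smallest pairwise distance among all three labels; each confidence is the posterior of the lower-mean component of a two-component 1-D GMM fit by EM (a common way to turn a GMM into a per-sample "clean" probability, not confirmed as SUGE's exact rule); and the corrected label blends the ground truth with the mean of the two estimates. All function names are hypothetical.

```python
import math


def discrepancies(gt, pseudo, nbr):
    """Hypothetical stand-ins for the two metrics; labels are (pitch, yaw) pairs."""
    # Euclidean distance on (pitch, yaw) is used here in place of angular error.
    d_gp, d_gn, d_pn = math.dist(gt, pseudo), math.dist(gt, nbr), math.dist(pseudo, nbr)
    tuple_d = min(d_gp, d_gn)         # large when the GT disagrees with BOTH estimates
    triple_d = min(d_gp, d_gn, d_pn)  # large when NO pair of the three labels agrees
    return tuple_d, triple_d


def gmm_low_component_posterior(xs, iters=50):
    """Fit a 2-component 1-D Gaussian mixture by EM and return each point's
    posterior probability of belonging to the lower-mean component."""
    n = len(xs)
    mean = sum(xs) / n
    var0 = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    mu, var, pi = [min(xs), max(xs)], [var0, var0], [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step in log space so far-away points do not underflow to 0/0.
        resp = []
        for x in xs:
            logp = [math.log(pi[k]) - 0.5 * math.log(2 * math.pi * var[k])
                    - (x - mu[k]) ** 2 / (2 * var[k]) for k in range(2)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: update mixing weights, means, and floored variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            pi[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
    low = 0 if mu[0] <= mu[1] else 1
    return [r[low] for r in resp]


def correct_and_weight(gts, pseudos, nbrs):
    """Return (corrected labels, label confidences, image confidences).
    Image confidences would serve as per-sample loss weights."""
    tuple_ds, triple_ds = zip(*(discrepancies(g, p, q) for g, p, q in zip(gts, pseudos, nbrs)))
    label_conf = gmm_low_component_posterior(list(tuple_ds))
    image_conf = gmm_low_component_posterior(list(triple_ds))
    corrected = [
        # Hypothetical blend: trust the GT by its label confidence, split the rest evenly.
        tuple(c * g_d + (1 - c) * 0.5 * (p_d + q_d) for g_d, p_d, q_d in zip(g, p, q))
        for c, g, p, q in zip(label_conf, gts, pseudos, nbrs)
    ]
    return corrected, label_conf, image_conf
```

Under these assumptions, a sample whose pseudo and neighboring labels agree with each other but not with its ground truth gets low label confidence and high image confidence, so its label is pulled toward the two estimates; a sample whose three labels all disagree gets low image confidence and would be down-weighted in the loss.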