
ELF-UA: Efficient Label-Free User Adaptation in Gaze Estimation

Yong Wu, Yang Wang, Sanqing Qu, Zhijun Li, Guang Chen

TL;DR

This work tackles efficient label-free adaptation for 3D gaze estimation: producing a target-user-specific model from only a few unlabeled images at test time. It introduces a model-agnostic meta-learning (MAML) framework with a self-supervised permutation task in the inner loop and an outer loss derived from a domain-adaptation generalization bound, using a labeled source dataset without person IDs together with unlabeled per-user data. The outer meta-objective is $\mathcal{L}_{meta}(\psi'_i) = \mathcal{L}_{gaze}(\psi'_i; \mathcal{S}) + \gamma \cdot d(\mathcal{S}, D_i^{val}; \psi'_i)$, where $\psi'_i$ is the model adapted to user $i$, $\mathcal{S}$ is the labeled source set, $D_i^{val}$ is the user's unlabeled query set, $\gamma$ is a balance weight, and $d$ is a distribution discrepancy estimated by joint MMD to bridge the source and target distributions. Experiments on ETH-XGaze, Gaze360, GazeCapture, and MPIIGaze show consistent improvements over baselines and performance competitive with domain-adaptation methods that require more unlabeled data, highlighting practical calibration-free personalization for gaze estimation.
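To make the meta-objective concrete, here is a minimal Python sketch of $\mathcal{L}_{meta} = \mathcal{L}_{gaze} + \gamma \cdot d$ with $d$ taken as a squared joint MMD. It is an illustration under assumptions, not the authors' implementation: the joint kernel is assumed to be a product of Gaussian (RBF) kernels over per-layer activations, the estimator is the simple biased one, and the names `rbf`, `joint_kernel`, `joint_mmd2`, `meta_loss`, the bandwidth, and the default `gamma` are all hypothetical. Which layers enter the joint kernel is not stated in this summary.

```python
import math


def rbf(x, y, bandwidth=1.0):
    """Gaussian kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def joint_kernel(u, v, bandwidth=1.0):
    """Product of RBF kernels over per-layer activations (assumed joint kernel).

    u and v are tuples such as (feature_vector, prediction_vector).
    """
    k = 1.0
    for layer_u, layer_v in zip(u, v):
        k *= rbf(layer_u, layer_v, bandwidth)
    return k


def joint_mmd2(source, target, bandwidth=1.0):
    """Biased estimate of the squared joint MMD between two sample sets."""
    def mean_kernel(xs, ys):
        total = sum(joint_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


def meta_loss(gaze_loss_on_source, source_acts, target_acts, gamma=0.1):
    """Supervised source loss plus gamma times the discrepancy term d."""
    return gaze_loss_on_source + gamma * joint_mmd2(source_acts, target_acts)


# Toy activations: (feature vector, predicted gaze) per image. Values are made up.
source = [([0.0, 0.0], [0.1]), ([1.0, 0.0], [0.2])]
target = [([3.0, 3.0], [0.9]), ([4.0, 3.0], [1.0])]
loss = meta_loss(0.5, source, target, gamma=0.1)
```

In this sketch the discrepancy term is zero when the two sets coincide and grows as the adapted model's activations on the user's query set drift away from those on the source set, which is the role the summary attributes to $d$.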

Abstract

We consider the problem of user-adaptive 3D gaze estimation. The performance of person-independent gaze estimation is limited due to interpersonal anatomical differences. Our goal is to provide a personalized gaze estimation model specifically adapted to a target user. Previous work on user-adaptive gaze estimation requires some labeled images of the target person to fine-tune the model at test time. However, this can be unrealistic in real-world applications, since it is cumbersome for an end-user to provide labeled images. In addition, previous work requires the training data to have both gaze labels and person IDs. This data requirement makes it infeasible to use some of the available data. To tackle these challenges, this paper proposes a new problem called efficient label-free user adaptation in gaze estimation. Our model needs only a few unlabeled images of a target user for model adaptation. During offline training, we have some labeled source data without person IDs and some unlabeled person-specific data. Our proposed method uses a meta-learning approach to learn how to adapt to a new user with only a few unlabeled images. Our key technical innovation is to use a generalization bound from domain adaptation to define the loss function in meta-learning, so that our method can effectively make use of both the labeled source data and the unlabeled person-specific data during training. Extensive experiments validate the effectiveness of our method on several challenging benchmarks.

Paper Structure (13 sections, 7 equations, 4 figures, 5 tables, 1 algorithm)


Figures (4)

  • Figure 1: Illustration of our problem setup. (Left) In previous work on few-shot user-adaptive gaze estimation (Park et al., 2019), the model is trained on the labeled base dataset with many persons (also called tasks) during meta-training. During meta-testing, given a few labeled samples from a new person (known as the support set), the model is adapted to the new person. Then the model predicts other images (called the query set) from this person. (Right) In this work, we propose a new problem setting called label-free user adaptation in gaze estimation. During meta-training, we have a labeled source dataset without person IDs. We also have some unlabeled data with person IDs. During meta-testing, we are provided with very few unlabeled images ($\le 5$) from a new target person. Our goal is to get a gaze model specifically adapted to this target person. Note that our problem setup does not require any labeled images from the target person for adaptation. Our problem setting is closer to real-world scenarios.
  • Figure 2: Overview of our approach. Our training data consist of a source dataset $\mathcal{S}$ annotated with gaze labels but without person IDs. We also have an unlabeled subject-specific dataset, where images are annotated with person IDs but without gaze labels. Each subject corresponds to a "task" in meta-learning. For each task $i$, we construct its support set $D^{tr}_i$ and its query set $D^{val}_i$, both of which consist of unlabeled images. In the inner loop of meta-training, we use a self-supervised auxiliary task (permutation prediction) defined on the support set $D^{tr}_i$ to update the model $\psi$ to a user-adapted model $\psi'$. Since the query set $D^{val}_i$ is unlabeled, we cannot directly compute a supervised loss of $\psi'$ on $D^{val}_i$. Instead, we use a domain adaptation loss defined on $\mathcal{S}$ and $D^{val}_i$ as an upper bound approximation to the supervised loss on $D^{val}_i$. This domain adaptation loss is used for the outer loop of meta-training. After meta-training, the model has learned to effectively adapt to a new subject using only a few unlabeled images. During meta-testing, we adapt the model to a new subject and use the adapted model for inference.
  • Figure 3: The flow of the proposed method.
  • Figure 4: Performance for different values of the balance weight $\gamma$ and different meta-batch sizes. (a) Results of $D_E \rightarrow D_M$ and $D_G \rightarrow D_M$ for different values of $\gamma$. (b) Results of $D_G \rightarrow D_M$ for different meta-batch sizes.
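
The inner-loop auxiliary task described in the Figure 2 caption (permutation prediction on unlabeled support images) can be illustrated with a small Python sketch. This is only one plausible construction, assumed for illustration: the image is cut into a 2x2 grid, the four patches are shuffled by a randomly chosen permutation, and the permutation's index is the self-supervised label. The grid size, the patch layout, and the names `split_2x2`, `make_permutation_sample`, and `PERMUTATIONS` are hypothetical and are not specified in this summary.

```python
import itertools
import random

# All 24 orderings of four patches; the index into this list is the class label.
PERMUTATIONS = list(itertools.permutations(range(4)))


def split_2x2(image):
    """Cut a 2D image (list of rows) into four patches in row-major order."""
    half_h, half_w = len(image) // 2, len(image[0]) // 2
    return [
        [row[c:c + half_w] for row in image[r:r + half_h]]
        for r in (0, half_h)
        for c in (0, half_w)
    ]


def make_permutation_sample(image, rng):
    """Return (shuffled_patches, label) for one unlabeled image.

    No gaze label is needed: the target is the index of the permutation
    that was applied to the patches.
    """
    patches = split_2x2(image)
    label = rng.randrange(len(PERMUTATIONS))
    shuffled = [patches[j] for j in PERMUTATIONS[label]]
    return shuffled, label


# Toy 4x4 "image" whose pixel values encode their own position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
shuffled, label = make_permutation_sample(image, random.Random(0))
```

A classifier head trained to recover `label` from `shuffled` yields a loss that can be computed on the support set $D^{tr}_i$ without gaze annotations, which is what allows the inner-loop update from $\psi$ to $\psi'$ to run on unlabeled images.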