ELF-UA: Efficient Label-Free User Adaptation in Gaze Estimation
Yong Wu, Yang Wang, Sanqing Qu, Zhijun Li, Guang Chen
TL;DR
This work tackles efficient label-free user adaptation for 3D gaze estimation: obtaining a model specific to a target user from only a few unlabeled images at test time. It builds on model-agnostic meta-learning (MAML), with a self-supervised permutation task in the inner loop and an outer loss derived from a domain-adaptation generalization bound. Training uses a labeled source dataset without person IDs together with unlabeled per-user data. The outer meta-objective is $\mathcal{L}_{meta}(\psi'_i) = \mathcal{L}_{gaze}(\psi'_i; \mathcal{S}) + \gamma \cdot d(\mathcal{S}, D_i^{val}; \psi'_i)$, where $d$ is a discrepancy between the source data $\mathcal{S}$ and the user's unlabeled data $D_i^{val}$, estimated by joint MMD to bridge the source and target distributions. Experiments on ETH-XGaze, Gaze360, GazeCapture, and MPIIGaze show consistent improvements over baselines and performance competitive with domain-adaptation methods that require more unlabeled data, which points to practical, calibration-free personalization for gaze estimation.
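The structure of the outer meta-objective above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary states that $d$ is estimated by joint MMD, whereas this sketch substitutes a plain single-kernel MMD on one set of feature vectors for brevity. The function names, the Gaussian kernel, the `bandwidth` value, and the default `gamma` are all assumptions made for illustration.

```python
import math

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd_squared(source_feats, target_feats, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors.

    Simplification: the paper summary mentions *joint* MMD; this is plain MMD.
    """
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source_feats, source_feats)
            + mean_kernel(target_feats, target_feats)
            - 2.0 * mean_kernel(source_feats, target_feats))

def meta_loss(gaze_loss_on_source, source_feats, target_feats, gamma=0.1):
    """L_meta = L_gaze(source) + gamma * d(source, target), with d = squared MMD here.

    gaze_loss_on_source: supervised gaze loss of the adapted model on labeled source data.
    source_feats / target_feats: features the adapted model produces for source images
    and for the user's unlabeled images (hypothetical inputs for this sketch).
    """
    return gaze_loss_on_source + gamma * mmd_squared(source_feats, target_feats)

# Toy usage: identical feature sets give zero discrepancy, distant ones do not.
same = [[0.0, 0.0], [0.0, 1.0]]
far = [[3.0, 3.0], [3.0, 4.0]]
loss_same = meta_loss(0.5, same, same)
loss_far = meta_loss(0.5, same, far)
```

In this toy usage the discrepancy term vanishes when the two feature sets coincide, so `loss_same` reduces to the supervised gaze loss, while `loss_far` is strictly larger. The target side needs no gaze labels, which is what lets unlabeled per-user data enter the training objective.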
Abstract
We consider the problem of user-adaptive 3D gaze estimation. The performance of person-independent gaze estimation is limited by interpersonal anatomical differences. Our goal is to provide a personalized gaze estimation model specifically adapted to a target user. Previous work on user-adaptive gaze estimation requires some labeled images of the target person to fine-tune the model at test time. However, this can be unrealistic in real-world applications, since it is cumbersome for an end user to provide labeled images. In addition, previous work requires the training data to have both gaze labels and person IDs, which makes some of the available data unusable. To tackle these challenges, this paper proposes a new problem called efficient label-free user adaptation in gaze estimation. Our model needs only a few unlabeled images of a target user for adaptation. During offline training, we have some labeled source data without person IDs and some unlabeled person-specific data. Our proposed method uses a meta-learning approach to learn how to adapt to a new user with only a few unlabeled images. Our key technical innovation is to use a generalization bound from domain adaptation to define the loss function in meta-learning, so that our method can effectively make use of both the labeled source data and the unlabeled person-specific data during training. Extensive experiments validate the effectiveness of our method on several challenging benchmarks.
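The test-time adaptation described above, a few gradient steps driven only by unlabeled images of the target user, can be sketched generically as follows. This is an illustration under stated assumptions, not the paper's procedure: `adapt_to_user`, `toy_ss_grad`, the learning rate, and the step count are hypothetical, and the toy quadratic loss merely stands in for the actual self-supervised objective.

```python
def adapt_to_user(params, unlabeled_samples, ss_grad_fn, lr=0.01, steps=1):
    """Return user-adapted parameters after a few label-free gradient steps.

    params: list of floats (stand-in for the meta-learned model parameters).
    unlabeled_samples: the target user's unlabeled data (no gaze labels).
    ss_grad_fn: gradient of a self-supervised loss w.r.t. params (assumed given).
    """
    adapted = list(params)  # copy so the meta-learned parameters stay untouched
    for _ in range(steps):
        grads = ss_grad_fn(adapted, unlabeled_samples)
        adapted = [p - lr * g for p, g in zip(adapted, grads)]
    return adapted

def toy_ss_grad(params, samples):
    """Gradient of the toy loss 0.5 * mean_x ||params - x||^2, i.e. params - mean(x).

    Purely illustrative placeholder for a real self-supervised loss.
    """
    n = len(samples)
    means = [sum(s[j] for s in samples) / n for j in range(len(params))]
    return [p - m for p, m in zip(params, means)]

# Toy usage: one step with lr=0.5 moves the parameters halfway to the sample mean.
meta_params = [0.0, 0.0]
user_samples = [[1.0, 2.0], [3.0, 4.0]]
user_params = adapt_to_user(meta_params, user_samples, toy_ss_grad, lr=0.5, steps=1)
```

The point of the sketch is the interface: adaptation consumes only unlabeled samples and a self-supervised gradient, so no gaze labels are required from the end user.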
