UniGaze: Towards Universal Gaze Estimation via Large-scale Pre-Training
Jiawei Qin, Xucong Zhang, Yusuke Sugano
TL;DR
UniGaze tackles the persistent problem of cross-domain generalization in appearance-based gaze estimation by introducing a large-scale self-supervised pre-training strategy tailored to faces. Using masked autoencoder (MAE) pre-training on a diverse, normalized face dataset that blends real and synthetic sources, it learns gaze-relevant representations that transfer effectively to downstream gaze estimation. The approach yields substantial improvements in cross-dataset, leave-one-dataset-out, and joint-dataset evaluations, outperforming both baselines pre-trained for semantic tasks and domain-generalization methods, particularly with ViT backbones. Key findings include the necessity of input normalization, broad head-pose coverage, and identity diversity, as well as the value of mixing real, synthetic, and novel-view data for robust gaze modeling. These results offer practical guidelines for robust gaze estimation in unconstrained, real-world applications, and an open-source implementation is provided.
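The core ingredient named above is MAE pre-training: the encoder sees only a random subset of image patches and is trained to reconstruct the hidden ones. The sketch below illustrates that general MAE objective in plain Python; it is not UniGaze's training code. The 224x224 input, 16x16 patch size, and 0.75 mask ratio are defaults of the original MAE recipe and are assumptions here, and the helper names `random_masking` and `masked_mse` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def random_masking(
    num_patches: int, mask_ratio: float = 0.75, seed: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Randomly split patch indices into visible and masked sets, MAE-style.

    mask_ratio=0.75 is the original MAE default; UniGaze's value is assumed.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)  # uniform random permutation of patch indices
    num_keep = int(num_patches * (1.0 - mask_ratio))
    visible = sorted(order[:num_keep])  # patches the encoder sees
    masked = sorted(order[num_keep:])   # patches the decoder must reconstruct
    return visible, masked


def masked_mse(
    pred: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    masked: Sequence[int],
) -> float:
    """Pixel MSE averaged over masked patches only; visible patches add no loss."""
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


# Hypothetical setup: a 224x224 normalized face crop cut into 16x16 patches.
num_patches = (224 // 16) ** 2  # 196 patches
visible, masked = random_masking(num_patches, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

Restricting the loss to masked patches is the standard MAE design choice: the model gets no credit for copying visible pixels, so it must infer the hidden regions of the face from context.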
Abstract
Despite decades of research on data collection and model architectures, current gaze estimation models still struggle to generalize across diverse data domains. Recent advances in self-supervised pre-training have shown remarkable generalization across various vision tasks; however, their effectiveness for gaze estimation remains unexplored. We propose UniGaze, which for the first time leverages large-scale in-the-wild facial datasets for gaze estimation through self-supervised pre-training. Through systematic investigation, we identify the factors essential for effective pre-training in gaze estimation. Our experiments reveal that self-supervised approaches designed for semantic tasks fail when applied to gaze estimation, whereas our carefully designed pre-training pipeline consistently improves cross-domain performance. Through comprehensive experiments on challenging cross-dataset evaluation and novel protocols, including leave-one-dataset-out and joint-dataset settings, we demonstrate that UniGaze significantly improves generalization across multiple data domains while minimizing reliance on costly labeled data. Source code and model are available at https://github.com/ut-vision/UniGaze.
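The abstract names three evaluation settings. As a hedged sketch, assuming each setting is defined only by which datasets supply training data and which dataset is used for testing, they can be enumerated as (training sources, test target) pairs. The function names and dataset labels are hypothetical placeholders, and the joint-dataset definition used here (train on the training portions of all datasets, test on each dataset's test portion) is an assumption, not the paper's stated protocol.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# (training sources, evaluation target)
Split = Tuple[List[str], str]


def cross_dataset(datasets: Sequence[str]) -> List[Split]:
    """Train on one source dataset, test on a different target dataset."""
    return [([src], tgt) for src, tgt in permutations(datasets, 2)]


def leave_one_dataset_out(datasets: Sequence[str]) -> List[Split]:
    """Train on every dataset except one, test on the held-out dataset."""
    return [([d for d in datasets if d != held], held) for held in datasets]


def joint_dataset(datasets: Sequence[str]) -> List[Split]:
    """Assumed: train on all datasets' training portions, test on each one's test portion."""
    return [(list(datasets), tgt) for tgt in datasets]


names = ["dataset_a", "dataset_b", "dataset_c"]  # hypothetical labels
for train, test in leave_one_dataset_out(names):
    print(f"train on {train} -> test on {test}")
```

The key contrast is that the cross-dataset and leave-one-dataset-out settings never expose the model to the test domain, whereas the joint setting (under the assumption above) sees every domain's training data and so measures how well a single model serves several domains at once.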
