Improving Domain Generalization on Gaze Estimation via Branch-out Auxiliary Regularization
Ruijie Zhao, Pinyan Tang, Sihui Luo
TL;DR
This work tackles domain generalization in appearance-based gaze estimation under uncontrolled conditions, where variations in illumination and identity degrade performance. It introduces Branch-out Auxiliary Regularization (BAR), a plug-and-play, training-time framework that adds two auxiliary branches, an augmentation branch and a contrast branch. These branches enforce consistency and disentangle gaze-relevant features from gaze-irrelevant attributes without using any target-domain data. BAR attaches to an existing model through a multi-term loss, $\mathcal{L}_{total} = \mathcal{L}_{ori} + \lambda_{a}\mathcal{L}_{aug} + \lambda_{m}\mathcal{L}_{mmd} + \lambda_{c}\mathcal{L}_{con}$, with $\lambda_{a} = \lambda_{m} = \lambda_{c} = 1.0$. Here $\mathcal{L}_{ori}$ is the base model's original objective, and the auxiliary terms $\mathcal{L}_{aug}$, $\mathcal{L}_{mmd}$, and $\mathcal{L}_{con}$ encourage invariance to environmental and identity factors. On four cross-dataset tasks, BAR consistently outperforms its baselines and state-of-the-art methods. Because it is plug-and-play, it can be adopted across diverse gaze-estimation architectures, offering a practical route to more robust real-world gaze systems.
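The objective above is a weighted sum of one base loss and three auxiliary terms. The sketch below shows how the terms combine, assuming each one has already been reduced to a scalar. The `mmd2` helper assumes that $\mathcal{L}_{mmd}$ is a maximum mean discrepancy, as its subscript suggests. The kernel (Gaussian here), its bandwidth `sigma`, and which feature sets are compared are not specified in this summary and are assumptions. All function names are hypothetical, and a real training loop would compute these quantities on autograd tensors rather than Python lists.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)); kernel choice is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased empirical estimate of squared MMD between two feature sets (hypothetical L_mmd)."""
    def mean_k(a: Sequence[Vector], b: Sequence[Vector]) -> float:
        # Average kernel value over all pairs drawn from a and b.
        return sum(gaussian_kernel(p, q, sigma) for p in a for q in b) / (len(a) * len(b))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def total_loss(l_ori: float, l_aug: float, l_mmd: float, l_con: float,
               lambda_a: float = 1.0, lambda_m: float = 1.0,
               lambda_c: float = 1.0) -> float:
    """L_total = L_ori + lambda_a*L_aug + lambda_m*L_mmd + lambda_c*L_con (all weights default to 1.0)."""
    return l_ori + lambda_a * l_aug + lambda_m * l_mmd + lambda_c * l_con
```

With the default weights of 1.0, the total loss is simply the sum of the four terms. For example, `total_loss(1.0, 2.0, 3.0, 4.0)` gives `10.0`. The MMD of a feature set with itself is zero, and it grows as the two sets drift apart.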
Abstract
Despite remarkable advances, mainstream gaze estimation techniques, particularly appearance-based methods, often degrade in uncontrolled environments because of variations in illumination and individual facial attributes. Existing domain adaptation strategies require target-domain samples, which can make them impractical in real-world applications. This letter introduces Branch-out Auxiliary Regularization (BAR), a method that improves the generalization of gaze estimation models without requiring access to target-domain data. Specifically, BAR adds two auxiliary consistency regularization branches. One branch uses augmented samples to counteract environmental variations. The other aligns gaze directions with positive samples from the source domain to encourage the learning of consistent gaze features. These auxiliary branches strengthen the core network and are attached in a plug-and-play manner, so they can be readily applied to other models. Comprehensive experiments on four cross-dataset tasks demonstrate the superiority of our approach.
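The two branches described above can be illustrated with simple consistency terms. The sketch below rests on several assumptions, since this abstract does not give the exact loss forms. It assumes gaze is represented as (pitch, yaw) in radians and is converted to a 3D unit vector under a common convention. The augmentation branch is modeled as the mean angular gap between predictions on original and augmented views. In the contrast branch, "positive" samples are taken to be other source samples whose gaze labels lie within a hypothetical `threshold_deg`, and their features are pulled together with a squared-distance penalty. All names are illustrative, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

PitchYaw = Tuple[float, float]  # gaze angles in radians (assumed representation)


def to_unit_vector(g: PitchYaw) -> Tuple[float, float, float]:
    """Convert (pitch, yaw) to a 3D unit gaze vector (a common convention, assumed here)."""
    pitch, yaw = g
    return (-math.cos(pitch) * math.sin(yaw),
            -math.sin(pitch),
            -math.cos(pitch) * math.cos(yaw))


def angular_error_deg(a: PitchYaw, b: PitchYaw) -> float:
    """Angle in degrees between two gaze directions."""
    va, vb = to_unit_vector(a), to_unit_vector(b)
    cos = sum(x * y for x, y in zip(va, vb))
    cos = max(-1.0, min(1.0, cos))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos))


def augmentation_consistency(pred_orig: Sequence[PitchYaw],
                             pred_aug: Sequence[PitchYaw]) -> float:
    """Hypothetical augmentation-branch term: mean angular gap between the two views."""
    gaps = [angular_error_deg(p, q) for p, q in zip(pred_orig, pred_aug)]
    return sum(gaps) / len(gaps)


def positive_indices(labels: Sequence[PitchYaw], anchor: int,
                     threshold_deg: float = 5.0) -> List[int]:
    """Hypothetical positive selection: other samples whose gaze label is within threshold_deg."""
    return [i for i, g in enumerate(labels)
            if i != anchor and angular_error_deg(labels[anchor], g) <= threshold_deg]


def contrast_consistency(features: Sequence[Sequence[float]],
                         labels: Sequence[PitchYaw],
                         threshold_deg: float = 5.0) -> float:
    """Hypothetical contrast-branch term: mean squared feature distance to positives."""
    total, count = 0.0, 0
    for a in range(len(features)):
        for p in positive_indices(labels, a, threshold_deg):
            total += sum((x - y) ** 2 for x, y in zip(features[a], features[p]))
            count += 1
    return total / count if count else 0.0  # no positives means no penalty
```

Under these assumptions, identical predictions on both views give an augmentation term of roughly zero. Two samples looking 2 degrees apart count as positives under the 5-degree threshold, while a sample 30 degrees away does not. The threshold-based positive rule is only one plausible reading of "positive source domain samples".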
