Causal Representation-Based Domain Generalization on Gaze Estimation
Younghan Kim, Kangryun Moon, Yongjun Park, Yonggyu Kim
TL;DR
CauGE tackles cross-domain gaze estimation by learning causal representations that separate domain-invariant factors from noncausal, potentially spurious cues. The framework combines an adversarial intervention scheme to extract stable features, a factorization loss to enforce modularity among causal mechanisms, and an attention-enhanced gaze predictor to address causal heterogeneity. Key contributions include introducing causality-inspired principles (stability, modularity, heterogeneity) to gaze DG, simulating domain shifts with AugMix, and demonstrating state-of-the-art performance on gaze DG benchmarks with extensive ablations and visual analyses. The approach improves robustness to unseen domains, offering a practical path toward reliable real-world gaze estimation without target-domain data during training.
Abstract
The availability of extensive datasets containing gaze information for each subject has significantly enhanced gaze estimation accuracy. However, the discrepancy between domains severely degrades the performance of a model trained on one particular domain. In this paper, we propose the Causal Representation-Based Domain Generalization on Gaze Estimation (CauGE) framework, which is built on the general principles of causal mechanisms, principles that hold regardless of domain differences. We employ adversarial training and an additional penalty term to extract domain-invariant features. After feature extraction, we apply an attention layer so that the features are sufficient for inferring the actual gaze. With these modules, CauGE ensures that the neural network learns from representations that satisfy the general principles of causal mechanisms. As a result, CauGE generalizes across domains by extracting domain-invariant features, so that spurious correlations do not influence the model. Our method achieves state-of-the-art performance on domain generalization benchmarks for gaze estimation.
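To make the idea of a penalty term that encourages modular, decorrelated features more concrete, the following is a minimal sketch of one common way such a term can be written. It is an assumption for illustration, not the exact loss used by CauGE, which this summary does not specify. The function name `factorization_loss`, the inputs `feat_orig` and `feat_aug` (features of an image and of its augmented view), and the identity-matrix target are all hypothetical choices.

```python
import numpy as np


def factorization_loss(feat_orig, feat_aug, eps=1e-8):
    """Hypothetical decorrelation penalty (not necessarily the CauGE loss).

    feat_orig, feat_aug: arrays of shape (batch, dim) holding features of the
    original inputs and of their augmented views.
    """
    # Standardize every feature dimension across the batch.
    z_orig = (feat_orig - feat_orig.mean(axis=0)) / (feat_orig.std(axis=0) + eps)
    z_aug = (feat_aug - feat_aug.mean(axis=0)) / (feat_aug.std(axis=0) + eps)

    # Cross-correlation matrix between the two views, shape (dim, dim).
    corr = z_orig.T @ z_aug / feat_orig.shape[0]

    # Push the diagonal toward 1 (each dimension is stable under augmentation)
    # and the off-diagonal toward 0 (dimensions are mutually decorrelated).
    identity = np.eye(corr.shape[0])
    return 0.5 * np.sum((corr - identity) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    independent = rng.normal(size=(512, 4))
    entangled = np.repeat(rng.normal(size=(512, 1)), 4, axis=1)
    print("independent dims:", factorization_loss(independent, independent))
    print("entangled dims:  ", factorization_loss(entangled, entangled))
```

In this sketch, features whose dimensions are independent give a penalty close to zero, while features whose dimensions are copies of one another are penalized heavily. That is the behavior one would expect from a term meant to enforce modularity among feature dimensions.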
