Causal Representation-Based Domain Generalization on Gaze Estimation

Younghan Kim, Kangryun Moon, Yongjun Park, Yonggyu Kim

TL;DR

CauGE tackles cross-domain gaze estimation by learning causal representations that separate domain-invariant factors from non-causal, potentially spurious cues. The framework combines an adversarial intervention scheme to extract stable features, a factorization loss to enforce modularity among causal mechanisms, and an attention-enhanced gaze predictor to address causal heterogeneity. Key contributions include introducing causality-inspired principles (stability, modularity, heterogeneity) to domain generalization (DG) for gaze estimation, simulating domain shifts with AugMix, and reporting state-of-the-art performance on gaze DG benchmarks, supported by ablations and visual analyses. The approach improves robustness to unseen domains and requires no target-domain data during training.
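
To make the AugMix-based domain-shift simulation concrete, here is a minimal sketch of AugMix-style mixing on a toy "image" stored as a flat list of intensities in [0, 1]. The three operations (`brighten`, `darken`, `invert`) and the parameter defaults are hypothetical stand-ins chosen for illustration; this summary does not state which operations or settings CauGE uses.

```python
import random

# Hypothetical stand-ins for real image operations (assumption, not from the paper).
def brighten(x, amount=0.2):
    return [min(1.0, v + amount) for v in x]

def darken(x, amount=0.2):
    return [max(0.0, v - amount) for v in x]

def invert(x):
    return [1.0 - v for v in x]

OPS = [brighten, darken, invert]

def dirichlet(alpha, k, rng):
    """Sample k mixing weights from a symmetric Dirichlet(alpha)."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def augmix(x, rng, width=3, max_depth=3, alpha=1.0):
    """Mix `width` random augmentation chains, then blend with the original."""
    weights = dirichlet(alpha, width, rng)   # convex weights over chains
    m = rng.betavariate(alpha, alpha)        # blend factor with the original
    mixed = [0.0] * len(x)
    for w in weights:
        chain = list(x)
        for _ in range(rng.randint(1, max_depth)):
            chain = rng.choice(OPS)(chain)   # apply a randomly chosen op
        mixed = [acc + w * c for acc, c in zip(mixed, chain)]
    return [m * orig + (1.0 - m) * mix for orig, mix in zip(x, mixed)]

if __name__ == "__main__":
    image = [0.0, 0.25, 0.5, 1.0]
    print(augmix(image, random.Random(0)))
```

Because every step is a convex combination of values in [0, 1], the output stays in that range while its appearance shifts, which is how the augmented sample can play the role of an intervened (domain-shifted) view of the same input.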

Abstract

The availability of extensive datasets containing gaze information for each subject has significantly improved gaze estimation accuracy. However, a model trained on one particular domain degrades severely when the test domain differs. In this paper, we propose the Causal Representation-Based Domain Generalization on Gaze Estimation (CauGE) framework, designed around the general principles of causal mechanisms, which hold regardless of domain differences. We employ adversarial training and an additional penalty term to extract domain-invariant features. After feature extraction, an attention layer makes the features sufficient for inferring the actual gaze. Together, these modules ensure that the network learns from representations that satisfy the general principles of causal mechanisms. As a result, CauGE generalizes across domains by extracting domain-invariant features, so that spurious correlations do not influence the model. Our method achieves state-of-the-art performance on the domain generalization benchmark for gaze estimation.
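
This summary does not spell out the penalty term. As a rough illustration of how a factorization-style penalty can push feature dimensions toward independence, the sketch below assumes a common form: compute the cross-correlation matrix between features of original and augmented inputs, then penalize its distance from the identity. The function names and the `off_diag_weight` parameter are hypothetical, and this form is an assumption rather than the authors' exact loss.

```python
import math

def standardize(batch, eps=1e-8):
    """Give each feature dimension zero mean and unit variance across the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / (std + eps)
    return out

def correlation_matrix(z_a, z_b):
    """D x D cross-correlation between two batches of N feature vectors."""
    a, b = standardize(z_a), standardize(z_b)
    n, d = len(a), len(a[0])
    return [[sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(d)]
            for i in range(d)]

def factorization_loss(z_orig, z_aug, off_diag_weight=1.0):
    """Assumed form: pull the diagonal toward 1 and off-diagonal entries toward 0."""
    c = correlation_matrix(z_orig, z_aug)
    d = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + off_diag_weight * off_diag

if __name__ == "__main__":
    independent = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
    entangled = [[1, 1], [2, 2], [3, 3], [4, 4]]
    print(factorization_loss(independent, independent))  # close to zero
    print(factorization_loss(entangled, entangled))      # close to 2
```

Under this assumed form, the diagonal term rewards each dimension for staying stable under augmentation, and the off-diagonal term penalizes dimensions that carry the same information, which matches the stability and modularity principles named above.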

Paper Structure

This paper contains 30 sections, 9 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Dotted gray circles denote non-causal factors, and colored circles denote causal factors. a) The baseline (vanilla training) learns from representations in which causal factors are mixed with non-causal factors and entangled with one another. b) CauGE learns from representations that are separated from non-causal factors and whose causal factors are mutually independent. Learning such representations helps the model generalize to unseen data.
  • Figure 2: Structural causal model describing the general principles of causal mechanisms. a) If $X$ and $Y$ are correlated, then a common cause $C$ should exist. b) When $S$ causes $Y$, this causal relationship is expected to be invariant. c) Changing one mechanism should not directly affect another mechanism. d) Not all causal factors are equal; some play a much more significant role than others. (The size of each circle indicates the importance of that factor.)
  • Figure 3: Overview of the CauGE architecture. a) Simulate domain shift to capture the differences caused by the intervention. b) Force $F$ to extract domain-invariant features through adversarial intervention classification. c) Learn to extract independent causal representations with the factorization loss. d) Strengthen the representation with an attention layer and predict the gaze angle with a gaze predictor.
  • Figure 4: Class activation map visualization under varying illumination conditions. a) The baseline's class activation map is inconsistent under illumination changes. b) CauGE's class activation map remains consistent under illumination changes.
  • Figure 5: Class activation map visualization in the cross-dataset condition. a) The baseline does not focus on the face region. b) CauGE focuses on the face region.
  • ...and 1 more figure
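
The heterogeneity principle in the Figure 2 caption (some causal factors matter more than others) and the attention step in the Figure 3 caption can be illustrated with a generic softmax reweighting. This is only a sketch under assumptions: the summary does not describe the actual attention layer, and `reweight_factors` and its `importance_scores` input (which would be learned in practice) are hypothetical names.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reweight_factors(factors, importance_scores):
    """Scale each factor by its attention weight.

    Weights are multiplied by the number of factors so that uniform
    scores leave the input unchanged.
    """
    weights = softmax(importance_scores)
    d = len(factors)
    return [d * w * f for w, f in zip(weights, factors)], weights

if __name__ == "__main__":
    factors = [0.5, -1.0, 2.0]
    scaled, weights = reweight_factors(factors, [2.0, 0.0, 0.0])
    print(weights)  # the first factor receives the largest weight
    print(scaled)
```

A gaze predictor fed with the reweighted factors would rely more on the factors given higher scores, which is one simple way to express unequal importance among causal factors.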