A Causal Inspired Early-Branching Structure for Domain Generalization
Liang Chen, Yong Zhang, Yibing Song, Zhen Zhang, Lingqiao Liu
TL;DR
This paper approaches domain generalization (DG) from a causal perspective, separating semantic (causal) features from domain-specific (non-causal) features with an HSIC-based marginal independence constraint. It adds two complements to this basic framework: an early-branching network architecture that avoids entangling the two features, and a random domain sampling (RDS) augmentation that enforces invariance of the causal feature across domains. The approach is evaluated on the DomainBed and WILDS benchmarks, where it shows competitive or superior performance relative to ERM and several DG baselines, and ablations highlight the importance of each component. The paper also provides identifiability arguments for the causal feature under sufficient augmentation and consistency constraints. Overall, combining causal reasoning, architectural design, and augmentation improves generalization to unseen domains, which matters for reliable deployment under distribution shift.
Abstract
Learning domain-invariant semantic representations is crucial for achieving domain generalization (DG), where a model is required to perform well on unseen target domains. One critical challenge is that standard training often results in entangled semantic and domain-specific features. Previous works suggest formulating the problem from a causal perspective and solving the entanglement problem by enforcing marginal independence between the causal (i.e., semantic) and non-causal (i.e., domain-specific) features. Despite its simplicity, the basic marginal independence-based idea alone may be insufficient to identify the causal feature. By d-separation, we observe that the causal feature can be further characterized by being independent of the domain conditioned on the object, and we propose the following two strategies as complements for the basic framework. First, the observation implies that, for the same object, the causal feature should not be associated with the non-causal feature, revealing that the common practice of obtaining the two features with a shared base feature extractor and two lightweight prediction heads might be inappropriate. To meet the constraint, we propose a simple early-branching structure, where the branches that extract the causal and non-causal features share the first few blocks and diverge thereafter. Second, the observation implies that the causal feature remains invariant across different domains for the same object. To this end, we suggest that augmentation should be incorporated into the framework to better characterize the causal feature, and we further suggest an effective random domain sampling scheme to fulfill the task. Theoretical and experimental results show that the two strategies are beneficial for the basic marginal independence-based framework. Code is available at https://github.com/liangchen527/CausEB.
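As an illustration only, the marginal independence constraint between the causal and non-causal features (described as HSIC-based in the TL;DR) could be measured with the standard biased empirical HSIC estimator, tr(KHLH)/(n-1)^2, where K and L are kernel Gram matrices and H is the centering matrix. The sketch below is a minimal pure-Python version under assumptions not stated in this abstract: Gaussian kernels with a fixed bandwidth `sigma`, and the hypothetical function names `gaussian_gram`, `center_gram`, and `hsic`. It is not taken from the authors' code, and an actual training loss would need a differentiable tensor implementation.

```python
import math


def gaussian_gram(features, sigma=1.0):
    """Gram matrix of a Gaussian (RBF) kernel over a list of feature vectors.

    The kernel choice and the bandwidth `sigma` are assumptions made for
    illustration; the abstract does not specify them.
    """
    n = len(features)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            gram[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return gram


def center_gram(gram):
    """Double-center a symmetric Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    # For a symmetric matrix the column means equal the row means.
    return [
        [gram[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(causal_feats, noncausal_feats, sigma=1.0):
    """Biased empirical HSIC estimate, tr(K H L H) / (n - 1)^2.

    Both inputs are equal-length lists of feature vectors (lists of floats),
    one pair per sample.
    """
    n = len(causal_feats)
    if n != len(noncausal_feats) or n < 2:
        raise ValueError("need two equally sized batches with at least 2 samples")
    k_centered = center_gram(gaussian_gram(causal_feats, sigma))
    l_gram = gaussian_gram(noncausal_feats, sigma)
    # tr(H K H L) equals the elementwise product sum because L is symmetric.
    trace = sum(
        k_centered[i][j] * l_gram[i][j] for i in range(n) for j in range(n)
    )
    return trace / (n - 1) ** 2
```

Calling `hsic(causal_feats, noncausal_feats)` on a batch returns a non-negative score that is close to zero when the two feature sets show little measurable dependence. In a setup like the one the abstract describes, a penalty of this kind would presumably be minimized alongside the classification loss.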
