A Causal Inspired Early-Branching Structure for Domain Generalization

Liang Chen, Yong Zhang, Yibing Song, Zhen Zhang, Lingqiao Liu

TL;DR

This paper approaches domain generalization (DG) through a causal lens, separating semantic (causal) features from domain-specific (non-causal) features with an HSIC-based marginal independence constraint. It adds two complements to this basic framework: an early-branching network architecture that avoids entangling the two features, and a random domain sampling (RDS) augmentation that enforces invariance of the causal feature across domains. The approach is validated on the DomainBed and WILDS benchmarks, where it shows competitive or superior performance over ERM and several DG baselines, with ablations highlighting the importance of each component. The work also provides identifiability arguments for the causal feature under sufficient augmentation and consistency constraints. Overall, the combination of causal reasoning, architectural design, and augmentation improves generalization to unseen domains.

Abstract

Learning domain-invariant semantic representations is crucial for achieving domain generalization (DG), where a model is required to perform well on unseen target domains. One critical challenge is that standard training often results in entangled semantic and domain-specific features. Previous works suggest formulating the problem from a causal perspective and solving the entanglement problem by enforcing marginal independence between the causal (i.e., semantic) and non-causal (i.e., domain-specific) features. Despite its simplicity, the basic marginal independence-based idea alone may be insufficient to identify the causal feature. By d-separation, we observe that the causal feature can be further characterized by being independent of the domain conditioned on the object, and we propose the following two strategies as complements to the basic framework. First, the observation implies that for the same object, the causal feature should not be associated with the non-causal feature, revealing that the common practice of obtaining the two features with a shared base feature extractor and two lightweight prediction heads might be inappropriate. To meet the constraint, we propose a simple early-branching structure, where the causal and non-causal feature branches share the first few blocks and diverge thereafter. Second, the observation implies that the causal feature remains invariant across different domains for the same object. To this end, we suggest that augmentation should be incorporated into the framework to better characterize the causal feature, and we further suggest an effective random domain sampling scheme to fulfill the task. Theoretical and experimental results show that the two strategies are beneficial for the basic marginal independence-based framework. Code is available at https://github.com/liangchen527/CausEB.
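
The marginal independence constraint mentioned above is HSIC-based according to the TL;DR. As a rough illustration only, the sketch below computes the standard biased empirical HSIC estimate, tr(KHLH)/(n-1)^2, with Gaussian kernels in plain Python. The function names (`rbf_gram`, `double_center`, `hsic`), the kernel choice, the bandwidth `sigma`, and the synthetic features are assumptions made for this example, not details taken from the paper or its code.

```python
import math
import random


def rbf_gram(points, sigma=1.0):
    """Gaussian (RBF) Gram matrix for a list of equal-length feature vectors."""
    def kernel(a, b):
        sq_dist = sum((u - v) ** 2 for u, v in zip(a, b))
        return math.exp(-sq_dist / (2.0 * sigma ** 2))
    return [[kernel(a, b) for b in points] for a in points]


def double_center(gram):
    """Return H K H, where H = I - (1/n) 11^T is the centering matrix."""
    n = len(gram)
    row_mean = [sum(row) / n for row in gram]
    col_mean = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[gram[i][j] - row_mean[i] - col_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic(feats_a, feats_b, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(feats_a)
    if n != len(feats_b) or n < 2:
        raise ValueError("need two feature lists of the same length >= 2")
    k_centered = double_center(rbf_gram(feats_a, sigma))
    l_gram = rbf_gram(feats_b, sigma)
    # tr(K H L H) = tr((H K H) L) by cyclicity of the trace.
    trace = sum(k_centered[i][j] * l_gram[j][i]
                for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Synthetic stand-ins for causal and non-causal features (assumed data).
rng = random.Random(0)
f_o = [[rng.gauss(0.0, 1.0)] for _ in range(60)]
f_d_dependent = [[v[0]] for v in f_o]                      # copies f_o
f_d_independent = [[rng.gauss(0.0, 1.0)] for _ in range(60)]

dependent_score = hsic(f_o, f_d_dependent)
independent_score = hsic(f_o, f_d_independent)
```

Used as a training penalty, a term of this kind would be minimized between the outputs of the two branches; the dependent pair above should score noticeably higher than the independent pair.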

Paper Structure (29 sections, 1 theorem, 9 equations, 7 figures, 14 tables)

Key Result

Proposition 1

$X_o$ and $X_d$ from the same object should not have any association, indicating that when designing the network, the branch used for extracting the causal feature should not depend on the non-causal one.
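
The design consequence of Proposition 1 can be sketched in a framework-agnostic way. The toy below is an illustrative assumption, not the authors' implementation: `Block`, `EarlyBranchingNet`, and `split_at` are hypothetical names, and each block is a scalar affine map with ReLU standing in for a real backbone stage. It only shows the wiring: the first `split_at` blocks are shared, and the remaining blocks are instantiated separately for the causal and non-causal branches.

```python
import random


class Block:
    """Toy stand-in for one backbone stage: elementwise affine map + ReLU."""

    def __init__(self, rng):
        self.weight = rng.uniform(0.5, 1.5)
        self.bias = rng.uniform(0.0, 0.1)

    def __call__(self, x):
        return [max(0.0, self.weight * v + self.bias) for v in x]


class EarlyBranchingNet:
    """Shares the first `split_at` blocks, then diverges into two branches."""

    def __init__(self, num_blocks=4, split_at=1, seed=0):
        if not 0 <= split_at <= num_blocks:
            raise ValueError("split_at must lie in [0, num_blocks]")
        rng = random.Random(seed)
        tail = num_blocks - split_at
        self.shared = [Block(rng) for _ in range(split_at)]
        # Separate parameters per branch: nothing after the split is shared.
        self.causal_branch = [Block(rng) for _ in range(tail)]
        self.domain_branch = [Block(rng) for _ in range(tail)]

    @staticmethod
    def _run(blocks, x):
        for block in blocks:
            x = block(x)
        return x

    def forward(self, x):
        hidden = self._run(self.shared, x)
        f_o = self._run(self.causal_branch, hidden)   # causal feature
        f_d = self._run(self.domain_branch, hidden)   # non-causal feature
        return f_o, f_d


net = EarlyBranchingNet(num_blocks=4, split_at=1)
f_o, f_d = net.forward([1.0, 2.0])
```

In this sketch, setting `split_at` equal to `num_blocks` leaves both branches empty, which loosely corresponds to the fully shared base extractor of the classical dual-branch design; smaller values move the split earlier.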

Figures (7)

  • Figure 1: Structural causal models for the image generation process. Observed variables are shaded. Solid arrows represent causal relations. The dashed line denotes entanglement between the two variables. We consider images to be generated by interventions on the "Object" and "Domain" variables, which cause the causal semantic features $X_o$ and the non-causal domain-specific features $X_d$.
  • Figure 2: A glimpse of the classical dual-branch network for modeling independence between causal and non-causal features [ganin2016domain, albuquerque2019generalizing, atzmon2020causal, chen2021style], where the two branches share the same base feature extractor (i.e., $\theta_b$) with two lightweight prediction heads. Here the upper branch is used to extract the causal features (i.e., $F_o(x) = \theta_b (\theta_o (x))$), and the lower branch is used to extract the non-causal features (i.e., $F_d(x) = \theta_b (\theta_d (x))$). The independence constraint is enforced between $F_o(x)$ and $F_d(x)$. $H_o$ and $H_d$ are classifiers for the corresponding branches. Only the upper branch is utilized during inference.
  • Figure 3: 2D t-SNE visualizations of target semantic representations from the ERM model and our basic form. The PACS dataset [li2017deeper] is used with art as the unseen target domain. The seven clusters in (a) and (b) denote the corresponding classes. The domain information is more obvious in the target features from the ERM model, indicating that ERM tends to learn entangled domain-specific and semantic features. In comparison, using the independence constraint can better disentangle the two features, resulting in less domain information in the target features.
  • Figure 4: An example diagram of the early-branching structure built on the ResNet backbone [he2016deep]. Here '$bck$' indicates the four layers in the ResNet backbone. As in the dual-branch pipeline figure, the top and bottom branches are for the semantic and domain estimation tasks, respectively.
  • Figure 5: Overview of our augmentation strategy. We wish the augmentation operator $\mathcal{A}$ to change the non-causal domain information of a sample while keeping the causal feature unchanged.
  • ...and 2 more figures
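
The requirement in the Figure 5 caption, that the augmentation operator $\mathcal{A}$ changes domain information while leaving the causal feature unchanged, can be made concrete with a toy consistency check. Everything below is an assumption for illustration and is not the paper's random domain sampling scheme: the additive data model in `render`, the `DOMAIN_STYLES` offsets, and the two hypothetical extractors. For simplicity the stand-in `augment` re-renders the known content in another domain, whereas a real operator would act on the observed sample directly.

```python
import random

# Hypothetical per-domain "style": a scalar offset added to every content value.
DOMAIN_STYLES = {"photo": 0.0, "sketch": 2.0, "cartoon": -1.5}


def render(content, domain):
    """Toy observation model: object content shifted by a domain offset."""
    return [c + DOMAIN_STYLES[domain] for c in content]


def augment(content, domain, rng):
    """Stand-in for A: same object, rendered in a different random domain."""
    others = [d for d in sorted(DOMAIN_STYLES) if d != domain]
    new_domain = rng.choice(others)
    return render(content, new_domain), new_domain


def consistency_loss(extract, x, x_aug):
    """Mean squared difference between features of x and its augmentation."""
    f, f_aug = extract(x), extract(x_aug)
    return sum((a - b) ** 2 for a, b in zip(f, f_aug)) / len(f)


def entangled(x):
    """Extractor that keeps the domain offset (identity map)."""
    return list(x)


def invariant(x):
    """Extractor that removes the additive offset by mean-centering."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


rng = random.Random(0)
content = [0.2, 1.0, -0.4, 0.7]
x = render(content, "photo")
x_aug, new_domain = augment(content, "photo", rng)

loss_entangled = consistency_loss(entangled, x, x_aug)
loss_invariant = consistency_loss(invariant, x, x_aug)
```

Under this toy model the entangled extractor incurs a large loss because its features move with the domain offset, while the mean-centered extractor incurs essentially none, which is the behavior a consistency constraint on the causal feature is meant to reward.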

Theorems & Definitions (2)

  • Proposition 1
  • proof