Causally Inspired Regularization Enables Domain General Representations

Olawale Salaudeen; Sanmi Koyejo

TL;DR

This work tackles domain generalization under distribution shift by using causal graphs to identify domain-general representations that remain stable across domains. It introduces the Total Information Criterion (TIC) and Target Conditioned Representation Independence (TCRI), a two-branch representation with a Hilbert-Schmidt independence penalty, to separate domain-general from domain-specific information without direct observations of the spurious features. The method reports better average and worst-domain transfer performance than IRM, GroupDRO, VREx, and IB-based baselines on semi-synthetic and real-world benchmarks (ColoredMNIST, Spurious PACS, Terra Incognita), and ablations highlight the importance of TIC. Because it yields robust domain-general predictors in settings with complex spurious correlations, the approach is potentially relevant to safety-critical and fairness-sensitive applications where transfer to unseen domains is essential.
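To make the Hilbert-Schmidt independence penalty concrete, the following is a minimal, dependency-free sketch of a biased empirical HSIC estimate computed within each target class. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, the `(n - 1)^2` normalization, the per-class averaging, and all function names (`rbf_gram`, `center`, `hsic`, `target_conditioned_hsic`) are hypothetical choices made here.

```python
import math


def rbf_gram(rows, sigma=1.0):
    """Gaussian-kernel Gram matrix for a list of feature vectors (assumed kernel)."""
    def kernel(u, v):
        sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
        return math.exp(-sq_dist / (2.0 * sigma ** 2))
    return [[kernel(u, v) for v in rows] for u in rows]


def center(gram):
    """Double-center a symmetric Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_mean = [sum(row) / n for row in gram]
    total_mean = sum(row_mean) / n
    return [[gram[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic(a, b, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(a)
    k_centered = center(rbf_gram(a, sigma))
    l_gram = rbf_gram(b, sigma)
    # trace(K H L H) equals trace((H K H) L) by cyclicity of the trace.
    trace = sum(k_centered[i][j] * l_gram[j][i]
                for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


def target_conditioned_hsic(phi_dg, phi_spu, y, sigma=1.0):
    """Average HSIC between the two branches within each label group.

    Conditioning on the target by grouping samples per label is an
    assumption of this sketch, motivated by the name TCRI.
    """
    scores = []
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        if len(idx) < 2:  # HSIC needs at least two samples
            continue
        scores.append(hsic([phi_dg[i] for i in idx],
                           [phi_spu[i] for i in idx], sigma))
    return sum(scores) / len(scores) if scores else 0.0
```

In a training loop, a penalty of this kind would be added to the prediction loss with a weighting coefficient, so that the two branches are pushed toward carrying statistically independent information given the label.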

Abstract

Given a causal graph representing the data-generating process shared across different domains/distributions, enforcing sufficient graph-implied conditional independencies can identify domain-general (non-spurious) feature representations. For the standard input-output predictive setting, we categorize the set of graphs considered in the literature into two distinct groups: (i) those in which the empirical risk minimizer across training domains gives domain-general representations and (ii) those where it does not. For the latter case (ii), we propose a novel framework with regularizations, which we demonstrate are sufficient for identifying domain-general feature representations without a priori knowledge (or proxies) of the spurious features. Empirically, our proposed method is effective for both (semi) synthetic and real-world data, outperforming other state-of-the-art methods in average and worst-domain transfer accuracy.
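The two reported metrics, average and worst-domain accuracy, can be sketched as follows. This is an illustrative helper only: the function names and the flat-list input layout are assumptions made here, and the paper's exact evaluation protocol (for example, which domains are held out) is not specified in this section.

```python
def domain_accuracies(y_true, y_pred, domains):
    """Return a dict mapping each domain label to its accuracy."""
    correct, total = {}, {}
    for truth, pred, dom in zip(y_true, y_pred, domains):
        total[dom] = total.get(dom, 0) + 1
        correct[dom] = correct.get(dom, 0) + int(truth == pred)
    return {dom: correct[dom] / total[dom] for dom in total}


def average_and_worst(per_domain):
    """Unweighted mean over domains and the minimum (worst-domain) accuracy."""
    values = list(per_domain.values())
    return sum(values) / len(values), min(values)
```

Averaging over domains rather than over samples keeps a large domain from masking poor performance on a small one, which is the concern the worst-domain metric addresses directly.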

Paper Structure (41 sections, 1 theorem, 15 equations, 5 figures, 20 tables)

Introduction
Contributions
Notation:
Related Work
Causality and Domain Generalization
Valid DAGs.
Conditional independencies implied by identified DAGs (Figure \ref{fig:graph_other}).
Domain generalization with conditional independencies.
Sufficiency of ERM under Fully Informative Invariant Features.
Easy vs. hard DAGs imply the generality of TCRI.
Proposed Learning Framework
Learning Objective:
Experiments
Semisynthetic and Real-World Datasets
Algorithms:
...and 26 more sections

Key Result

Proposition 4.5

Assume that $\Phi_{\text{dg}}(X)$ and $\Phi_{\text{spu}}(X)$ are correlated with $Y$. Given Assumptions \ref{assum:generative}-\ref{assum:corr} and a representation $\Phi = \Phi_{\text{dg}} \oplus \Phi_{\text{spu}}$ that satisfies TIC, $\Phi_{\text{dg}}(X) = Z_{\text{dg}} \iff \Phi$ satisfies TCRI. (see Append

Figures (5)

Figure 1:
Figure 2: Generative Processes. Graphical models depicting the structure of possible data-generating processes -- shaded nodes indicate observed variables. $X$ represents the observed features, $Y$ represents observed targets, and $e$ represents domain influences (domain indexes in practice). There is an explicit separation of domain-general $Z_{\text{dg}}$ and domain-specific $Z_{\text{spu}}$ features; they are combined to generate observed $X$. Dashed edges indicate the possibility of an edge.
Figure 3:
Figure 4:
Figure 5: Generative Processes. Graphical model depicting the structure of our data-generating process -- shaded nodes indicate observed variables. $X$ represents the observed features, $Y$ represents observed targets, and $e$ represents domain influences. There is an explicit separation of domain-general $Z_{\text{dg}}$ and domain-specific $Z_{\text{spu}}$ features; they are combined to generate observed $X$. Dashed edges indicate the possibility of an edge.

Theorems & Definitions (5)

Definition 4.3: Total Information Criterion (TIC)
Definition 4.4: Target Conditioned Representation Independence
Proposition 4.5
proof
proof
