When Shift Happens - Confounding Is to Blame

Abbavaram Gowtham Reddy, Celia Rubio-Madrigal, Rebekka Burkholz, Krikamol Muandet

TL;DR

This paper reveals that hidden confounding shifts can undermine invariance-based OOD methods and explains why standard ERM can perform well under distribution shifts. By deriving a general predictive-information decomposition, it shows that learning environment-specific input–output mappings—rather than forcing invariant predictors—can improve both ID and OOD performance. It further demonstrates that additional informative covariates acting as proxies for hidden confounders can enhance predictive informativeness and reduce concept shift, offering a principled approach to covariate selection. Across real-world and synthetic datasets, the results support the central claim that hidden confounding is prevalent and that strategies targeting conditional informativeness and environment-aware relationships yield practical robustness to distribution shifts.

Abstract

Distribution shifts introduce uncertainty that undermines the robustness and generalization capabilities of machine learning models. While conventional wisdom suggests that learning causal-invariant representations enhances robustness to such shifts, recent empirical studies present a counterintuitive finding: (i) empirical risk minimization (ERM) can rival or even outperform state-of-the-art out-of-distribution (OOD) generalization methods, and (ii) its OOD generalization performance improves when all available covariates, not just causal ones, are utilized. Drawing on both empirical and theoretical evidence, we attribute this phenomenon to hidden confounding. Shifts in hidden confounding induce changes in data distributions that violate assumptions commonly made by existing OOD generalization approaches. Under such conditions, we prove that effective generalization requires learning environment-specific relationships, rather than relying solely on invariant ones. Furthermore, we show that models augmented with proxies for hidden confounders can mitigate the challenges posed by hidden confounding shifts. These findings offer new theoretical insights and practical guidance for designing robust OOD generalization algorithms and principled covariate selection strategies.

Paper Structure

This paper contains 15 sections, 6 theorems, 23 equations, 29 figures, 21 tables.

Key Result

Proposition 4.1

For a covariate vector $\mathbf{X}$, label $Y$, with causal structure $\mathbf{X}\leftrightarrow Y$, i.e., some covariates cause $Y$ and some covariates are caused by $Y$, environment variable $E$, a feature extractor $\phi$, and prediction $\hat{Y}$, the predictive information $I(Y; \hat{Y})$ is de where $I(\phi(\mathbf{X});Y|\hat{Y})$ is the residual information in $\phi(\mathbf{X})$ for inferri
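A standard identity helps in reading decompositions of this kind. As a sketch, under the assumption (not stated above) that the prediction is a deterministic function of the features, $\hat{Y} = g(\phi(\mathbf{X}))$ for some hypothetical map $g$, the chain rule for mutual information gives

$$
I(\phi(\mathbf{X}); Y) = I(\phi(\mathbf{X}), \hat{Y}; Y) = I(Y; \hat{Y}) + I(\phi(\mathbf{X}); Y|\hat{Y}),
$$

and therefore

$$
I(Y; \hat{Y}) = I(\phi(\mathbf{X}); Y) - I(\phi(\mathbf{X}); Y|\hat{Y}).
$$

This only illustrates how a residual term of the form $I(\phi(\mathbf{X});Y|\hat{Y})$ can enter with a negative sign. It is not the full statement of Proposition 4.1, which also involves the environment variable $E$.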

Figures (29)

  • Figure 1: Causal graphs underlying distribution shifts.
  • Figure 2: We evaluate four linear regression (L.R.) models in an OOD setting characterized by hidden confounding shifts and minimal environment overlap (i.e., distant $\mu_e$). (i) A model trained solely on $X$ learns an incorrect relationship with $Y$, illustrating Simpson's paradox. (ii) Using environment-specific summary statistics of $X$, denoted as $E$, recovers the correct relationship but remains limited in representation power. (iii) Using an informative covariate $X_i$ for $U$ improves OOD generalization. (iv) The oracle model is trained on $X$ and $U$.
  • Figure 3: The bi-directed arrow between $\mathbf{X}$ and $Y$ indicates that some covariates in $\mathbf{X}$ can cause $Y$, and some may be caused by $Y$.
  • Figure 4: The difference $\text{conditional informativeness} - \text{residual}$ in the plots is positively correlated with the average ID and OOD test accuracy over the eight datasets shown in the table on the right.
  • Figure 5: Adding more proxy variables $\mathbf{X}_I$ of $U$ that are informative about $Y$ reduces MSE and concept shift while increasing conditional informativeness and feature shift.
  • ...and 24 more figures
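
The linear-regression comparison described in the Figure 2 caption can be illustrated with a toy simulation. The sketch below is not the paper's experiment: the data-generating process (`Y = X - 3U + noise`, with the hidden confounder `U` centred at an environment mean `mu_e`), the noise scales, the environment means, and the helper names `sample_env`, `fit`, and `mse` are all assumptions chosen so that the pooled regression of `Y` on `X` flips sign relative to the within-environment relationship. It compares a model using `X` only, a model using `X` plus a noisy proxy of `U`, and an oracle using `X` and `U`. The environment-statistics model (ii) from the caption is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_env(mu_e, n=2000):
    """Draw one environment; the hidden confounder U is centred at mu_e (assumed DGP)."""
    u = rng.normal(mu_e, 0.1, n)               # hidden confounder
    x = u + rng.normal(0.0, 1.0, n)            # observed covariate, confounded by U
    proxy = u + rng.normal(0.0, 0.3, n)        # noisy proxy covariate for U
    y = x - 3.0 * u + rng.normal(0.0, 0.1, n)  # label depends on X and U
    return x, proxy, u, y

def fit(features, y):
    """Ordinary least squares with an intercept; returns [intercept, coefficients...]."""
    A = np.column_stack([np.ones(len(y)), *features])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def mse(w, features, y):
    """Mean squared error of the linear model w on the given features."""
    A = np.column_stack([np.ones(len(y)), *features])
    return float(np.mean((A @ w - y) ** 2))

# Three training environments and one distant (OOD) test environment.
train = [sample_env(mu) for mu in (0.0, 5.0, 10.0)]
x, p, u, y = (np.concatenate(cols) for cols in zip(*train))
x_te, p_te, u_te, y_te = sample_env(20.0)

# Slope of Y on X inside one environment vs. pooled over all training environments.
within_slope = fit([train[0][0]], train[0][3])[1]
pooled_slope = fit([x], y)[1]

mse_x = mse(fit([x], y), [x_te], y_te)                  # X only
mse_proxy = mse(fit([x, p], y), [x_te, p_te], y_te)     # X plus proxy of U
mse_oracle = mse(fit([x, u], y), [x_te, u_te], y_te)    # X plus true U

print(f"within-env slope: {within_slope:.2f}, pooled slope: {pooled_slope:.2f}")
print(f"OOD MSE  X only: {mse_x:.2f}  X+proxy: {mse_proxy:.2f}  oracle: {mse_oracle:.3f}")
```

In this toy setup the pooled slope is negative even though the within-environment slope is positive, which is the Simpson's-paradox pattern the caption refers to, and the proxy model lands between the `X`-only model and the oracle on the shifted environment. How large the gaps are depends entirely on the assumed noise scales.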

Theorems & Definitions (12)

  • Definition 4.1: Informativeness and Conditional Informativeness
  • Definition 4.2: Variation and Invariance
  • Proposition 4.1
  • Proposition 4.2
  • Definition 4.3: Informative Covariates
  • Proposition 4.3
  • Proposition A.1
  • proof
  • Proposition A.1
  • proof
  • ...and 2 more