Causality-oriented robustness: exploiting general noise interventions

Xinwei Shen, Peter Bühlmann, Armeen Taeb

TL;DR

This work addresses prediction under distribution shifts by introducing Distributional Robustness via Invariant Gradients (DRIG), a causality-oriented robustness framework. DRIG regularizes empirical risk minimization (ERM) to align gradient behavior across environments. This interpolates between in-distribution performance and causal robustness, and it generalizes anchor regression from mean shifts to general noise interventions. The authors provide theoretical guarantees in linear structural causal models (SCMs), connect DRIG to causal parameters, and extend it with the semi-supervised variants DRIG-A and DRIG-A+ for adapting to target distributions. They validate the approach on synthetic data, single-cell perturbation data, and ICU health records, demonstrating improved worst-case robustness and practical applicability. The results highlight a trade-off between predictive accuracy and causality, with DRIG offering a tunable, data-driven path toward robust real-world predictions.
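The regularized objective described in the TL;DR can be illustrated for the linear, squared-loss case. The code below is a minimal sketch, not the authors' implementation. It assumes the population objective $\mathbb{E}_0[\ell(b)] + \gamma\sum_{e\in\mathcal{E}}\omega^e\big(\mathbb{E}_e[\ell(b)] - \mathbb{E}_0[\ell(b)]\big)$ with $\ell(b) = (Y - X^\top b)^2$ and environment 0 as the reference. Setting its gradient to zero gives a linear system in the environment-wise second moments. The helper names `drig_fit` and `simulate`, the toy SCM, and the default weights $\omega^e \propto n_e$ are illustrative assumptions.

```python
import numpy as np


def drig_fit(envs, gamma, weights=None):
    """Closed-form linear DRIG sketch under squared loss (hypothetical helper).

    envs    : list of (X, y) pairs; envs[0] is the reference environment,
              X has shape (n_e, p) and y has shape (n_e,).
    gamma   : regularization strength (gamma >= 0).
    weights : environment weights omega^e over all envs, summing to 1
              (default: proportional to sample sizes).
    """
    sizes = np.array([len(y) for _, y in envs], dtype=float)
    w = sizes / sizes.sum() if weights is None else np.asarray(weights, dtype=float)
    # Per-environment Gram matrices E_e[X X^T] and cross-moments E_e[X Y].
    G = [X.T @ X / len(y) for X, y in envs]
    Z = [X.T @ y / len(y) for X, y in envs]
    # Reference moments plus gamma times the weighted differences to the reference.
    G_mix = G[0] + gamma * sum(wi * (Gi - G[0]) for wi, Gi in zip(w, G))
    Z_mix = Z[0] + gamma * sum(wi * (Zi - Z[0]) for wi, Zi in zip(w, Z))
    return np.linalg.solve(G_mix, Z_mix)


rng = np.random.default_rng(0)


def simulate(n, shift_sd):
    # Hypothetical toy SCM: hidden H confounds X and Y, and the environment
    # adds independent noise of scale shift_sd to X (an additive intervention).
    H = rng.normal(size=n)
    X = (H + rng.normal(size=n) + shift_sd * rng.normal(size=n)).reshape(-1, 1)
    y = 1.0 * X[:, 0] + H + rng.normal(size=n)  # causal coefficient is 1.0
    return X, y


envs = [simulate(20_000, 0.0), simulate(20_000, 2.0)]
estimates = {g: drig_fit(envs, g)[0] for g in (0.0, 1.0, 50.0)}
print(estimates)
```

In this toy model the population values of the estimator are $1.5$ at $\gamma = 0$ (OLS on the reference environment), $1.25$ at $\gamma = 1$ (pooled OLS with equal weights), and $103/102 \approx 1.01$ at $\gamma = 50$, approaching the causal coefficient $1$ as $\gamma$ grows. Sample estimates fluctuate around these values.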

Abstract

Since distribution shifts are common in real-world applications, there is a pressing need to develop prediction models that are robust against such shifts. Existing frameworks, such as empirical risk minimization or distributionally robust optimization, either lack generalizability to unseen distributions or rely on postulated distance measures. Alternatively, causality offers a data-driven and structural perspective on robust prediction. However, the assumptions necessary for causal inference can be overly stringent, and the robustness offered by such causal models often lacks flexibility. In this paper, we focus on causality-oriented robustness and propose Distributional Robustness via Invariant Gradients (DRIG), a method that exploits general noise interventions in training data for robust predictions against unseen interventions, and naturally interpolates between in-distribution prediction and causality. In a linear setting, we prove that DRIG yields predictions that are robust against a data-dependent class of distribution shifts. Furthermore, we show that our framework includes anchor regression as a special case, and that it yields prediction models that protect against more diverse perturbations. We establish finite-sample results and extend our approach to semi-supervised domain adaptation to further improve prediction performance. Finally, we empirically validate our methods on synthetic simulations and on single-cell and intensive care health datasets.

Paper Structure

This paper contains 75 sections, 23 theorems, 179 equations, 15 figures, and 1 table.

Key Result

Proposition 1

Suppose the data are generated according to the anchor model (eqn:model_anchor). Let $A$ be discrete anchors taking values in the set $\{a^e \in \mathbb{R}^{\mathrm{dim}(A)}: e \in \mathcal{E}\}$. Suppose a reference environment $0 \in \mathcal{E}$ exists where $a^0=0$. Assuming that $\mathbb{P}(A = a^e) = \omega^e$, …
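Proposition 1 relates DRIG to anchor regression with discrete anchors. For reference, here is a minimal sketch of anchor regression itself in its standard data-transformation form (Rothenhäusler et al., 2021). With $P_A$ the orthogonal projection onto the column span of the anchors, minimizing $\|(I-P_A)(Y - Xb)\|^2 + \gamma\|P_A(Y - Xb)\|^2$ is ordinary least squares on $WX$ and $WY$ with $W = I - (1-\sqrt{\gamma})P_A$, because $W^\top W = (I - P_A) + \gamma P_A$. The one-hot encoding, which gives the reference environment the zero anchor, the toy data, and the helper names are illustrative assumptions. The sketch does not attempt to reproduce the conclusion of Proposition 1.

```python
import numpy as np


def discrete_anchors(env_labels, n_envs):
    """One-hot anchor matrix; the reference environment 0 gets the zero vector."""
    A = np.zeros((len(env_labels), n_envs - 1))
    for i, e in enumerate(env_labels):
        if e > 0:
            A[i, e - 1] = 1.0
    return A


def project(A, M):
    """Orthogonal projection of M (vector or matrix) onto the column span of A."""
    coef, *_ = np.linalg.lstsq(A, M, rcond=None)
    return A @ coef


def anchor_regression(X, y, A, gamma):
    """Anchor regression as OLS on W X and W y, with W = I - (1 - sqrt(gamma)) P_A."""
    shrink = 1.0 - np.sqrt(gamma)
    X_t = X - shrink * project(A, X)
    y_t = y - shrink * project(A, y)
    b, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
    return b


rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 500)               # environment 0 is the reference
A = discrete_anchors(labels, n_envs=3)
H = rng.normal(size=labels.size)                 # hidden confounder
shift = np.array([0.0, 2.0, -1.5])[labels]       # mean shift of X per environment
X = (shift + H + rng.normal(size=labels.size)).reshape(-1, 1)
y = X[:, 0] + H + rng.normal(size=labels.size)   # causal coefficient is 1.0
for gamma in (0.0, 1.0, 100.0):
    print(gamma, anchor_regression(X, y, A, gamma))
```

Setting $\gamma = 1$ recovers pooled OLS. With $\gamma = 0$, only within-environment variation in the shifted environments is used, plus the reference environment, whose anchor is zero. Large $\gamma$ approaches an instrumental-variables fit with the anchors as instruments. DRIG is described as generalizing this mean-shift setting to more general noise interventions.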

Figures (15)

  • Figure 1: (left): An example of structural shifts: environment 0 represents the training environment and environments 1-3 represent possible test environments, where the shift between the training and test distributions is in a particular "direction" (here, the support of each distribution is the same); (right): Causality-oriented robustness: a trade-off between in-distribution prediction and causality using our method DRIG, which exploits general additive interventions in the data. DRIG encompasses anchor regression as a special case with mean shifts only. Our extended proposals DRIG-A and DRIG-A+ provide a more flexible robustness framework.
  • Figure 2: Graphical models among covariates $X$, response variable $Y$, and latent variables $H$ ($X$ and $H$ may be multivariate): (left): interventions $E$ on all components, (right): discrete interventions $E$ and continuous interventions $A$ on all components. All these structures are allowed for DRIG.
  • Figure 3: Test MSEs for varying perturbation strengths $\alpha$. (left): perturbations on the covariates only; (right): perturbations on the covariates, response, and latent variables.
  • Figure 4: Boxplots of the MSEs on 50 test environments for each method with varying $\gamma$, with the worst-case MSE shown in the dashed lines on top.
  • Figure 5: (left): Differences between the test MSEs of anchor regression and group DRO and the test MSE of DRIG for all $50$ test environments. (right): Performance of DRIG-A and DRIG-A+ for varying labeled sample sizes, compared with test-OLS and with other methods that rely only on the training data. DRIG and anchor regression use a fixed $\gamma=10$. Lines represent the mean and the 2.5% and 97.5% quantiles.
  • ...and 10 more figures
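Figures 3 and 4 evaluate test MSE as perturbations grow stronger. The following sketch mimics that kind of evaluation in a hypothetical one-dimensional SCM with a hidden confounder $H$ and an additive perturbation of scale $\alpha$ on the covariate only, in the spirit of Figure 3 (left). It is not the paper's simulation design. It compares the reference-environment OLS coefficient $b = 1.5$ with the causal coefficient $b = 1$ and reports the worst case over a grid of $\alpha$. The function name `perturbed_mse` is illustrative.

```python
import numpy as np


def perturbed_mse(b, alpha, n=200_000, seed=0):
    """Monte Carlo MSE of the prediction b * x when X receives an additive
    Gaussian perturbation of scale alpha (hypothetical toy SCM)."""
    rng = np.random.default_rng(seed)
    H = rng.normal(size=n)                                    # hidden confounder
    x = H + rng.normal(size=n) + alpha * rng.normal(size=n)   # perturbed covariate
    y = 1.0 * x + H + rng.normal(size=n)                      # causal coefficient is 1.0
    return float(np.mean((y - b * x) ** 2))


alphas = [0.0, 1.0, 2.0, 4.0]
for name, b in [("reference OLS", 1.5), ("causal", 1.0)]:
    mses = [perturbed_mse(b, a) for a in alphas]
    print(name, [round(m, 2) for m in mses], "worst case:", round(max(mses), 2))
```

In this toy model the population MSEs are $1.5 + 0.25\alpha^2$ for $b = 1.5$ and $2$ for $b = 1$. OLS therefore wins for $\alpha < \sqrt{2}$ and the causal fit wins beyond it. This is the kind of trade-off between in-distribution accuracy and robustness that the DRIG parameter $\gamma$ is meant to tune.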

Theorems & Definitions (48)

  • Definition 1: Gradient invariance
  • Proposition 1
  • Proposition 2
  • Theorem 3
  • Example 1: Covariate-intervened
  • Example 2: All-intervened
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • ...and 38 more
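Definition 1 is only named in this listing. Under a hedged reading for squared loss, gradient invariance of a parameter $b$ asks that the expected loss gradients agree across environments, i.e. $\mathbb{E}_e[X(Y - X^\top b)] = \mathbb{E}_0[X(Y - X^\top b)]$ for all $e \in \mathcal{E}$, up to the constant factor $-2$. The sketch below measures empirical deviations from this condition. It uses a hypothetical confounded SCM in which environments only add noise to $X$. The helper names `gradient_gaps` and `simulate` are illustrative, and the precise form of Definition 1 in the paper may differ.

```python
import numpy as np


def gradient_gaps(envs, b):
    """Max-abs gap between each environment's empirical moment E_e[X (Y - X b)]
    (the squared-loss gradient up to a factor -2) and the reference one."""
    grads = [X.T @ (y - X @ b) / len(y) for X, y in envs]
    return [float(np.max(np.abs(g - grads[0]))) for g in grads[1:]]


rng = np.random.default_rng(2)


def simulate(n, shift_sd):
    # Hypothetical SCM: hidden H confounds X and Y; the environment adds
    # independent noise of scale shift_sd to X only.
    H = rng.normal(size=n)
    X = (H + rng.normal(size=n) + shift_sd * rng.normal(size=n)).reshape(-1, 1)
    y = X[:, 0] + H + rng.normal(size=n)  # causal coefficient is 1.0
    return X, y


envs = [simulate(20_000, 0.0), simulate(20_000, 1.0), simulate(20_000, 2.0)]
print("causal b = 1.0:", gradient_gaps(envs, np.array([1.0])))
print("OLS-like b = 1.5:", gradient_gaps(envs, np.array([1.5])))
```

In this toy model both moments equal $\mathrm{Cov}(X, H) = 1$ at the causal coefficient $b = 1$, so the gaps there reflect only sampling noise. At $b = 1.5$ the population gaps are $0.5$ and $2$, because $\mathbb{E}_e[X^2]$ changes with the intervention strength. The gradient at the causal coefficient is itself nonzero here. What singles it out is that the gradient is the same in every environment, not that it vanishes.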