Causal Invariance Learning via Efficient Nonconvex Optimization

Zhenyu Wang, Yifan Hu, Peter Bühlmann, Zijian Guo

TL;DR

This work addresses learning the direct causal drivers of an outcome from multi-environment observational data by enforcing invariance of the causal outcome model. It introduces NegDRO, a continuous nonconvex minimax formulation allowing negative weights to enforce risk invariance across environments, which avoids combinatorial subset searches. Under additive interventions, it derives concrete identification conditions ensuring the invariant model recovers the causal model, proves a benign optimization landscape where stationary points are near the true causal predictor, and provides a gradient-based algorithm with non-asymptotic convergence guarantees. It further shows that even with limited additive interventions, NegDRO can identify $\beta^*$ and outperforms existing invariant-learning methods, with strong scalability to high-dimensional covariates. Overall, the approach offers a theoretically grounded, computationally efficient path to causal discovery in heterogeneous observational data, with practical implications for fields like marketing and epidemiology.

Abstract

Identifying causal relationships among variables from observational data is an important yet challenging task. This work focuses on identifying the direct causes of an outcome and estimating their magnitude, i.e., learning the causal outcome model. Data from multiple environments provide valuable opportunities to uncover causality by exploiting the invariance principle that the causal outcome model holds across heterogeneous environments. Based on the invariance principle, we propose the Negative Weighted Distributionally Robust Optimization (NegDRO) framework to learn an invariant prediction model. NegDRO minimizes the worst-case combination of risks across multiple environments and enforces invariance by allowing potentially negative weights. Under the additive interventions regime, we establish three major contributions: (i) On the statistical side, we provide sufficient and nearly necessary identification conditions under which the invariant prediction model coincides with the causal outcome model; (ii) On the optimization side, despite the nonconvexity of NegDRO, we establish its benign optimization landscape, where all stationary points lie close to the true causal outcome model; (iii) On the computational side, we develop a gradient-based algorithm that provably converges to the causal outcome model, with non-asymptotic convergence rates in both sample size and gradient-descent iterations. In particular, our method avoids exhaustive combinatorial searches over exponentially many subsets of covariates found in the literature, ensuring scalability even when the dimension of the covariates is large. To our knowledge, this is the first causal invariance learning method that efficiently finds an approximately globally optimal solution of a nonconvex optimization problem.
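
The "worst-case combination of risks" with possibly negative weights can be illustrated with a small sketch. The abstract does not spell out the weight set, so the code below assumes one plausible form: weights that sum to one and are bounded below by $-\gamma$, with squared-error risk in each environment. The function names (`env_risks`, `worst_case_weights`, `negdro_objective`) and the synthetic data are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np

def env_risks(beta, envs):
    """Mean squared prediction error of beta in each environment (assumed loss)."""
    return np.array([np.mean((y - X @ beta) ** 2) for X, y in envs])

def worst_case_weights(risks, gamma):
    """Maximize sum_e w_e * risks[e] over the ASSUMED set {w : sum(w) = 1, w_e >= -gamma}.

    The objective is linear in w, so the maximum is attained at a vertex:
    every environment gets weight -gamma except the highest-risk one,
    which absorbs the remaining mass.
    """
    w = np.full(len(risks), -gamma, dtype=float)
    w[np.argmax(risks)] = 1.0 + gamma * (len(risks) - 1)
    return w

def negdro_objective(beta, envs, gamma):
    """Worst-case weighted risk of beta under the assumed weight set."""
    risks = env_risks(beta, envs)
    return float(worst_case_weights(risks, gamma) @ risks)

# Synthetic environments (illustrative only): the covariate mean shifts
# across environments while the outcome model stays fixed.
rng = np.random.default_rng(0)
envs = []
for shift in (0.0, 1.0, 2.0):
    X = rng.normal(size=(500, 3)) + shift
    y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=500)
    envs.append((X, y))

beta = np.zeros(3)
print(negdro_objective(beta, envs, gamma=2.0))
```

Under this assumed weight set, the inner maximum equals $\max_e R_e(\beta) + \gamma \sum_e \big(\max_{e'} R_{e'}(\beta) - R_e(\beta)\big)$, so a larger $\gamma$ penalizes differences between environment risks more heavily, and $\gamma = 0$ reduces to the ordinary worst-case risk.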

Paper Structure

This paper contains 76 sections, 26 theorems, 396 equations, 11 figures, 1 table, and 3 algorithms.

Key Result

Theorem 1

Under the additive intervention regime, suppose that Condition \ref{cond: strict positive} holds. Then the causal outcome model $\beta^*$ is the unique risk-invariant prediction model, i.e., $\mathcal{B}_{\rm inv} = \{\beta^*\}$.

Figures (11)

  • Figure 1: Illustration of plausible causal structures, where orange nodes denote the direct causes for $Y$.
  • Figure 2: Illustration of Invariance and Causality.
  • Figure 3: Comparison of Conditions \ref{cond: relaxed minimization} and \ref{cond: strict positive - A}. Orange nodes indicate intervened covariates. Condition \ref{cond: relaxed minimization} requires interventions on the covariate directly affected by the outcome ($X_2$), whereas Condition \ref{cond: strict positive - A} requires interventions on all covariates $(X_1, X_2, X_3)$.
  • Figure 4: Illustration of Example \ref{eg: illus-eg2} under the three additive intervention regimes. Solid orange nodes represent covariates with large-scale interventions, hollow orange nodes represent covariates with small-scale interventions, and white nodes indicate covariates with no interventions.
  • Figure 5: Comparison of NegDRO with CausalDantzig and DRIG in terms of the $\ell_2$ distance between each method's estimator and the causal outcome model $\beta^*$. The sample size is $n_e = 10{,}000$ for each $e \in \mathcal{E}$. For NegDRO and DRIG, the regularization parameter $\gamma$ is varied over $[0, 60]$; CausalDantzig does not require one. The results are averaged over 200 simulations.
  • ...and 6 more figures

Theorems & Definitions (36)

  • Example 1
  • Definition 1: Structural Equation Models
  • Definition 2: Stationary Point
  • Example 2
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Proposition 1
  • Proposition 2: Model Error of NegDRO
  • Theorem 4: Landscape of NegDRO
  • ...and 26 more