Learning Causal Response Representations through Direct Effect Analysis

Homer Durand, Gherardo Varando, Gustau Camps-Valls

TL;DR

The paper addresses learning a representation of the direct causal effect of $X$ on a multivariate outcome $Y$ under conditioning on $Z$. It introduces a CIT-driven framework that projects $Y$ with $\mathbf{w}$ to maximize a CIT statistic, solving a generalized eigenvalue problem with losses $T_S$, $T_F$, and $T_D$. The authors provide theoretical guarantees linking the learned direction to a signal-to-noise ratio and Fisher information, and they derive an $F$-distribution based test (with an upper bound for the direct-effect statistic) to enable conditional independence testing. Empirically, the method recovers the direct-effect subspace in simulations and demonstrates practical climate-attribution benefits, including separating internal climate variability from forced responses and assessing multiple external forcings.

Abstract

We propose a novel approach for learning causal response representations. Our method aims to extract directions in which a multidimensional outcome is most directly caused by a treatment variable. By bridging conditional independence testing with causal representation learning, we formulate an optimisation problem that maximises the evidence against conditional independence between the treatment and outcome, given a conditioning set. This formulation employs flexible regression models tailored to specific applications, creating a versatile framework. The problem is addressed through a generalised eigenvalue decomposition. We show that, under mild assumptions, the distribution of the largest eigenvalue can be bounded by a known $F$-distribution, enabling testable conditional independence. We also provide theoretical guarantees for the optimality of the learned representation in terms of signal-to-noise ratio and Fisher information maximisation. Finally, we demonstrate the empirical effectiveness of our approach in simulation and real-world experiments. Our results underscore the utility of this framework in uncovering direct causal effects within complex, multivariate settings.
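The abstract describes maximising evidence against conditional independence through a generalised eigenvalue decomposition. A minimal sketch of that idea follows, under loud assumptions: ordinary least squares stands in for the "flexible regression models", the matrices `A` (extra sum of squares explained by $X$ beyond $Z$) and `B` (residual sum of squares of the full model) are one plausible F-type construction rather than the paper's exact $T_S$, $T_F$, or $T_D$ losses, and the names `residuals` and `top_direction` are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def residuals(Y, D):
    """Residuals of a least-squares fit of Y on the design matrix D."""
    coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
    return Y - D @ coef


def top_direction(X, Y, Z):
    """Direction w maximising (w'Aw)/(w'Bw) for an assumed F-type ratio.

    A: reduction in residual sum of squares when X is added to Z.
    B: residual sum of squares of the full model (must be positive definite).
    """
    ones = np.ones((len(Y), 1))
    R0 = residuals(Y, np.hstack([ones, Z]))      # reduced model: Y ~ Z
    R1 = residuals(Y, np.hstack([ones, Z, X]))   # full model:    Y ~ Z + X
    B = R1.T @ R1
    A = R0.T @ R0 - B
    vals, vecs = eigh(A, B)                      # eigenvalues in ascending order
    return vals[-1], vecs[:, -1]


# Toy data (assumed for illustration): X affects only the first coordinate of Y.
rng = np.random.default_rng(0)
n, d = 500, 5
Z = rng.normal(size=(n, 1))
X = 0.5 * Z + rng.normal(size=(n, 1))
b = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
Y = X @ b[None, :] + Z @ np.ones((1, d)) + rng.normal(size=(n, d))

lam, w = top_direction(X, Y, Z)
```

With isotropic noise as in this toy setup, the recovered `w` should align closely with `b`, while `lam` plays the role of the maximised statistic.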

Paper Structure

This paper contains 41 sections, 18 theorems, 46 equations, 15 figures, 1 table, 1 algorithm.

Key Result

Proposition 4.1

Assuming $P$ is entailed by the SCM in Eq. \eqref{eq:scm}, the direction $\mathbf{w}_D$ is optimal.

Figures (15)

  • Figure 1: Illustration of the linear model from Sec. \ref{sec:example} with $\mathbf{b} = (1,1)^\top$ and $\boldsymbol{\Sigma} = \begin{pmatrix} 4 & 0 \\ 0 & 1/2 \end{pmatrix}$, showing the one-sigma ellipsoid for $Y^0$ and $Y^1$. For one-dimensional $X$, $Y^x$ shifts along $\mathbf{b}$, but projection along $\mathbf{b}$ is suboptimal. In contrast, projection along $\boldsymbol{\Sigma}^{-1} \mathbf{b}$ is optimal, with $(\boldsymbol{\Sigma}^{-1} \mathbf{b}, \mathbf{b}^\perp)$ forming a natural basis for the intervention space, where the first axis captures the intervention effect and the second contains no information.
  • Figure 2: Correlation between $\mathbf{w}^\top Y$ and $\phi(X)$ as $d$ increases. $T_D$ consistently outperforms all other methods, recovering $\phi(X)$ as $d$ grows, provided that $\mathbf{b}$ grows faster than $\mathbf{\Sigma}$. See Fig. \ref{fig:DR_noise_behavior_Noise} for the (5, 95) percentiles.
  • Figure 3: Power of the test for $\alpha = 0.05$. Detailed experiments with different values of $\alpha$ are available in Fig. \ref{fig:power_all}.
  • Figure 4: Correlation between $\mathbf{w}^\top Y$ and $\phi(X)$ as $d$ increases. $T_D$ consistently outperforms all other methods, recovering $\phi(X)$ as $d$ grows, provided that $\mathbf{b}$ grows faster than $\mathbf{\Sigma}$. Columns are indexed by A, B, C, D and rows by $1, 2, 3, 4$.
  • Figure 5: Experiments with different noise structures ($\mathbf{\Sigma}$ diagonal, full rank, or low rank) and scaling factors ($(u, v, w)$ set to $(1/3, 1/3, 1/3)$, $(0.1, 0.1, 0.8)$, and $(0.1, 0.8, 0.1)$ for the equal, Strong_N_Y, and Strong_Z settings, respectively). Overall, the learning algorithm based on $T_D$ performs better than the alternatives and tends to converge.
  • ...and 10 more figures
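
The claim in the Figure 1 caption can be checked numerically with the stated values of $\mathbf{b}$ and $\boldsymbol{\Sigma}$. The sketch below assumes the signal-to-noise ratio of a projection $\mathbf{w}$ takes the standard form $(\mathbf{w}^\top \mathbf{b})^2 / (\mathbf{w}^\top \boldsymbol{\Sigma} \mathbf{w})$; that definition and the helper name `snr` are assumptions made for illustration, not taken from the paper.

```python
import numpy as np

# Values from the Figure 1 caption.
b = np.array([1.0, 1.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 0.5]])


def snr(w):
    """Assumed SNR of the projection w'Y: squared shift over noise variance."""
    return (w @ b) ** 2 / (w @ Sigma @ w)


w_naive = b                          # project along the shift direction b
w_opt = np.linalg.solve(Sigma, b)    # project along Sigma^{-1} b = (0.25, 2)
b_perp = np.array([1.0, -1.0])       # direction orthogonal to b

snr_naive = snr(w_naive)   # 4 / 4.5, about 0.889
snr_opt = snr(w_opt)       # b' Sigma^{-1} b = 2.25
snr_perp = snr(b_perp)     # 0: this axis carries no shift
```

Under this definition, projecting along $\boldsymbol{\Sigma}^{-1}\mathbf{b}$ down-weights the noisy first coordinate and gives an SNR of 2.25 versus roughly 0.89 for $\mathbf{b}$ itself, while $\mathbf{b}^\perp$ carries no signal at all.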

Theorems & Definitions (27)

  • Proposition 4.1: General optimality
  • Proposition 4.2: Optimality under isotropic noise
  • Proposition 4.3: Noise term behavior
  • Proposition 4.4: Equivalence between Fisher information and SNR
  • Corollary 4.5
  • Proposition 4.6: Distribution of $\lambda_F$ under conditional independence
  • Proposition 4.7: Upper bound on $\Lambda_D$ under conditional independence
  • Proposition A.1
  • Lemma B.1
  • proof
  • ...and 17 more