Identifying Representations for Intervention Extrapolation

Sorawit Saengkyongam; Elan Rosenfeld; Pradeep Ravikumar; Niklas Pfister; Jonas Peters

TL;DR

The paper tackles intervention extrapolation, i.e., predicting the effect of unseen interventions on an outcome when latent factors mediate the process. It introduces Rep4Ex, a two-stage framework that first learns affine-identifiable latent representations by imposing a linear invariance constraint on an auto-encoder, and then applies a control-function-based estimator to compute $\mathbb{E}[Y \mid \mathrm{do}(A=a^*)]$ for unseen $a^*$. Theoretical results show that $g_0^{-1}$ is identifiable up to an affine transformation when $A$ is exogenous and its effect on $Z$ is linear, and that the nonlinear components needed to predict $Y$ under interventions can be recovered within this affine equivalence class. In synthetic experiments, the method recovers the unmixing map up to affine transformations and extrapolates well to unseen interventions, approaching oracle performance, which suggests the approach is practically viable at least in the settings studied.

Abstract

The premise of identifiable and causal representation learning is to improve the current representation learning paradigm in terms of generalizability or robustness. Despite recent progress in questions of identifiability, more theoretical results demonstrating concrete advantages of these methods for downstream tasks are needed. In this paper, we consider the task of intervention extrapolation: predicting how interventions affect an outcome, even when those interventions are not observed at training time, and show that identifiable representations can provide an effective solution to this task even if the interventions affect the outcome non-linearly. Our setup includes an outcome Y, observed features X, which are generated as a non-linear transformation of latent features Z, and exogenous action variables A, which influence Z. The objective of intervention extrapolation is to predict how interventions on A that lie outside the training support of A affect Y. Here, extrapolation becomes possible if the effect of A on Z is linear and the residual when regressing Z on A has full support. As Z is latent, we combine the task of intervention extrapolation with identifiable representation learning, which we call Rep4Ex: we aim to map the observed features X into a subspace that allows for non-linear extrapolation in A. We show that the hidden representation is identifiable up to an affine transformation in Z-space, which is sufficient for intervention extrapolation. The identifiability is characterized by a novel constraint describing the linearity assumption of A on Z. Based on this insight, we propose a method that enforces the linear invariance constraint and can be combined with any type of autoencoder. We validate our theoretical findings through synthetic experiments and show that our approach succeeds in predicting the effects of unseen interventions.
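To make the observed-$Z$ case concrete, the following is a minimal sketch of a control-function estimator on simulated data. It is not the authors' implementation: the polynomial basis, the function names (`poly_features`, `fit_control_function`, `predict_do`), and the toy SCM ($Z = 2A + V$, $Y = Z^2 + U$, with $U$ correlated with $V$) are assumptions chosen for illustration. The sketch first regresses $Z$ on $A$ linearly to obtain residuals $V$, then fits an additive model $Y \approx \ell(Z) + \lambda(V)$, and finally averages $\ell(M a^* + V_i)$ over the training residuals to estimate $\mathbb{E}[Y \mid \mathrm{do}(A = a^*)]$.

```python
import numpy as np

def poly_features(x, degree=3):
    """Columns x, x^2, ..., x^degree (no intercept)."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

def fit_control_function(a, z, y, degree=3):
    """Two regressions: Z on A (linear), then Y on (Z, V) (additive)."""
    ones = np.ones_like(a)
    A1 = np.column_stack([ones, a])
    # Stage 1: linear regression of Z on A; the residuals act as control variables.
    coef, *_ = np.linalg.lstsq(A1, z, rcond=None)
    v = z - A1 @ coef
    # Stage 2: additive polynomial regression Y ~ c + ell(Z) + lambda(V).
    # The V-features are centred so that the intercept is absorbed into ell.
    Pz = poly_features(z, degree)
    Pv = poly_features(v, degree)
    Pv = Pv - Pv.mean(axis=0)
    D = np.column_stack([ones, Pz, Pv])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef, v, beta[: degree + 1]  # keep intercept + ell coefficients only

def predict_do(a_star, coef, v, beta_ell):
    """Average ell(intercept + M * a_star + V_i) over the training residuals."""
    z_star = coef[0] + coef[1] * a_star + v
    degree = len(beta_ell) - 1
    return float(np.mean(beta_ell[0] + poly_features(z_star, degree) @ beta_ell[1:]))

# Toy SCM (an assumption for illustration): A is exogenous with support [-1, 1].
rng = np.random.default_rng(0)
n = 20_000
a = rng.uniform(-1.0, 1.0, n)
v_true = rng.normal(size=n)                    # full-support noise on Z
z = 2.0 * a + v_true                           # linear effect of A on Z
u = 0.8 * v_true + 0.1 * rng.normal(size=n)    # hidden confounding between Z and Y
y = z ** 2 + u                                 # nonlinear effect of Z on Y

coef, v_hat, beta_ell = fit_control_function(a, z, y)
a_star = 2.0                                   # outside the training support of A
estimate = predict_do(a_star, coef, v_hat, beta_ell)
# Under this toy SCM the true value is 4 * a_star**2 + 1 = 17.
print(f"estimated E[Y | do(A={a_star})] = {estimate:.2f}")
```

In this toy example the polynomial basis contains the true functions, so the extrapolation is essentially a well-specified regression; with a flexible learner in place of the polynomial basis, the full-support condition on the residual is what allows $\ell$ to be learned on the range of $Z$ values that unseen interventions produce.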

Paper Structure (32 sections, 7 theorems, 66 equations, 9 figures, 2 algorithms)


Introduction
Relation to existing work
Intervention extrapolation with observed $Z$
Intervention extrapolation via identifiable representations
Identifying $M_{\phi}$, $q_{\phi}$ and $V_{\phi}$
Identifying $\ell \circ \kappa^{-1}_{\phi}$
Identification of the unmixing function $g^{-1}_0$
A method for tackling Rep4Ex
First-stage: auto-encoder with MMR regularization
Second-stage: control function approach
Experiments
Identifying the unmixing function $g^{-1}_0$
Predicting previously unseen interventions
One-dimensional $A$.
Multi-dimensional $A$.
...and 17 more sections
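The outline above mentions an auto-encoder with MMR regularization as the first stage. As a rough illustration only, and assuming that MMR here refers to a kernel-based maximum moment restriction penalty, the sketch below scores how far a candidate representation is from being linear in $A$ on average. The function names (`gaussian_kernel`, `mmr_penalty`), the Gaussian kernel, the bandwidth, and the choice to plug in an ordinary-least-squares fit for the linear map are all assumptions, not details taken from the paper.

```python
import numpy as np

def gaussian_kernel(a, bandwidth=1.0):
    """Gram matrix k(a_i, a_j) = exp(-||a_i - a_j||^2 / (2 * bandwidth^2))."""
    sq_dist = np.sum((a[:, None, :] - a[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmr_penalty(rep, a, bandwidth=1.0):
    """V-statistic (1/n^2) * sum_ij r_i^T r_j k(a_i, a_j).

    r_i is the residual of the representation after removing its best
    linear (plus intercept) fit on A. The value is near zero when the
    conditional mean of the representation given A is linear in A.
    """
    n = len(a)
    A1 = np.column_stack([np.ones(n), a])
    W, *_ = np.linalg.lstsq(A1, rep, rcond=None)   # assumed plug-in OLS fit
    r = rep - A1 @ W
    K = gaussian_kernel(a, bandwidth)
    return float(np.sum((r @ r.T) * K) / n ** 2)

# Hypothetical representations: one linear in A, one with a quadratic term.
rng = np.random.default_rng(1)
n = 1_000
a = rng.uniform(-1.0, 1.0, size=(n, 1))
noise = 0.1 * rng.normal(size=(n, 2))
rep_linear = a @ np.array([[1.5, -0.5]]) + noise
rep_nonlinear = np.column_stack([a[:, 0] ** 2, a[:, 0]]) + noise

pen_linear = mmr_penalty(rep_linear, a)
pen_nonlinear = mmr_penalty(rep_nonlinear, a)
print(pen_linear, pen_nonlinear)
```

In a training loop, a penalty of this kind would be added to the auto-encoder's reconstruction loss and written in an automatic-differentiation framework; the NumPy version here only shows what quantity is being penalized.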

Key Result

Proposition 1

There exist SCMs $\mathcal{S}_1$ and $\mathcal{S}_2$ of the form given by the paper's SCM with observed $Z$ (equation label `eq:scm_knownZ`) that satisfy all of the following conditions

Figures (9)

Figure 1: In this paper, we consider the goal of intervention extrapolation, see (b). We are given training data (yellow) that cover only a limited range of possible values of $A$. During test time (grey), we would like to predict $\mathbb{E}[Y \mid \mathrm{do}(A=a^*)]$ for previously unseen values of $a^*$. The function $a^* \mapsto \mathbb{E}[Y \mid \mathrm{do}(A=a^*)]$ (red) can be non-linear in $a^*$. We argue in Section \ref{sec:fully_observe_Z} how this can be achieved using control functions if the data follow a structure like in (a) and $Z$ is observed. We show in Section \ref{sec:cf_hiddenZ} that, under suitable assumptions, the problem is still solvable if we first have to reconstruct the hidden representation $Z$ (up to a transformation) from $X$. The representation is used to predict $\mathbb{E}[Y \mid \mathrm{do}(A=a^*)]$, so we learn a representation for intervention extrapolation (Rep4Ex).
Figure 2: R-squared values for different methods as the intervention strength ($\alpha$) increases. Each point represents an average over 20 repetitions, and the error bar indicates its 95% confidence interval. AE-MMR yields an R-squared close to 1 as $\alpha$ increases, indicating its ability to aff-identify $g^{-1}_0$, while the two baseline methods yield significantly lower R-squared values.
Figure 3: Different estimations of the target of inference $\mathbb{E}[Y \mid \mathrm{do}(A := \cdot)]$ as the training support $\gamma$ increases. The error bars represent the 95% confidence intervals over 10 repetitions. The training points displayed are subsampled for the purpose of visualization. Rep4Ex-CF demonstrates the ability to extrapolate beyond the training support, achieving nearly perfect extrapolation when $\gamma = 1.2$. In contrast, the baseline MLP shows clear limitations in its ability to extrapolate.
Figure 4: MSEs of different methods for three dimensionalities of $A$. The box plots illustrate the distribution of MSEs based on 10 repetitions. Rep4Ex-CF yields substantially lower MSEs in comparison to the baseline MLP. Furthermore, the MSEs achieved by Rep4Ex-CF are comparable to those of Rep4Ex-CF-Oracle, indicating the effectiveness of the representation learning stage.
Figure 5:
...and 4 more figures
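The R-squared values in Figure 2 are used as evidence of affine identification. One plausible way to compute such a score, sketched here under assumptions, is to regress the true latents on the learned representation with an intercept and report the average R-squared across latent dimensions. The function name `affine_r2`, the regression direction, and the toy data are assumptions for illustration, not the paper's evaluation code.

```python
import numpy as np

def affine_r2(z_true, z_hat):
    """Mean R-squared of an affine regression of z_true on z_hat.

    A value close to 1 means z_true is (nearly) an affine function of
    z_hat, which is what affine identifiability would predict.
    """
    n = len(z_true)
    H = np.column_stack([np.ones(n), z_hat])
    coef, *_ = np.linalg.lstsq(H, z_true, rcond=None)
    resid = z_true - H @ coef
    ss_res = np.sum(resid ** 2, axis=0)
    ss_tot = np.sum((z_true - z_true.mean(axis=0)) ** 2, axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))

# Toy check with hypothetical representations.
rng = np.random.default_rng(2)
z = rng.normal(size=(2_000, 2))
B = np.array([[2.0, 1.0], [0.0, -1.0]])        # invertible mixing matrix
z_affine = z @ B + np.array([0.5, -3.0])       # affine transform of z
z_warped = z ** 3                              # element-wise non-affine warp

r2_affine = affine_r2(z, z_affine)
r2_warped = affine_r2(z, z_warped)
print(r2_affine, r2_warped)
```

An affinely transformed representation scores essentially 1, while the cubic warp scores noticeably lower, which mirrors the contrast between AE-MMR and the baselines described in the Figure 2 caption.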

Theorems & Definitions (17)

Proposition 1: Regressing $Y$ on $A$ does not suffice
Definition 2: Affine identifiability
Proposition 3: Equivalent definition of affine identifiability
Theorem 4
Definition 5: Linear invariance
Theorem 6
Remark 7
Lemma 8
proof
Lemma 9
...and 7 more
