Identifying General Mechanism Shifts in Linear Causal Representations

Tianyu Chen, Kevin Bello, Francesco Locatello, Bryon Aragam, Pradeep Ravikumar

TL;DR

A surprising identifiability result: under some very mild standard assumptions, it is possible to identify the set of shifted nodes in the linear causal representation learning setting, where we observe a linear mixing of unknown latent factors that follow a linear structural causal model.

Abstract

We consider the linear causal representation learning setting where we observe a linear mixing of $d$ unknown latent factors, which follow a linear structural causal model. Recent work has shown that it is possible to recover the latent factors as well as the underlying structural causal model over them, up to permutation and scaling, provided that we have at least $d$ environments, each of which corresponds to perfect interventions on a single latent node (factor). After this powerful result, a key open problem faced by the community has been to relax these conditions: allow for coarser than perfect single-node interventions, and allow for fewer than $d$ of them, since the number of latent factors $d$ could be very large. In this work, we consider precisely such a setting, where we allow fewer than $d$ environments and also allow very coarse interventions that can \textit{change the entire causal graph over the latent factors}. On the flip side, we relax what we wish to extract to simply the \textit{list of nodes that have shifted between one or more environments}. We provide a surprising identifiability result: it is indeed possible, under some very mild standard assumptions, to identify the set of shifted nodes. Moreover, our identifiability proof is constructive: we explicitly provide necessary and sufficient conditions for a node to be a shifted node, and show that we can check these conditions given observed data. Our algorithm lends itself very naturally to the sample setting where, instead of just interventional distributions, we are provided datasets of samples from each of these distributions. We corroborate our results on both synthetic experiments and an interesting psychometric dataset. The code can be found at https://github.com/TianyuCodings/iLCS.

Paper Structure

This paper contains 32 sections, 8 theorems, 16 equations, 9 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

If every component of $\epsilon$ is independent and at most one component is Gaussian distributed, with $W$ being full column rank, then ICA can estimate $W$ up to a permutation and scaling of each column, and $\epsilon$ can be recovered for some permutation up to scaling for each component. Further where $P$ is a permutation matrix and $D$ is a diagonal matrix with diagonal entries $\pm 1$. Then,
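The permutation and scaling ambiguity in Theorem 1 can be illustrated numerically. The following is a minimal sketch, not the authors' code: the dimensions, the Laplace noise, and the particular permutation and scaling are assumptions chosen for illustration. It shows that $W\epsilon$ and $(WPD)(D^{-1}P^\top\epsilon)$ produce identical observations, so no method that sees only $X$ can distinguish the two factorizations.

```python
import numpy as np

rng = np.random.default_rng(1)
d, p, n = 3, 5, 1000                    # illustrative sizes (assumption)

W = rng.normal(size=(p, d))             # mixing matrix, full column rank almost surely
eps = rng.laplace(size=(d, n))          # independent non-Gaussian components
X = W @ eps                             # observations

perm = np.array([2, 0, 1])              # an arbitrary permutation (assumption)
P = np.eye(d)[:, perm]                  # permutation matrix: (W @ P)[:, j] = W[:, perm[j]]
D = np.diag([2.0, -0.5, 3.0])           # arbitrary nonzero column scalings (assumption)

W_alt = W @ P @ D                       # columns of W, permuted and rescaled
eps_alt = np.linalg.inv(D) @ P.T @ eps  # components of eps, permuted and rescaled

# Both factorizations explain the data equally well.
same_obs = np.allclose(X, W_alt @ eps_alt)
print(same_obs)
```

If the noise components are additionally normalized to unit variance, the remaining freedom in the scaling reduces to a sign per component, which is consistent with the diagonal $\pm 1$ matrix $D$ mentioned in the theorem.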

Figures (9)

  • Figure 1: We have 5 latent variables $Z$ which in this case relate to personality concepts, and the observations $X$ represent the scores of 50 questions from a psychometric personality test. The latent variables $Z$ follow a linear SCM, while the unknown shared linear mixing is a full-rank matrix $G \in {\mathbb{R}}^{50\times 5}$. Then, for each environment $k \in \{\mathrm{US, UK, AU}\}$, the observables are generated through $X^{(k)} = G Z^{(k)}$. Here, ${\mathbb{P}}^{(\mathrm{US})}$ is taken as the "observational" (reference) distribution, and the distribution shifts in ${\mathbb{P}}^{(\mathrm{UK})}$ and ${\mathbb{P}}^{(\mathrm{AU})}$ are due to changes in the causal mechanisms of $\{Z_1\}$ and $\{Z_2, Z_3, Z_5\}$, respectively. Finally, the types of interventions are general: for UK, the edge $Z_4 \to Z_1$ is removed and the dashed red lines indicate changes in the edge weights to $Z_1$; for AU, $Z_2$ was intervened on by removing $Z_5\to Z_2$ and adding $Z_3\to Z_2$, while the edge $Z_5\to Z_3$ was reversed, thus changing the mechanisms of $Z_3$ and $Z_5$. Thus, we aim to identify $\{Z_1\}$ and $\{Z_2, Z_3, Z_5\}$.
  • Figure 2: Illustration of the efficacy of our method in accurately identifying latent shifted nodes as the sample size increases, for ER2 graphs. In the first subplot, for a latent graph with $d = 5$ nodes, we examine scenarios with observed dimensions $p = 10, 20, 40$ and plot their corresponding F1 scores against the number of samples $n$. It is observed that the F1 score approaches 1 with a sufficiently large sample size. Detailed experimental procedures and results are discussed in Section \ref{sec:experiment}.
  • Figure 3: We apply an intervention to the first column of $\bm{\epsilon}$ and then use $(\widehat{M}^{male})^\dagger$ for remixing. The first row of the resulting histograms represents scores for 5 out of the 10 questions related to the Extraversion personality dimension. Subsequent rows display histograms for 5 questions from each of the other four personality dimensions, as indicated at the right end of each row. The red distribution represents the scores before the intervention on the noise, while the blue distribution corresponds to scores after the intervention. Overlapping areas are shown in purple. Notably, the intervention on the first column of $\bm{\epsilon}$ alters the distribution in the observed space, specifically affecting the scores for questions related to the Agreeableness personality dimension, whereas distributions for other dimensions remain unchanged. Consequently, we can label the first noise component as corresponding to Agreeableness.
  • Figure 4: Overview of our method: For each context $k$, given the data $\bm{X}^{(k)}$, our method involves three main steps. First, we apply ICA to each dataset to estimate $\bm{\epsilon}^{(k)}$ and $M^{(k)}$. Second, we calculate $\psi(\epsilon^{(k)})=\{\psi(\epsilon^{(k)}_1),\psi(\epsilon^{(k)}_2),\dots,\psi(\epsilon^{(k)}_d)\}$ for each noise component, sort these components in increasing order, and correspondingly arrange the rows of $M^{(k)}$. Third, we compare the sorted rows of $M^{(k)}$ to identify the shifted nodes.
  • Figure 5: Intervention on the fourth component of the noise vector and subsequent re-mixing generate a new observed space — a new score distribution. Notably, only Extraversion exhibits significant changes after intervention, leading us to label the fourth component of the noise vector (after sorting) as Extraversion.
  • ...and 4 more figures
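
The three steps in the Figure 4 caption can be sketched on synthetic data. This is a hedged illustration under stated assumptions, not the authors' implementation (their code is at the repository linked in the abstract). The assumptions are: the SCM is written as $Z = AZ + \epsilon$ with `A[i, j]` the weight of edge $j \to i$; the noise distributions are shared across environments; $\psi$ is taken to be the mean absolute value, a hypothetical choice since this page does not define $\psi$; and the ICA step is replaced by `mock_ica`, a stand-in that returns the true unmixing matrix with rows randomly permuted and sign-flipped.

```python
import numpy as np

d, p, n = 3, 6, 50_000                  # illustrative sizes (assumption)
rng = np.random.default_rng(0)
G = rng.normal(size=(p, d))             # shared mixing, full column rank almost surely

def sample_noise(rng):
    # Unit-variance noises with distinct E|eps|: Laplace < Gaussian < Uniform.
    return np.stack([
        rng.laplace(scale=1 / np.sqrt(2), size=n),
        rng.normal(size=n),
        rng.uniform(-np.sqrt(3), np.sqrt(3), size=n),
    ])

def simulate(A, rng):
    # Z = A Z + eps (assumed convention), then X = G Z.
    Z = np.linalg.solve(np.eye(d) - A, sample_noise(rng))
    return G @ Z

def mock_ica(A, rng):
    # Stand-in for ICA (assumption): true unmixing M = (I - A) G^+,
    # returned with unknown row order and row signs.
    M = (np.eye(d) - A) @ np.linalg.pinv(G)
    signs = rng.choice([-1.0, 1.0], size=d)
    return signs[:, None] * M[rng.permutation(d)]

def sort_and_fix_sign(M_hat, X):
    # Step 2: recover noise, compute psi per component, sort rows by psi.
    psi = np.mean(np.abs(M_hat @ X), axis=1)
    M_sorted = M_hat[np.argsort(psi)]
    # Resolve the sign by making each row's largest-magnitude entry positive.
    lead = M_sorted[np.arange(d), np.argmax(np.abs(M_sorted), axis=1)]
    return M_sorted * np.sign(lead)[:, None]

def shifted_rows(M_a, M_b, tol=1e-8):
    # Step 3: a row that differs between environments marks a shifted node.
    return {i for i in range(d) if np.linalg.norm(M_a[i] - M_b[i]) > tol}

A_ref = np.zeros((d, d))
A_ref[1, 0], A_ref[2, 0], A_ref[2, 1] = 0.8, 0.3, -0.5
A_env1 = A_ref.copy()
A_env1[1, 0] = -1.2                     # reweight the edge into node 1
A_env2 = A_ref.copy()
A_env2[2, :] = 0.0                      # remove both edges into node 2 ...
A_env2[0, 2] = 0.7                      # ... and add the reversed edge 2 -> 0

def run(A):
    X = simulate(A, rng)
    return sort_and_fix_sign(mock_ica(A, rng), X)

M_ref, M_1, M_2 = run(A_ref), run(A_env1), run(A_env2)
found_1 = shifted_rows(M_ref, M_1)
found_2 = shifted_rows(M_ref, M_2)
print(found_1, found_2)
```

In this sketch the noise components are ordered so that $\psi$ increases with the node index, which is why the sorted row index coincides with the node index; in general the sorted index labels a noise component rather than a named node. With estimated rather than exact unmixing matrices, the exact-equality tolerance would have to be replaced by a statistical threshold.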

Theorems & Definitions (17)

  • Theorem 1: Theorems 3,4 in eriksson2004identifiability
  • Remark 1
  • Definition 1: Latent Mechanism Shifts
  • Remark 2
  • Theorem 2: Identifiability
  • Remark 3
  • Proposition 1
  • Proposition 2
  • Theorem 3
  • Remark 4: Estimation of $d$.
  • ...and 7 more