Linear causal disentanglement via higher-order cumulants

Paula Leyes Carreno, Chiara Meroni, Anna Seigal

TL;DR

The paper studies identifiability in linear causal disentanglement (LCD), where the observed variables are a linear mixture of latent variables that have causal relations among them. It develops a constructive approach based on coupled tensor decompositions of higher-order cumulants across multiple intervention contexts. It proves that one perfect intervention per latent node is sufficient, and in the worst case necessary, to identify the latent DAG and the parameters via a linear system; soft interventions yield only a compatibility class for the latent graph. A practical two-step algorithm first recovers the intervention targets, permutation, and scaling, and then recovers the latent parameters, with additional simplifications in the injective setting ($q\le p$). The results rely on non-Gaussianity of the latent errors, and they show that complete identifiability under soft interventions is impossible in general, although transitive closures can be recovered. Overall, the work extends identifiability theory for causal representations and provides a concrete pipeline for recovering latent causal structure from interventional data.
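
As a rough illustration of this setup, the following Python sketch simulates data from one observational context and from one perfect-intervention context per latent node. The structural form $Z = \Lambda Z + \varepsilon$, $X = FZ$, the edge convention, the function names `sample_lcd` and `perfect_intervention`, the exponential error distribution, and all numeric values are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def sample_lcd(F, Lam, n, rng, scale=None):
    """Draw n samples of X = F Z, where Z = Lam Z + eps (assumed convention)."""
    q = Lam.shape[0]
    scale = np.ones(q) if scale is None else scale
    # Centered exponential errors: non-Gaussian, with non-zero third cumulant.
    eps = (rng.exponential(1.0, size=(n, q)) - 1.0) * scale
    # Z = (I - Lam)^{-1} eps, solved for all samples at once.
    Z = np.linalg.solve(np.eye(q) - Lam, eps.T).T
    return Z @ F.T

def perfect_intervention(Lam, k):
    """Remove every edge into latent node k by zeroing row k of Lam."""
    Lam_k = Lam.copy()
    Lam_k[k, :] = 0.0
    return Lam_k

rng = np.random.default_rng(0)
p, q = 2, 3  # more latent than observed variables
# Hypothetical latent DAG 0 -> 1 -> 2; Lam[i, j] is the weight of edge j -> i.
Lam0 = np.zeros((q, q))
Lam0[1, 0] = 0.8
Lam0[2, 1] = -0.5
F = rng.standard_normal((p, q))

X_obs = sample_lcd(F, Lam0, 1000, rng)

# One context per latent node; the intervened node also gets a new error scale.
contexts = []
for k in range(q):
    scale = np.ones(q)
    scale[k] = 2.0
    contexts.append(sample_lcd(F, perfect_intervention(Lam0, k), 1000, rng, scale))
```

Each entry of `contexts` plays the role of samples of $X^{(k)}$ for one intervention target $k$. Recovering the targets from such data is part of the identification problem, so a real pipeline would not be told which context belongs to which node.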

Abstract

Linear causal disentanglement is a recent method in causal representation learning to describe a collection of observed variables via latent variables with causal dependencies between them. It can be viewed as a generalization of both independent component analysis and linear structural equation models. We study the identifiability of linear causal disentanglement, assuming access to data under multiple contexts, each given by an intervention on a latent variable. We show that one perfect intervention on each latent variable is sufficient and in the worst case necessary to recover parameters under perfect interventions, generalizing previous work to allow more latent than observed variables. We give a constructive proof that computes parameters via a coupled tensor decomposition. For soft interventions, we find the equivalence class of latent graphs and parameters that are consistent with observed data, via the study of a system of polynomial equations. Our results hold assuming the existence of non-zero higher-order cumulants, which implies non-Gaussianity of variables.
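
The role of non-Gaussianity can be made concrete with a standard identity for cumulants of linear mixtures. The notation here is an assumption for illustration: write $X^{(k)} = A^{(k)}\varepsilon^{(k)}$ with $A^{(k)} = F(I-\Lambda^{(k)})^{-1}$, columns $a^{(k)}_i$, and mutually independent errors $\varepsilon^{(k)}_i$. Multilinearity of cumulants, together with the vanishing of cross-cumulants of independent variables, gives:

```latex
\kappa_d\bigl(X^{(k)}\bigr)
  \;=\; \sum_{i=1}^{q} \kappa_d\bigl(\varepsilon^{(k)}_i\bigr)\,
        \bigl(a^{(k)}_i\bigr)^{\otimes d}
```

Every cumulant of order $d \geq 3$ of a Gaussian variable is zero, so Gaussian errors would contribute nothing to these higher-order tensors. A non-zero $\kappa_d(\varepsilon^{(k)}_i)$ is what keeps the $i$-th rank-one term present in the decomposition.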

Paper Structure

This paper contains 26 sections, 20 theorems, 73 equations, 3 figures.

Key Result

Theorem 1.5

Consider LCD under the paper's main assumption, with perfect interventions. Then one perfect intervention on each latent node is sufficient and, in the worst case, necessary to recover the latent DAG $\mathcal{G}$ and the parameters $F$ and $\Lambda^{(k)}$ from observations of $X^{(k)}$.
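
One way to see why different contexts carry shared information is to compare their mixing matrices at the population level. The sketch below is an illustration under assumed conventions ($X = A\varepsilon$ with $A = F(I-\Lambda)^{-1}$, and $\Lambda_{ij}$ the weight of edge $j \to i$), not the paper's algorithm. The helper names, the matrix `F`, the edge weights, and the cumulant values are hypothetical.

```python
import numpy as np

def mixing_matrix(F, Lam):
    """A = F (I - Lam)^{-1}, so that X = A eps under the assumed convention."""
    q = Lam.shape[0]
    return F @ np.linalg.inv(np.eye(q) - Lam)

def third_cumulant(A, kappa3):
    """Population third-order cumulant of X = A eps for independent errors:
    the sum over i of kappa3[i] * a_i (x) a_i (x) a_i."""
    return np.einsum("i,ai,bi,ci->abc", kappa3, A, A, A)

# Hypothetical example with p = 2 observed and q = 3 latent variables.
F = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Lam0 = np.zeros((3, 3))
Lam0[1, 0] = 0.8   # edge 0 -> 1
Lam0[2, 1] = -0.5  # edge 1 -> 2

# Perfect intervention on node k = 1: delete all edges into node 1.
k = 1
Lam_k = Lam0.copy()
Lam_k[k, :] = 0.0

A0 = mixing_matrix(F, Lam0)
Ak = mixing_matrix(F, Lam_k)

# Columns of the mixing matrix that differ between the two contexts.
changed = [j for j in range(3) if not np.allclose(A0[:, j], Ak[:, j])]
print(changed)

# Third cumulants of the two contexts, using the value 2.0 (the third
# cumulant of a centered unit-rate exponential) for every error.
kappa3 = np.array([2.0, 2.0, 2.0])
T0 = third_cumulant(A0, kappa3)
Tk = third_cumulant(Ak, kappa3)
```

In this example only column 0 changes, because node 0 is the only ancestor of the intervened node. The columns for nodes 1 and 2 are shared by both contexts, which gives a sense of how decompositions of `T0` and `Tk` are tied together.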

Figures (3)

  • Figure 1: A cartoon of the setup for $p=2$ observed variables and $q=3$ latent variables.
  • Figure 2: Median relative Frobenius error in the recovery of $F$ (left) and $\Lambda^{(0)}$ (right) when $p=5$. Note the logarithmic scale on the $y$-axis for all positive $y$-coordinates. The five algorithms are: (i) Covariance, the algorithm used in [SSBU23] to recover the parameters from the covariance matrices of $X^{(k)}$ (blue), (ii) Tensor (general), the general algorithm with cumulants as input (orange), (iii) Matrix (general), the general algorithm with factor matrices as input (green), (iv) Tensor (injective), the injective algorithm with cumulants as input (red), and (v) Matrix (injective), the injective algorithm with factor matrices as input (purple). For DAG recovery, all methods recovered the correct DAG every time, except the general tensor method when $q \geq 6$, which had a median DAG error of 3.6 for $q=6$ and 4.1 for $q=7$.
  • Figure 3: Median relative Frobenius error in the recovery of $F$ (left) and $\Lambda^{(0)}$ (right) when $p=10$ and $q\leq p$. Note the logarithmic scale on the $y$-axis for all positive $y$-coordinates. The three algorithms are: (i) Covariance, the algorithm used in [SSBU23] to recover the parameters from the covariance matrices of $X^{(k)}$ (blue), (ii) Tensor (injective), the injective algorithm with cumulants as input (red), and (iii) Matrix (injective), the injective algorithm with factor matrices as input (purple). For DAG recovery, all methods recovered the correct DAG every time.
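
The error metric named in the captions for Figures 2 and 3 can be sketched as follows. This is a minimal version that assumes the estimate has already been aligned with the ground truth; how the paper resolves permutation and scaling before comparing is not stated here. The function name and the example matrices are hypothetical.

```python
import numpy as np

def relative_frobenius_error(estimate, truth):
    """Frobenius norm of the difference, divided by the Frobenius norm of truth."""
    return np.linalg.norm(estimate - truth) / np.linalg.norm(truth)

# Hypothetical ground truth and a slightly perturbed estimate.
truth = np.array([[3.0, 0.0],
                  [0.0, 4.0]])
estimate = truth + np.array([[0.3, 0.0],
                             [0.0, 0.0]])
err = relative_frobenius_error(estimate, truth)

# A median over repeated trials, as in the captions, is then a one-liner.
median_err = np.median([err, 2 * err, 3 * err])
```

Using the median rather than the mean keeps a handful of failed runs from dominating the summary.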

Theorems & Definitions (52)

  • Definition 1.1
  • Remark 1.4: Benign non-identifiability
  • Theorem 1.5
  • Theorem 1.6
  • Corollary 1.7
  • Proposition 2.1
  • proof
  • Corollary 2.2
  • proof
  • Proposition 2.3
  • ...and 42 more