Identifying Linearly-Mixed Causal Representations from Multi-Node Interventions

Simon Bing, Urmi Ninad, Jonas Wahl, Jakob Runge

TL;DR

The paper tackles identifiability in causal representation learning when latent variables are linearly mixed and subjected to multi-node interventions across environments. It introduces a variance-density sparsity criterion on the trace left by interventions and proves that, under diverse interventional coverage and injective mixing, latent factors are identifiable up to permutation and rescaling. A practical SGD-based algorithm implements this idea via a five-term loss that enforces variance sparsity, environment- and dimension-wise variance constraints, diagonal sparsity, and norm stabilization, enabling recovery of the ground-truth causal factors in synthetic data. Empirically, the method achieves near-perfect recovery across varying latent dimensions, graph densities, and even nonlinear SCMs, demonstrating robustness and potential applicability as a modular component in causal representation pipelines.
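
For orientation, the identifiability claim above can be written schematically. This is a sketch in our own notation, not the paper's exact formulation: we assume observations $\mathbf{x}^{(e)} \in \mathbb{R}^D$ in environment $e$, latents $\mathbf{z}^{(e)} \in \mathbb{R}^d$, an injective (full column rank) linear mixing $\mathbf{L}$, and a learned encoder output $\hat{\mathbf{z}}^{(e)}$. The symbols $E$, $D$, $\mathbf{P}$, and $\boldsymbol{\Lambda}$ are labels introduced here for illustration.

```latex
\begin{aligned}
&\mathbf{x}^{(e)} = \mathbf{L}\,\mathbf{z}^{(e)}, \qquad
  \mathbf{L} \in \mathbb{R}^{D \times d},\ \operatorname{rank}(\mathbf{L}) = d, \qquad e = 1, \dots, E, \\
&\hat{\mathbf{z}}^{(e)} = \mathbf{P}\,\boldsymbol{\Lambda}\,\mathbf{z}^{(e)} \quad \text{for all } e, \qquad
  \mathbf{P} \text{ a permutation matrix},\ \boldsymbol{\Lambda} \text{ an invertible diagonal matrix}.
\end{aligned}
```

Read this way, "identifiable up to permutation and rescaling" means each recovered coordinate is a nonzero multiple of exactly one ground-truth latent.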

Abstract

The task of inferring high-level causal variables from low-level observations, commonly referred to as causal representation learning, is fundamentally underconstrained. As such, recent works addressing this problem focus on various assumptions that lead to identifiability of the underlying latent causal variables. A large corpus of these approaches considers multi-environment data collected under different interventions on the causal model. Common to virtually all of these works is the restrictive assumption that only a single variable is intervened on in each environment. In this work, we relax this assumption and provide the first identifiability result for causal representation learning that allows multiple variables to be targeted by an intervention within one environment. Our approach hinges on a general assumption on the coverage and diversity of interventions across environments, which includes the single-node intervention assumption shared by previous works as a special case. The main idea behind our approach is to exploit the trace that interventions leave on the variance of the ground-truth causal variables and to regularize for a specific notion of sparsity with respect to this trace. In addition to, and inspired by, our theoretical contributions, we present a practical algorithm to learn causal representations from multi-node interventional data and provide empirical evidence that validates our identifiability results.
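
The variance-sparsity principle, and the contrast drawn in Figure 1, can be illustrated numerically. The following is a minimal toy sketch, not the paper's algorithm or its exact definition of variance density. It assumes a linear-Gaussian SCM over a random DAG, models each multi-node intervention as clamping its targets to a constant (so they have zero variance in that environment), and measures variance density as the fraction of coordinates whose variance exceeds a small tolerance, averaged over environments. All function names and parameters (`random_dag_weights`, `sample_environment`, `variance_density`, `TOL`) are hypothetical.

```python
import numpy as np

TOL = 1e-8  # variances at or below this count as zero (assumed tolerance)


def random_dag_weights(d, p, rng):
    # Strictly lower-triangular weights encode a DAG with topological order 0, ..., d-1;
    # each possible edge is present with probability p.
    mask = np.tril(rng.random((d, d)) < p, k=-1)
    return mask * rng.uniform(0.5, 2.0, size=(d, d))


def sample_environment(B, targets, n, rng):
    # Ancestral sampling of z_j = sum_k B[j, k] z_k + eps_j with eps_j ~ N(0, 1).
    # Toy intervention model (assumption): every target is clamped to the constant 0.
    d = B.shape[0]
    z = np.zeros((n, d))
    for j in range(d):
        if j in targets:
            continue  # clamped: zero variance in this environment
        z[:, j] = z @ B[j] + rng.standard_normal(n)
    return z


def variance_density(envs, tol=TOL):
    # Fraction of coordinates with nonzero variance, averaged over environments.
    return float(np.mean([np.mean(e.var(axis=0) > tol) for e in envs]))


rng = np.random.default_rng(0)
d, n = 4, 2000
B = random_dag_weights(d, p=0.5, rng=rng)

# Multi-node interventions: each environment targets two variables at once.
target_sets = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
z_envs = [sample_environment(B, t, n, rng) for t in target_sets]

# Generic square mixing (almost surely invertible), applied to every environment.
L = rng.standard_normal((d, d))
x_envs = [z @ L.T for z in z_envs]

print(variance_density(z_envs))  # 0.5: two of four latents are clamped per environment
print(variance_density(x_envs))  # almost surely 1.0: mixing spreads variance everywhere
```

Under these toy assumptions, the ground-truth latents have zero variance exactly on the intervened coordinates, while a generic mixing spreads the remaining variance over every coordinate; the sparser pattern is the one a variance-sparsity regularizer would favor.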

Paper Structure (35 sections, 5 theorems, 46 equations, 4 figures, 1 table)

Key Result

Lemma 1: Non-vanishing variance under mixing.

For all invertible $\mathbf{L} \in \mathbb{R}^{d \times d}$, …

Figures (4)

  • Figure 1: Comparison of samples drawn from the example SCM under three different interventions, shown for ($\bm{a}$) the ground-truth representation and ($\bm{b}$) a mixed representation. In each environment, the density of variables with nonzero variance is lower for the ground-truth representation than for the mixed one. We exploit the principle that the ground truth is sparser in terms of nonzero-variance dimensions to achieve identifiability of causal representations from multi-node interventional data.
  • Figure 2: We report the mean MCC score across various experimental conditions over five random seeds; error bars or shaded regions indicate the standard error. ($\bm{a}$) Our model performs well for every number of latent variables $d$ considered, up to $d=30$. ($\bm{b}$) MCC across different edge probabilities $p$. Our method achieves near-perfect scores in all settings, indicating that it does not implicitly rely on assumptions about the density of the underlying graph. ($\bm{c}$) MCC for different sample sizes $n$. Performance increases with sample size and saturates at $n=2\cdot10^5$.
  • Figure 3: Causal graph of SCM 1 and SCM 2.
  • Figure 4: Experiments varying the number of latent variables $d$ for additional mixing matrices $\mathbf{L}$. Each subfigure corresponds to a different $\mathbf{L}$. We report the mean MCC score over five random seeds, with error bars indicating the standard error.
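
Figures 2 and 4 report MCC, which the document does not define. Assuming it denotes the mean correlation coefficient common in the causal representation learning literature, a minimal sketch is: compute absolute Pearson correlations between true and recovered latents, find the one-to-one matching of coordinates that maximizes their mean, and report that mean. The function name `mean_correlation_coefficient` is hypothetical, the choice of Pearson (rather than Spearman) correlation is an assumption, and the brute-force matching only suits small $d$.

```python
import itertools

import numpy as np


def mean_correlation_coefficient(z_true, z_hat):
    """Mean |Pearson correlation| under the best one-to-one matching of coordinates.

    Brute force over all d! permutations, so only practical for small d; for
    larger d the Hungarian algorithm (e.g. scipy.optimize.linear_sum_assignment)
    solves the same matching problem efficiently.
    """
    d = z_true.shape[1]
    # Cross-correlation block: entry (i, j) = corr(true coordinate i, estimated coordinate j).
    abs_corr = np.abs(np.corrcoef(z_true.T, z_hat.T)[:d, d:])
    best = max(
        np.mean([abs_corr[i, perm[i]] for i in range(d)])
        for perm in itertools.permutations(range(d))
    )
    return float(best)


rng = np.random.default_rng(1)
z = rng.standard_normal((5000, 3))

# A permuted, rescaled (and sign-flipped) copy: recovered "up to permutation and rescaling".
z_perm_scaled = z[:, [2, 0, 1]] * np.array([3.0, -0.5, 2.0])
print(mean_correlation_coefficient(z, z_perm_scaled))  # 1.0 up to floating-point error

# A generic linear mixing of the latents scores lower.
z_mixed = z @ rng.standard_normal((3, 3)).T
print(mean_correlation_coefficient(z, z_mixed))
```

Because the score matches coordinates and ignores sign and scale, it equals 1 for any representation that recovers the latents up to permutation and rescaling, which is exactly the notion of identifiability the results target.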

Theorems & Definitions (10)

  • definition 1: Causal Disentanglement up to Redundancies
  • definition 2: Variance density.
  • lemma 1: Non-vanishing variance under mixing.
  • theorem 2: Disentanglement via intervention sparsity.
  • lemma 2: Non-vanishing variance under mixing.
  • proof
  • lemma 3: Invertible matrices contain a permutation (Lachapelle et al., 2023)
  • proof
  • theorem 5: Disentanglement via intervention sparsity.
  • proof