
On the Identifiability of Causal Abstractions

Xiusi Li, Sékou-Oumar Kaba, Siamak Ravanbakhsh

TL;DR

This work addresses identifiability in causal representation learning when only counterfactual data pairs are available and interventions target arbitrary subsets of latent variables. It introduces a theoretical framework to identify latent causal models up to abstractions by leveraging non-descendant structures and graph quotient constructions, yielding an acyclic quotient graph and identifiable latent blocks. The key contributions include (i) proving identifiability up to an SCM abstraction determined by non-descendant intervention sets, and (ii) showing additional latents can be disentangled under singleton non-descendant conditions, all under mild assumptions like faithfulness and absolute continuity. The results provide a principled route to learn causally meaningful abstractions from weak supervision, with implications for scalable, interpretable reasoning in CRL-informed systems, while acknowledging practical limitations and future work on scalability and empirical methodology.

Abstract

Causal representation learning (CRL) enhances machine learning models' robustness and generalizability by learning structural causal models associated with data-generating processes. We focus on a family of CRL methods that use contrastive data pairs in the observable space, generated before and after a random, unknown intervention, to identify the latent causal model. Brehmer et al. (2022) showed that this is indeed possible, provided that all latent variables can be intervened on individually. However, this is a highly restrictive assumption in many systems. In this work, we instead assume interventions on arbitrary subsets of latent variables, which is more realistic. We introduce a theoretical framework that calculates the degree to which we can identify a causal model, given a set of possible interventions, up to an abstraction that describes the system at a higher level of granularity.
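To make the data setting concrete, the following is a minimal sketch of how one contrastive pair could be generated. It is an illustration under stated assumptions, not the paper's construction: the linear SCM, its edge weights, the mixing function `mix`, and the choice to share exogenous noise between the two samples are all hypothetical, and the intervention targets are zero-based versions of the sets $\{3\}$, $\{3, 4\}$, $\{4, 5\}$ used in Figure 1.

```python
import random

# Hypothetical linear SCM on 5 latents, indexed 0..4 in topological order.
# WEIGHTS[i][j] is the assumed effect of parent j on child i (j < i).
WEIGHTS = {
    1: {0: 0.8},
    2: {0: -0.5},
    3: {1: 1.2, 2: 0.7},
    4: {3: -0.9},
}
N_LATENTS = 5
# Zero-based versions of the targets {3}, {3, 4}, {4, 5}.
TARGETS = [(2,), (2, 3), (3, 4)]


def solve_scm(noise, do=None):
    """Compute latents in topological order; nodes in `do` are fixed to the given values."""
    do = do or {}
    z = []
    for i in range(N_LATENTS):
        if i in do:
            z.append(do[i])
        else:
            parents = WEIGHTS.get(i, {})
            z.append(sum(w * z[j] for j, w in parents.items()) + noise[i])
    return z


def mix(z):
    """Hypothetical invertible mixing: running sum, then s + s**3 elementwise."""
    x, running = [], 0.0
    for value in z:
        running += value
        x.append(running + running ** 3)
    return x


def sample_pair(rng):
    """Draw one (x, x_tilde) pair plus the hidden latents and the hidden target."""
    # Assumption: both samples share the same exogenous noise (counterfactual pair).
    noise = [rng.gauss(0.0, 1.0) for _ in range(N_LATENTS)]
    target = rng.choice(TARGETS)  # random intervention target, unknown to the learner
    do = {i: rng.gauss(0.0, 1.0) for i in target}  # assumed perfect intervention
    z = solve_scm(noise)
    z_tilde = solve_scm(noise, do)
    return mix(z), mix(z_tilde), z, z_tilde, target
```

In the learning problem only `x` and `x_tilde` would be observed; the latents and the target are returned here purely so the sketch can be inspected.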

Paper Structure

This paper contains 32 sections, 9 theorems, 44 equations, 4 figures.

Key Result

Lemma 2.1

Given latent causal models parametrized by $\theta^\star$ and $\theta$ as defined in the subsection on the data generating process, suppose $\theta \sim_{\text{FL}} \theta^\star$ with respect to the decompositions $\bigoplus_{j = 1}^n \mathcal{Z}_j$ and $\bigoplus_{i = 1}^n \mathcal{Z}^\star_i$. Then for all $i \in \{1, \cdots, n\}$

Figures (4)

  • Figure 1: Data Generating Process for a Latent Causal Model. On the left we have the unintervened structural causal model generating samples of the pre-intervention latent $\mathbf{z}$, while on the right we have a mixture of intervened structural causal models generating samples of the post-intervention latent $\tilde{\mathbf{z}}$, for a collection of intervention targets $\mathcal{I} = \{\{3\}, \{3, 4\}, \{4, 5\} \}$, as denoted by the subsets of red vertices; the corresponding non-descendant sets of the intervention targets are denoted by the subsets of blue vertices.
  • Figure 2: SCM homomorphism, as defined by a graph homomorphism $\phi$ that maps nodes in $\mathcal{G}'$ to nodes in $\mathcal{G}$ of the same colour (e.g. $\phi(2) = \phi(3) = 3$), as well as a set of invertible measurable functions which ensure that the latent variables represented by nodes in the image of $\phi$, which in this case are $\mathbf{z}'_1$, $\mathbf{z}'_3$ and $\mathbf{z}'_4$, have equivalent distributions to their counterparts after marginalization on $\mathbf{z}'_2$.
  • Figure 3: SCM Abstractions. The white arrows represent surjective SCM homomorphisms, which map causal models operating at higher granularity levels to lower granularity levels, such that the model at the centre is an abstraction of all the other models.
  • Figure 4: Identifiable SCM abstraction with graphical structure as shown on the left and $\mathcal{I}^\star = \{\{3\}, \{3, 4\}, \{4, 5\} \}$ as its family of intervention targets.
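The non-descendant sets mentioned in the Figure 1 caption can be computed by a simple graph traversal. The sketch below is only illustrative: the edges of the 5-node DAG are hypothetical (the captions do not list them), and the definition used here, namely all nodes that are neither in the target set nor descendants of any target, is an assumption about how the paper defines the non-descendant set of an intervention target.

```python
from collections import deque

# Hypothetical DAG on nodes 1..5 (assumed edges, not taken from the paper):
# 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4, 4 -> 5
CHILDREN = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}

# Family of intervention targets from the Figure 1 and Figure 4 captions.
INTERVENTION_TARGETS = [{3}, {3, 4}, {4, 5}]


def descendants(children, sources):
    """Return every node reachable from `sources` along at least one directed edge."""
    seen = set()
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def non_descendants(children, targets):
    """Assumed definition: nodes outside `targets` that no target can reach."""
    targets = set(targets)
    return set(children) - targets - descendants(children, targets)


for targets in INTERVENTION_TARGETS:
    print(sorted(targets), "->", sorted(non_descendants(CHILDREN, targets)))
```

For this assumed graph, the targets $\{3\}$ and $\{3, 4\}$ both have non-descendant set $\{1, 2\}$, while $\{4, 5\}$ has $\{1, 2, 3\}$; a different edge set would give different results.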

Theorems & Definitions (28)

  • Definition 2.1: Latent disentanglement
  • Definition 2.2: Full latent disentanglement
  • Lemma 2.1
  • Definition 2.3: SCM isomorphism
  • Lemma 2.2
  • Definition 2.4: Latent abstraction
  • Definition 2.5: Full latent abstraction
  • Lemma 2.3
  • Definition 2.6: SCM homomorphism
  • Definition 2.7: SCM abstraction
  • ...and 18 more