Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning

Patrik Reizinger, Siyuan Guo, Ferenc Huszár, Bernhard Schölkopf, Wieland Brendel

TL;DR

This work presents Identifiable Exchangeable Mechanisms (IEM), a unifying framework that links causal-structure discovery and identifiable representation learning under exchangeable but non-i.i.d. data. By introducing cause and mechanism variability, the authors relax traditional identifiability conditions and derive new results that enable unique identification of causal graphs, latent sources, and latent causal representations within a single probabilistic model. The paper connects established methods (CDF, TCL, CauCA, and CRL) as special cases of IEM, and proves duality results that show identifiability under either changing causes or changing mechanisms. The approach has the potential to foster cross-pollination between causality and representation learning and to guide practical modeling of domain shifts, interventions, and multi-environment data.

Abstract

Identifying latent representations or causal structures is important for good generalization and downstream task performance. However, both fields have been developed rather independently. We observe that several methods in both representation and causal structure learning rely on the same data-generating process (DGP), namely, exchangeable but not i.i.d. (independent and identically distributed) data. We provide a unified framework, termed Identifiable Exchangeable Mechanisms (IEM), for representation and structure learning under the lens of exchangeability. IEM provides new insights that let us relax the necessary conditions for causal structure identification in exchangeable non-i.i.d. data. We also demonstrate the existence of a duality condition in identifiable representation learning, leading to new identifiability results. We hope this work will pave the way for further research in causal representation learning.
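
To make "exchangeable but not i.i.d." concrete, the following is a minimal illustrative sketch, not code from the paper. It assumes a toy binary model in which each sequence shares one latent parameter drawn from a Beta prior; the function names, the prior, and the sample sizes are hypothetical choices made only for this illustration.

```python
import random


def sample_exchangeable_sequence(length, rng, a=1.0, b=1.0):
    """Draw one binary sequence that is exchangeable but not i.i.d.

    A latent parameter theta is sampled once per sequence from an assumed
    Beta(a, b) prior; given theta, the entries are independent Bernoulli(theta).
    """
    theta = rng.betavariate(a, b)  # shared by every entry of this sequence
    return [1 if rng.random() < theta else 0 for _ in range(length)]


def pair_frequencies(num_sequences=50_000, seed=0):
    """Estimate the joint distribution of (X1, X2) over many sequences."""
    rng = random.Random(seed)
    counts = {(x1, x2): 0 for x1 in (0, 1) for x2 in (0, 1)}
    for _ in range(num_sequences):
        x1, x2 = sample_exchangeable_sequence(2, rng)
        counts[(x1, x2)] += 1
    return {pair: count / num_sequences for pair, count in counts.items()}


freq = pair_frequencies()
p_x1 = freq[(1, 0)] + freq[(1, 1)]  # marginal P(X1 = 1)
p_x2 = freq[(0, 1)] + freq[(1, 1)]  # marginal P(X2 = 1)

# Exchangeability: swapping the two positions leaves the joint unchanged.
print("P(X1=1, X2=0):", round(freq[(1, 0)], 3))
print("P(X1=0, X2=1):", round(freq[(0, 1)], 3))

# Not i.i.d.: the joint differs from the product of the marginals.
print("P(X1=1, X2=1):", round(freq[(1, 1)], 3))
print("P(X1=1) * P(X2=1):", round(p_x1 * p_x2, 3))
```

With the uniform Beta(1, 1) prior assumed here, the exact values are P(X1=1, X2=0) = P(X1=0, X2=1) = 1/6 and P(X1=1, X2=1) = 1/3, while the product of the marginals is 1/4. The sequence is therefore permutation-invariant, yet its entries are dependent through the shared latent parameter.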

Paper Structure

This paper contains 60 sections, 16 theorems, 25 equations, 6 figures, 1 table.

Key Result

Theorem 1

Let $\{(X^n, Y^n)\}_{n \in \mathbb{N}}$ be an infinite sequence of binary random variable pairs and denote the set $\left\{1,2,\dots, n\right\}$ as $[n]$. The sequence is infinitely exchangeable, and satisfies $Y^{[n]} \perp X^{n+1} \mid X^{[n]}$ for all $n \in \mathbb{N}$ if and only if there exist

Figures (6)

  • Figure 1: IEM, a unified model for structure and representation identifiability. Exchangeable but non-i.i.d. data enables identification in key methods across causal discovery (CD), independent component analysis (ICA), and causal representation learning (CRL). The unified panel shows the graphical model for IEM, which subsumes CD, ICA, and CRL. $S$ denotes latent, $Z$ causal, and $O$ observed variables, with corresponding latent parameters $\boldsymbol{\theta}, \psi$; superscripts denote different samples. Red denotes observed/known quantities, blue stands for target quantities, and gray illustrates components that are not explicitly modeled in a particular paradigm. $\theta_i$ are latent variables controlling separate probabilistic mechanisms, indicated by dotted vertical lines. CD (CD panel) corresponds to the left-most layer of IEM, focusing on the study of cause-effect relationships between observed causal variables; ICA (ICA panel) infers source variables from observations, but without causal connections in the left-most layer of IEM; CRL (CRL panel) shares the most similar structure with IEM, as it has both layers, including the intermediate causal representations. See Figure 4 for an enlarged view.
  • Figure 2: Non-i.i.d. conditions for bivariate CD. (a) Exchangeable non-i.i.d. DGP for both cause $P(\mathbf{X})$ and mechanism $P(\mathbf{Y}|\mathbf{X})$ (Guo et al., 2024); (b) exchangeable non-i.i.d. DGP for cause $P(\mathbf{X})$ and i.i.d. DGP for mechanism $P(\mathbf{Y}|\mathbf{X})$; (c) exchangeable non-i.i.d. DGP for mechanism $P(\mathbf{Y}|\mathbf{X})$ and i.i.d. DGP for cause $P(\mathbf{X})$. The paper's theorem extending CDF (label thm:extendcdf) shows that identifying the unique bivariate causal structure is possible if either the cause or the mechanism follows an exchangeable non-i.i.d. DGP.
  • Figure 3: The duality of cause and mechanism variability in TCL. The paper's duality lemma (label lem:dual_cause_mech) shows that the same identifiability result holds in (left) the original TCL setting with exchangeable non-i.i.d. sources $S$ and a deterministic mixing $f$ (cause variability), and (right) the matching setting with i.i.d. sources $S$ and a stochastic mixing $\hat{f}(u)$ (mechanism variability).
  • Figure 4: IEM, a unified model for structure and representation identifiability. Exchangeable but non-i.i.d. data enables identification in key methods across causal discovery (CD), independent component analysis (ICA), and causal representation learning (CRL). The unified panel shows the IEM graphical model, which subsumes CD, ICA, and CRL. $S$ denotes latent, $Z$ causal, and $O$ observed variables, with corresponding latent parameters $\boldsymbol{\theta}, \psi$; superscripts denote different samples. Red denotes observed/known quantities, blue stands for target quantities, and gray illustrates components that are not explicitly modeled in a particular paradigm. $\theta_i$ are latent variables controlling separate probabilistic mechanisms, indicated by dotted vertical lines. CD (CD panel) corresponds to the left-most layer of IEM, focusing on the study of cause-effect relationships between observed causal variables; ICA (ICA panel) infers source variables from observations, but without causal connections in the left-most layer of IEM; CRL (CRL panel) shares the most similar structure with IEM, as it has both layers, including the intermediate causal representations.
  • Figure 5: The richness argument in CDF (Guo et al., 2024) can be realized, in the bivariate case, by varying only the prior of the cause's parameters $\theta$ (cause-variability panel) or only the prior of the mechanism's parameters $\psi$ (mechanism-variability panel). That is, rich priors are not needed for both $\theta$ and $\psi$.
  • ...and 1 more figure

Theorems & Definitions (37)

  • Definition 1: Exchangeable sequence
  • Theorem 1: Causal de Finetti (Guo et al., 2024)
  • Lemma 1
  • Lemma 2
  • Theorem 2
  • Lemma 3
  • Example 1: Duality of cause and mechanism variability for Gaussian models
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • ...and 27 more