Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning

Patrik Reizinger; Siyuan Guo; Ferenc Huszár; Bernhard Schölkopf; Wieland Brendel

TL;DR

This work presents Identifiable Exchangeable Mechanisms (IEM), a unifying framework that links causal-structure discovery and identifiable representation learning under exchangeable but non-i.i.d. data. By introducing cause and mechanism variability, the authors relax traditional identifiability conditions and derive new results that enable unique identification of causal graphs, latent sources, and latent causal representations within a single probabilistic model. The paper connects established methods (Causal de Finetti (CdF), TCL, CauCA, and CRL) as special cases of IEM, and proves duality results that show identifiability under either changing causes or changing mechanisms. The approach has the potential to foster cross-pollination between causality and representation learning and to guide practical modeling of domain shifts, interventions, and multi-environment data.

Abstract

Identifying latent representations or causal structures is important for good generalization and downstream task performance. However, both fields have been developed rather independently. We observe that several methods in both representation and causal structure learning rely on the same data-generating process (DGP), namely, exchangeable but not i.i.d. (independent and identically distributed) data. We provide a unified framework, termed Identifiable Exchangeable Mechanisms (IEM), for representation and structure learning under the lens of exchangeability. IEM provides new insights that let us relax the necessary conditions for causal structure identification in exchangeable non-i.i.d. data. We also demonstrate the existence of a duality condition in identifiable representation learning, leading to new identifiability results. We hope this work will pave the way for further research in causal representation learning.
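To make "exchangeable but not i.i.d." concrete, here is a minimal sketch, not taken from the paper, of a binary sequence whose samples are i.i.d. only conditionally on a latent parameter. The two-point prior `PRIOR` and the helper names `joint_prob` and `is_exchangeable` are illustrative assumptions. The joint distribution is computed exactly with rational arithmetic, so the checks do not depend on sampling noise.

```python
from fractions import Fraction
from itertools import permutations, product

# Hypothetical two-point prior over the latent Bernoulli parameter theta
# (illustrative values, not taken from the paper).
PRIOR = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}


def joint_prob(xs, prior=PRIOR):
    """Exact P(X^1 = x_1, ..., X^n = x_n) for a mixture of i.i.d. Bernoullis."""
    total = Fraction(0)
    for theta, weight in prior.items():
        likelihood = Fraction(1)
        for x in xs:
            likelihood *= theta if x == 1 else 1 - theta
        total += weight * likelihood
    return total


def is_exchangeable(n, prior=PRIOR):
    """Check that the joint is invariant under every reordering of the samples."""
    return all(
        joint_prob(xs, prior) == joint_prob(perm, prior)
        for xs in product([0, 1], repeat=n)
        for perm in permutations(xs)
    )


p_one = joint_prob((1,))     # marginal P(X = 1)
p_both = joint_prob((1, 1))  # joint P(X^1 = 1, X^2 = 1)

print(is_exchangeable(3))    # prints: True
print(p_both, p_one * p_one) # prints: 5/16 1/4
```

The joint is invariant to reordering, yet P(X^1 = 1, X^2 = 1) = 5/16 differs from P(X = 1)^2 = 1/4, so the samples are dependent once the latent parameter is marginalized out.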

Paper Structure (60 sections, 16 theorems, 25 equations, 6 figures, 1 table)

Introduction
Preliminaries
Notation.
Causal de Finetti (CdF) and Exchangeability.
Causality.
Identifiable Exchangeable Mechanisms (IEM): A unifying framework for structural and representational identifiability
Identifiable Exchangeable Mechanisms (IEM)
A probabilistic model for IEM.
An intuition for IEM.
Case study: Identifiable Latent Neural Causal Models (Liu et al., 2024) in the unified model.
Exchangeability in Causal Discovery (CD): Extending Causal de Finetti (CdF)
A probabilistic model for CD.
Case study: CdF in the unified model.
Relaxing CdF: cause and mechanism variability.
Exchangeability in representation learning
...and 45 more sections

Key Result

Theorem 1

Let $\{(X^n, Y^n)\}_{n \in \mathbb{N}}$ be an infinite sequence of binary random variable pairs and denote the set $\left\{1,2,\dots, n\right\}$ as $[n]$. The sequence is infinitely exchangeable, and satisfies $Y^{[n]} \perp X^{n+1} \mid X^{[n]}$ for all $n \in \mathbb{N}$ if and only if there exist
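The conditional-independence condition in the statement above can be checked exactly for n = 1 on a toy model. The sketch below is illustrative only and assumes a hypothetical data-generating process that is not given in the text shown here: a latent parameter `theta` sets P(X = 1), a separate parameter `psi` sets P(Y = 1 | X), and the two are drawn independently. The prior values and all helper names are assumptions. In this example only the cause parameter varies, because the prior over `psi` has a single point.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical priors (illustrative values, not taken from the paper).
THETA_PRIOR = {F(1, 4): F(1, 2), F(3, 4): F(1, 2)}  # P(X=1 | theta) = theta
PSI_PRIOR = {(F(1, 4), F(3, 4)): F(1)}              # psi[x] = P(Y=1 | X=x, psi)


def bern(p, v):
    """Probability that a Bernoulli(p) variable takes the value v."""
    return p if v == 1 else 1 - p


def joint(pairs):
    """Exact P((x^1, y^1), ..., (x^n, y^n)) with theta and psi independent."""
    total = F(0)
    for theta, w_t in THETA_PRIOR.items():
        for psi, w_p in PSI_PRIOR.items():
            lik = F(1)
            for x, y in pairs:
                lik *= bern(theta, x) * bern(psi[x], y)
            total += w_t * w_p * lik
    return total


def prob(x1=None, y1=None, x2=None, y2=None):
    """Marginal over two pairs; arguments left as None are summed out."""
    query = (x1, y1, x2, y2)
    total = F(0)
    for vals in product([0, 1], repeat=4):
        if all(q is None or q == v for q, v in zip(query, vals)):
            total += joint([vals[:2], vals[2:]])
    return total


def ci_forward():
    """Y^1 independent of X^2 given X^1 (the n = 1 case of the condition)."""
    return all(
        prob(x1=a, y1=b, x2=c) * prob(x1=a) == prob(x1=a, y1=b) * prob(x1=a, x2=c)
        for a, b, c in product([0, 1], repeat=3)
    )


def ci_reverse():
    """The same check with the roles of X and Y swapped."""
    return all(
        prob(x1=a, y1=b, y2=d) * prob(y1=b) == prob(x1=a, y1=b) * prob(y1=b, y2=d)
        for a, b, d in product([0, 1], repeat=3)
    )


print(ci_forward(), ci_reverse())  # prints: True False
```

For this toy model the condition holds in the X-to-Y direction and fails when the roles are swapped, which is the kind of asymmetry the exchangeable setting makes available.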

Figures (6)

Figure 1: IEM: A unified model for structure and representation identifiability. Exchangeable but non-i.i.d. data enables identification in key methods across causal discovery (CD), independent component analysis (ICA), and causal representation learning (CRL). The unified panel shows the graphical model for IEM, which subsumes CD, ICA, and CRL. $S$ denotes latent, $Z$ causal, and $O$ observed variables with corresponding latent parameters $\boldsymbol{\theta}, \psi$; superscripts denote different samples. Red denotes observed/known quantities, blue stands for target quantities, and gray illustrates components that are not explicitly modeled in a particular paradigm. $\theta_i$ are latent variables controlling separate probabilistic mechanisms, indicated by dotted vertical lines. CD (CD panel) corresponds to the left-most layer of IEM, focusing on the study of cause-effect relationships between observed causal variables; ICA (ICA panel) infers source variables from observations, but without causal connections in the left-most layer of IEM; CRL (CRL panel) shares the most similar structure with IEM, as it has both layers, including the intermediate causal representations. An enlarged version of this figure appears later in the paper.
Figure 2: Non-i.i.d. conditions for bivariate CD: (a) exchangeable non-i.i.d. DGP for both the cause $P(\mathbf{X})$ and the mechanism $P(\mathbf{Y}|\mathbf{X})$ (Guo et al., 2024); (b) exchangeable non-i.i.d. DGP for the cause $P(\mathbf{X})$ and i.i.d. DGP for the mechanism $P(\mathbf{Y}|\mathbf{X})$; (c) exchangeable non-i.i.d. DGP for the mechanism $P(\mathbf{Y}|\mathbf{X})$ and i.i.d. DGP for the cause $P(\mathbf{X})$. The paper's theorem extending Causal de Finetti (CdF) shows that identifying the unique bivariate causal structure is possible if either the cause or the mechanism follows an exchangeable non-i.i.d. DGP.
Figure 3: The duality of cause and mechanism variability in TCL: the paper's duality lemma shows that the same identifiability result holds in (Left) the original TCL setting with exchangeable non-i.i.d. sources $S$ and a deterministic mixing $f$ (cause variability), and (Right) the matching setting with i.i.d. sources $S$ and a stochastic mixing $\hat{f}(u)$ (mechanism variability).
Figure 4: IEM: A unified model for structure and representation identifiability. Exchangeable but non-i.i.d. data enables identification in key methods across causal discovery (CD), independent component analysis (ICA), and causal representation learning (CRL). The unified panel shows the IEM graphical model, which subsumes CD, ICA, and CRL. $S$ denotes latent, $Z$ causal, and $O$ observed variables with corresponding latent parameters $\boldsymbol{\theta}, \psi$; superscripts denote different samples. Red denotes observed/known quantities, blue stands for target quantities, and gray illustrates components that are not explicitly modeled in a particular paradigm. $\theta_i$ are latent variables controlling separate probabilistic mechanisms, indicated by dotted vertical lines. CD (CD panel) corresponds to the left-most layer of IEM, focusing on the study of cause-effect relationships between observed causal variables; ICA (ICA panel) infers source variables from observations, but without causal connections in the left-most layer of IEM; CRL (CRL panel) shares the most similar structure with IEM, as it has both layers, including the intermediate causal representations.
Figure 5: We show that the richness argument in Causal de Finetti (CdF) (Guo et al., 2024) can be realized, in the bivariate case, by varying only the prior of the cause's parameters $\theta$ (cause-variability panel) or only the prior of the mechanism's parameters $\psi$ (mechanism-variability panel). That is, it is not necessary to have rich priors for both $\theta$ and $\psi$.
...and 1 more figure

Theorems & Definitions (37)

Definition 1: Exchangeable sequence
Theorem 1: Causal de Finetti (Guo et al., 2024)
Lemma 1
Lemma 2
Theorem 2
Lemma 3
Example 1: Duality of cause and mechanism variability for Gaussian models
Lemma 4
Lemma 5
Lemma 6
...and 27 more
