Do Finetti: On Causal Effects for Exchangeable Data

Siyuan Guo; Chi Zhang; Karthika Mohan; Ferenc Huszár; Bernhard Schölkopf

TL;DR

The work extends causal effect estimation beyond i.i.d. data to exchangeable data generated by independent causal mechanisms (ICM). It introduces a generalized truncated factorization for ICM-generative processes, formalizes interventions via δ-distributions, and proves that causal effects are identifiable from exchangeable data. A causal Pólya urn model illustrates how post-interventional distributions can depend on conditioning on other observations, and the Do-Finetti algorithm enables simultaneous causal graph discovery and effect estimation from multi-environment data. Together, these results enable principled causal analysis in realistic non-i.i.d. settings common in multi-environment studies, broadening the applicability of causal inference to the complex, structured data encountered in health, biology, and machine learning systems.
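
To make "exchangeable data generated by independent causal mechanisms" concrete, here is a minimal sketch of multi-environment data for a bivariate graph X → Y with binary variables. The Beta priors, the function name `sample_icm_environments`, and all parameters are our own illustrative assumptions, not the paper's setup: each environment draws its mechanism parameters ψ (for X) and θ (for Y given X) independently, then draws samples i.i.d. given those parameters.

```python
import numpy as np


def sample_icm_environments(n_envs, n_per_env, rng,
                            a_psi=1.0, b_psi=1.0, a_theta=1.0, b_theta=1.0):
    """Sample binary X -> Y data from a hypothetical Beta-Bernoulli ICM process.

    Per environment (assumed model, for illustration only):
      psi      ~ Beta(a_psi, b_psi)        sets P(X = 1)
      theta[x] ~ Beta(a_theta, b_theta)    sets P(Y = 1 | X = x), x in {0, 1}
    psi and theta are drawn independently (independent causal mechanisms).
    Rows are i.i.d. given (psi, theta), so each environment's rows are
    exchangeable but not i.i.d. once the parameters are integrated out.
    """
    X = np.empty((n_envs, n_per_env), dtype=int)
    Y = np.empty((n_envs, n_per_env), dtype=int)
    for e in range(n_envs):
        psi = rng.beta(a_psi, b_psi)
        theta = rng.beta(a_theta, b_theta, size=2)
        X[e] = rng.binomial(1, psi, size=n_per_env)
        Y[e] = rng.binomial(1, theta[X[e]])  # each Y uses the theta of its own X
    return X, Y


rng = np.random.default_rng(0)
X, Y = sample_icm_environments(20000, 2, rng)
# Within an environment, X_1 and X_2 share psi, so they are dependent:
# with a uniform prior, P(X_2 = 1 | X_1 = 1) = E[psi^2] / E[psi] = (1/3) / (1/2) = 2/3.
print(X[X[:, 0] == 1, 1].mean())
```

Replacing the priors by point masses would give every environment the same ψ and θ, and the pooled data would then be i.i.d.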

Abstract

We study causal effect estimation in a setting where the data are not i.i.d. (independent and identically distributed). We focus on exchangeable data satisfying an assumption of independent causal mechanisms. Traditional causal effect estimation frameworks, e.g., those relying on structural causal models and do-calculus, are typically limited to i.i.d. data and do not extend to more general exchangeable generative processes, which naturally arise in multi-environment data. To address this gap, we develop a generalized framework for exchangeable data and introduce a truncated factorization formula that facilitates both the identification and estimation of causal effects in our setting. To illustrate potential applications, we introduce a causal Pólya urn model and demonstrate how interventions propagate their effects in exchangeable data settings. Finally, we develop an algorithm that performs simultaneous causal discovery and effect estimation given multi-environment data.
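
The abstract does not spell out the causal Pólya urn, so the following is only a plausible sketch under our own assumptions: the initial ball counts, the function name `causal_polya_urn`, and the rule for intervened draws are hypothetical. One urn generates X; a separate urn per value of X generates Y, and every draw is returned together with one extra ball of the same colour. A hard intervention sets X directly (a δ-distribution) and, in this sketch, does not reinforce the X-urn, while the resulting Y still reinforces its urn, which is one way an intervention can propagate to later draws. Positions are 0-indexed in the code.

```python
import random


def causal_polya_urn(n, do=None, seed=None):
    """Draw n (x, y) pairs from a hypothetical bivariate causal Pólya urn for X -> Y.

    urn_x holds [balls of colour 0, balls of colour 1] for X; urn_y[x] does the
    same for Y given X = x. Every draw returns the ball plus one extra ball of
    the same colour (Pólya reinforcement).
    do: optional {position: value} of hard interventions on X. An intervened X
    is set directly (a delta distribution) and, by assumption, does not
    reinforce urn_x, because X's own mechanism did not produce it.
    """
    rng = random.Random(seed)
    do = do or {}
    urn_x = [1, 1]
    urn_y = {0: [1, 1], 1: [1, 1]}
    data = []
    for i in range(n):
        if i in do:
            x = do[i]
        else:
            x = 1 if rng.random() < urn_x[1] / sum(urn_x) else 0
            urn_x[x] += 1
        counts = urn_y[x]  # Y's urn is chosen by its parent value
        y = 1 if rng.random() < counts[1] / sum(counts) else 0
        counts[y] += 1     # reinforce Y's urn, also after an intervened X
        data.append((x, y))
    return data


# Estimate P(Y_1 = 1 | do(X_0 = 1), Y_0 = 1, X_1 = 1) by simulation.
runs = [causal_polya_urn(2, do={0: 1}, seed=s) for s in range(20000)]
sel = [r[1][1] for r in runs if r[0][1] == 1 and r[1][0] == 1]
print(sum(sel) / len(sel))  # close to 2/3, since urn_y[1] holds [1, 2] after Y_0 = 1
```

Dropping the condition on Y_0 gives an estimate close to 1/2, so in this sketch the observation made after the intervention, not the intervention alone, shifts predictions at later positions.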

Paper Structure (27 sections, 12 theorems, 60 equations, 4 figures, 1 algorithm)

Introduction
Preliminaries
The Causal Framework in i.i.d. Data
The Causal Framework in Exchangeable Data
Causal Effect in Exchangeable Data
Causal Effect Identifiability in ICM generative processes
Conditional Interventional distributions
Rules of compact representation of causal effect
Causal Effect in Multi-environment data
Experiments
Discussion
Graphical Terminology
Proof of Corollary 1
Proof of Theorem 1
Proof of Lemma 1
...and 12 more sections

Key Result

Corollary 1

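Let $P$ be the distribution of some ICM generative process. Let $\mathbf{I}$ and $\mathbf{J}$ be two disjoint subsets of $[d]:=\{1, \ldots, d\}$. Denote $\mathbf{X}_{\mathbf{I};n} := \{X_{i;n}: i \in \mathbf{I}\}$ and similarly for $\mathbf{X}_{\mathbf{J};n}$. Then, for any two positions $n$ and $m$, $P(\mathbf{X}_{\mathbf{J};n} \mid do(\mathbf{X}_{\mathbf{I};n} = \mathbf{x}_{\mathbf{I}})) = P(\mathbf{X}_{\mathbf{J};m} \mid do(\mathbf{X}_{\mathbf{I};m} = \mathbf{x}_{\mathbf{I}}))$, i.e., identical interventions on variables in different positions share the same marginal post-interventional distributions. See Ap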
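
As a sanity check (not a proof), the statement that identical interventions at different positions share the same marginal post-interventional distribution can be verified by exact enumeration in a small, hypothetical causal Pólya urn for X → Y with binary variables. The initial ball counts and the helper names `sequence_prob` and `prob_y_given_do` are our own assumptions; an intervened X gets probability one and does not reinforce the X-urn. Positions are 0-indexed.

```python
from fractions import Fraction
from itertools import product


def sequence_prob(seq, do, init_x=(1, 1), init_y=((1, 1), (1, 3))):
    """Exact probability of an (x, y) sequence under a hypothetical causal Pólya urn.

    init_y[x] gives the initial [colour 0, colour 1] counts of Y's urn for X = x.
    Positions in `do` must carry x == do[i]; their X has probability one.
    """
    urn_x = list(init_x)
    urn_y = [list(c) for c in init_y]
    p = Fraction(1)
    for i, (x, y) in enumerate(seq):
        if i in do:
            if x != do[i]:
                return Fraction(0)
        else:
            p *= Fraction(urn_x[x], sum(urn_x))
            urn_x[x] += 1
        p *= Fraction(urn_y[x][y], sum(urn_y[x]))
        urn_y[x][y] += 1
    return p


def prob_y_given_do(n, pos, x_val):
    """P(Y_pos = 1 | do(X_pos = x_val)) over sequences of length n, by enumeration."""
    do = {pos: x_val}
    total = Fraction(0)
    for flat in product((0, 1), repeat=2 * n):
        seq = list(zip(flat[0::2], flat[1::2]))  # (x_0, y_0), (x_1, y_1), ...
        if seq[pos][1] == 1:
            total += sequence_prob(seq, do)
    return total


print(prob_y_given_do(2, 0, 1), prob_y_given_do(2, 1, 1), prob_y_given_do(3, 2, 1))  # 3/4 3/4 3/4
```

The value does not depend on the position, even though the Y-urn for X = 1 may already have been reinforced by earlier draws.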

Figures (4)

Figure 1: A bivariate illustration of the differences in causal effect between i.i.d. processes and ICM-generative processes. Suppose $\mathcal{G} = X \to Y$. The hammer represents an intervention on the closest node. (a): Data generated according to $\mathcal{G}$ under an i.i.d. process; (b): Data generated under an ICM-generative process (using plate notation). Block A shows how $P(Y|do(X))$ differs between the i.i.d. (a) and the exchangeable (b) case. Note that the causal effect under i.i.d. processes is a special case of that under exchangeable processes with $p(\psi) = \delta(\psi = \psi_0)$, for some value $\psi_0$. Corollary 1 justifies omitting position indices from Block A for ICM-generative processes. Block B shows the difference in intervention effect when conditioning on other observations, for i.i.d. (a) and ICM-generative (b) processes. In the i.i.d. case (a), because $(Y_1, X_1) \perp\!\!\!\!\perp (Y_2, X_2)$, conditioning on $(X_2, Y_2)$ conveys no information about the interventional effect on $Y_1$. In contrast, for an ICM-generative process (b), observing $(X_2, Y_2)$ does provide additional information about the effect on $Y_1$ when intervening on $X_1$. Graph (c) illustrates the graph surgery performed on $\text{ICM}(\mathcal{G})$ (cf. the definition of the ICM operator in the Graphical Terminology appendix). Conditioning on the collider $Y_2$ provides additional information about the effect on $Y_1$ when intervening on $X_1$.
Figure 2: An example of $\text{ICM}(\mathcal{G})$: two data tuples generated by an ICM generative process with respect to $\mathcal{G}:= X_1 \leftarrow X_2 \to X_3$, where $X_{i;n}$ is the $i$-th variable in the $n$-th position and gray nodes denote latent variables.
Figure 3: An illustration of the differences in what the do-operator does in a structural causal model (a) and in an ICM generative process (b). In the observational phase of the SCM (a), where the dotted plate indicates i.i.d. sampling, fixing the values of $U_X$ and $U_Y$ fixes the observed values $X$ and $Y$. For the ICM-generative process (b), where "exch." abbreviates an exchangeable process, fixing $\theta$ and $\psi$ does not fix $X$ and $Y$; it only fixes the distributions from which they are sampled. Because SCMs fail to characterize ICM-generative processes, we define the operational meaning of a do-intervention on an ICM-generative process as assigning a $\delta$-distribution to the intervened variables and substituting their values into the remaining distributions.
Figure 4: Performance of our method (do-Finetti) in simultaneously identifying the DAG (right) and estimating the causal effect (left) in the bivariate setting, compared with the i.i.d. algorithm (i.i.d.) and with the corresponding methods given the true DAG (do Finetti with true DAG; i.i.d. with true DAG). Left: mean and standard deviation of the MSE relative to the analytic solutions, aggregated over 100 experiments for each method. Right: accuracy of identifying the correct underlying DAG for each method. Do-Finetti identifies unique causal structures and achieves near-perfect causal effect estimation.
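
Block B of Figure 1 can be made concrete with a conjugate model. Assume, purely for illustration, binary X → Y with θ_x ~ Beta(a, b) independently for x ∈ {0, 1} and Y_n | X_n = x, θ ~ Bernoulli(θ_x); the function name `post_interventional_y1` is hypothetical. Because ψ and θ are independent, P(Y_1 = 1 | do(X_1 = x), other observations) is the posterior mean of θ_x, and only observations with X_n = x update θ_x. Concentrating the prior toward a point mass approaches the i.i.d. case, where other observations carry no information.

```python
from fractions import Fraction


def post_interventional_y1(x1, others, a=1, b=1):
    """P(Y_1 = 1 | do(X_1 = x1), others) in a hypothetical Beta-Bernoulli ICM model.

    others: observed (x_n, y_n) pairs from other positions n != 1.
    Under the assumed prior theta_x ~ Beta(a, b), the answer is the posterior
    mean of theta_{x1}; pairs with x_n != x1 leave theta_{x1} unchanged.
    """
    ones = sum(1 for x, y in others if x == x1 and y == 1)
    zeros = sum(1 for x, y in others if x == x1 and y == 0)
    return Fraction(a + ones, a + b + ones + zeros)


print(post_interventional_y1(1, []))                        # 1/2
print(post_interventional_y1(1, [(1, 1)]))                  # 2/3
print(post_interventional_y1(1, [(0, 1)]))                  # 1/2
print(post_interventional_y1(1, [(1, 1)], a=1000, b=1000))  # 1001/2001
```

With a = b = 1000 the same observation moves the effect only from 1/2 to about 0.50025, mirroring the limit p(θ) → δ(θ = θ_0) in which conditioning on (X_2, Y_2) has no effect.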

Theorems & Definitions (27)

Definition 1: Exchangeable Sequence
Definition 2: ICM generative process
Definition 3: Causal Effect in ICM generative processes
Corollary 1: Identical marginal post-interventional distributions
Theorem 1: Truncated Factorization in ICM generative processes
Lemma 1: Intervention effect conditioned on other observations
Theorem 2: Causal effect identification in ICM generative processes
Definition 4: d-separation
Definition 5: I-map
Definition 6: Markovian and Faithful
...and 17 more
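
For the bivariate graph X → Y, the operational definition of a do-intervention given in the Figure 3 caption (assign a δ-distribution, substitute the value elsewhere) can be written out explicitly. The derivation below is a worked instance under the assumption that the process factorizes with independent priors p(ψ) and p(θ) for the two mechanisms; it illustrates, and does not restate, the general truncated factorization of Theorem 1.

```latex
% Observational joint of an ICM generative process for X -> Y (assumed form)
p(x_{1:N}, y_{1:N}) = \int\!\!\int \prod_{n=1}^{N} p(x_n \mid \psi)\, p(y_n \mid x_n, \theta)\, p(\psi)\, p(\theta)\, d\psi\, d\theta

% do(X_1 = \hat{x}): replace p(x_1 \mid \psi) by \delta(x_1 = \hat{x}) and substitute \hat{x} for x_1
p(y_1, x_{2:N}, y_{2:N} \mid do(X_1 = \hat{x})) = \int\!\!\int p(y_1 \mid \hat{x}, \theta) \prod_{n=2}^{N} p(x_n \mid \psi)\, p(y_n \mid x_n, \theta)\, p(\psi)\, p(\theta)\, d\psi\, d\theta

% Marginalizing positions 2, ..., N (each factor sums or integrates to one)
p(y_1 \mid do(X_1 = \hat{x})) = \int p(y_1 \mid \hat{x}, \theta)\, p(\theta)\, d\theta
```

Conditioning on (x_2, y_2) instead of marginalizing it keeps the factor p(y_2 | x_2, θ) inside the θ-integral, which is why that observation changes the effect on Y_1, as in Block B of Figure 1.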
