Do Finetti: On Causal Effects for Exchangeable Data

Siyuan Guo, Chi Zhang, Karthika Mohan, Ferenc Huszár, Bernhard Schölkopf

TL;DR

The work extends causal effect estimation beyond i.i.d. data to exchangeable data generated by independent causal mechanisms (ICM). It introduces a generalized truncated factorization for ICM-generative processes, formalizes interventions via delta-distributions, and proves identifiability of causal effects from exchangeable data. A causal Pólya urn model illustrates how post-interventional distributions can depend on conditioning on other observations, and the Do-Finetti algorithm enables simultaneous causal graph discovery and effect estimation from multi-environment data. Collectively, these results enable principled causal analysis in realistic non-i.i.d. settings common in multi-environment studies. This framework broadens the applicability of causal inference to complex, structured data encountered in health, biology, and machine learning systems.

Abstract

We study causal effect estimation in a setting where the data are not i.i.d. (independent and identically distributed). We focus on exchangeable data satisfying an assumption of independent causal mechanisms. Traditional causal effect estimation frameworks, e.g., relying on structural causal models and do-calculus, are typically limited to i.i.d. data and do not extend to more general exchangeable generative processes, which naturally arise in multi-environment data. To address this gap, we develop a generalized framework for exchangeable data and introduce a truncated factorization formula that facilitates both the identification and estimation of causal effects in our setting. To illustrate potential applications, we introduce a causal Pólya urn model and demonstrate how intervention propagates effects in exchangeable data settings. Finally, we develop an algorithm that performs simultaneous causal discovery and effect estimation given multi-environment data.
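
To make "exchangeable but not i.i.d." concrete: a classical Pólya urn produces sequences whose probability depends only on how many times each outcome occurs, yet successive draws are dependent. The sketch below is a plain (non-causal) Pólya urn written purely for illustration; it is not the paper's causal Pólya urn model, and the function name `polya_sequence_prob` and the starting counts `a`, `b` are our own assumptions.

```python
from fractions import Fraction
from itertools import permutations


def polya_sequence_prob(seq, a=1, b=1):
    """Exact probability of a binary sequence drawn from a Polya urn that starts
    with `a` balls labelled 1 and `b` balls labelled 0; after each draw the ball
    is returned together with one extra ball of the same label."""
    ones, zeros, prob = a, b, Fraction(1)
    for x in seq:
        if x == 1:
            prob *= Fraction(ones, ones + zeros)
            ones += 1
        else:
            prob *= Fraction(zeros, ones + zeros)
            zeros += 1
    return prob


# Exchangeable: every reordering of (1, 1, 0) has the same probability.
for s in sorted(set(permutations((1, 1, 0)))):
    print(s, polya_sequence_prob(s))  # each line ends in 1/12

# Not i.i.d.: observing a 1 raises the chance that the next draw is also 1.
p_first = polya_sequence_prob((1,))
p_second_given_first = polya_sequence_prob((1, 1)) / p_first
print(p_first, p_second_given_first)  # 1/2 2/3
```

Exact `Fraction` arithmetic is used so that the equality of the permuted probabilities is checked exactly rather than up to floating-point error.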

Paper Structure (27 sections, 12 theorems, 60 equations, 4 figures, 1 algorithm)

Key Result

Corollary 1

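Let $P$ be the distribution of some ICM generative process. Let $\mathbf{I}$ and $\mathbf{J}$ be two disjoint subsets of $[d]:=\{1, \ldots, d\}$. Denote $\mathbf{X}_{\mathbf{I};n} := \{X_{i;n}: i \in \mathbf{I}\}$ and similarly for $\mathbf{X}_{\mathbf{J};n}$. Then, for any two positions $n$ and $m$,

$$P\big(\mathbf{X}_{\mathbf{I};n} \mid do(\mathbf{X}_{\mathbf{J};n} = \mathbf{x}_{\mathbf{J}})\big) = P\big(\mathbf{X}_{\mathbf{I};m} \mid do(\mathbf{X}_{\mathbf{J};m} = \mathbf{x}_{\mathbf{J}})\big),$$

i.e., identical interventions on variables in different positions share the same marginal post-interventional distributions. See Ap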
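
Corollary 1 can be checked exactly on a toy example. The sketch below assumes a hypothetical bivariate ICM process for $\mathcal{G} = X \to Y$ with binary variables: a Beta-Bernoulli (Pólya urn) mechanism for $X$ and one independent Beta-Bernoulli mechanism for $Y$ per value of its parent. The priors `AX, BX, AY, BY` and all function names are illustrative choices, not taken from the paper. The intervention follows the rule stated in the Figure 3 caption: the intervened variable gets a $\delta$-distribution (its factor drops out) and its value is substituted into the remaining mechanisms.

```python
from fractions import Fraction
from itertools import product

# Hypothetical Beta-Bernoulli (Polya urn) priors, chosen only for illustration.
AX, BX = 2, 1  # prior pseudo-counts for the X mechanism
AY, BY = 1, 3  # prior pseudo-counts for each Y | X = v mechanism


def urn_prob(seq, a, b):
    """Exact probability of a binary sequence under a Polya urn with a ones and b zeros."""
    ones, zeros, p = a, b, Fraction(1)
    for v in seq:
        if v == 1:
            p *= Fraction(ones, ones + zeros)
            ones += 1
        else:
            p *= Fraction(zeros, ones + zeros)
            zeros += 1
    return p


def y_prob(xs, ys):
    """Y mechanism: one independent urn per parent value v, fed by the Y's whose X equals v."""
    p = Fraction(1)
    for v in (0, 1):
        p *= urn_prob([y for x, y in zip(xs, ys) if x == v], AY, BY)
    return p


def post_intervention_y_marginal(n, xhat, N=4):
    """P(Y_n = 1 | do(X_n = xhat)) for a length-N sequence, positions 0..N-1."""
    total = Fraction(0)
    for x_rest in product((0, 1), repeat=N - 1):
        # X_n gets a delta distribution: its factor is dropped, so only the
        # other N-1 X's are scored by the X urn.
        p_x = urn_prob(x_rest, AX, BX)
        # Substitute xhat at position n in the remaining (Y) mechanism.
        xs = x_rest[:n] + (xhat,) + x_rest[n:]
        for ys in product((0, 1), repeat=N):
            if ys[n] == 1:
                total += p_x * y_prob(xs, ys)
    return total


print([str(post_intervention_y_marginal(n, xhat=1)) for n in range(4)])
# ['1/4', '1/4', '1/4', '1/4']
```

Brute-force enumeration keeps the example transparent; it scales as $2^{2N-1}$ and is only meant for small $N$.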

Figures (4)

  • Figure 1: A bivariate illustration of the differences in causal effects between i.i.d. processes and ICM-generative processes. Suppose $\mathcal{G} = X \to Y$. The hammer represents an intervention on the closest node. (a): Data generated according to $\mathcal{G}$ under an i.i.d. process; (b): data generated under an ICM-generative process (using plate notation). Block A shows how $P(Y|do(X))$ differs between the i.i.d. case (a) and the exchangeable case (b). Note that the causal effect under an i.i.d. process is a special case of that under exchangeable processes with $p(\psi) = \delta(\psi = \psi_0)$ for some value $\psi_0$. Corollary 1 justifies omitting position indices from Block A for ICM-generative processes. Block B shows the difference in intervention effect when conditioning on other observations, for i.i.d. (a) and ICM-generative (b) processes. In the i.i.d. case (a), since $(Y_1, X_1) \perp\!\!\!\!\perp (Y_2, X_2)$, conditioning on $(X_2, Y_2)$ conveys no information about the interventional effect on $Y_1$. In contrast, for an ICM-generative process (b), observing $(X_2, Y_2)$ does provide additional information about the effect on $Y_1$ when intervening on $X_1$. Graph (c) illustrates the graph surgery performed on $\text{ICM}(\mathcal{G})$ (cf. the definition of the ICM operator in the appendix on graphical terminology). Here, conditioning on the collider $Y_2$ provides additional information about the effect on $Y_1$ when intervening on $X_1$.
  • Figure 2: An example of $\text{ICM}(\mathcal{G})$: two data tuples generated by an ICM generative process with respect to $\mathcal{G}:= X_1 \leftarrow X_2 \to X_3$, where $X_{i;n}$ is the $i$-th variable in the $n$-th position and gray nodes denote latent variables.
  • Figure 3: An illustration of the differences in what the do-operator does in a structural causal model (a) and in an ICM generative process (b). In the observational phase of the SCM (a), where the dotted plate indicates i.i.d. sampling, a fixed assignment of $U_X$ and $U_Y$ leads to fixed observed values $X$ and $Y$. In the ICM-generative process (b), where "exch." abbreviates exchangeable process, fixing $\theta, \psi$ does not fix $X$ and $Y$; instead, it means sampling from a fixed distribution. Because SCMs fail to characterize ICM-generative processes, we define the operational meaning of do-interventions on ICM-generative processes as assigning a $\delta$-distribution to the intervened variables and substituting the corresponding values into the remaining distributions.
  • Figure 4: Performance of our method (do-Finetti) in simultaneously identifying the DAG (right) and estimating causal effects (left), compared with the i.i.d. algorithm (i.i.d.) and with the corresponding methods given the true DAG (do Finetti with true DAG and i.i.d. with true DAG) in the bivariate setting. The left panel shows the mean and standard deviation of the MSE relative to analytic solutions for each method, aggregated over 100 experiments. The right panel shows the accuracy of identifying the correct underlying DAG for each method. Do-Finetti identifies unique causal structures and achieves near-perfect causal effect estimation.
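
Block B of Figure 1 can be reproduced exactly on a toy model. The sketch below assumes hypothetical binary variables with uniform Beta(1, 1) priors: one Pólya urn for the $X$ mechanism and one independent urn per parent value $v$ for the $Y$ mechanism (parameter $\psi_v$ in our parameterization). It computes $P(Y_1 = 1 \mid do(X_1 = 1), X_2, Y_2)$ by dropping $X_1$'s factor and substituting its value, as described in the Figure 3 caption. All names and priors are illustrative assumptions.

```python
from fractions import Fraction

A_PSI, B_PSI = 1, 1  # hypothetical uniform prior on each Y-mechanism parameter psi_v
A_TH, B_TH = 1, 1    # hypothetical uniform prior on the X-mechanism parameter


def urn_prob(seq, a, b):
    """Exact probability of a binary sequence under a Polya urn with a ones and b zeros."""
    ones, zeros, p = a, b, Fraction(1)
    for v in seq:
        if v == 1:
            p *= Fraction(ones, ones + zeros)
            ones += 1
        else:
            p *= Fraction(zeros, ones + zeros)
            zeros += 1
    return p


def joint(y1, x2, y2, xhat=1):
    """P(Y_1 = y1, X_2 = x2, Y_2 = y2 | do(X_1 = xhat)) for two exchangeable tuples."""
    # X_1 is intervened, so the X urn only scores X_2.
    p_x = urn_prob([x2], A_TH, B_TH)
    # Y mechanism: one urn per parent value, with xhat substituted for X_1.
    xs, ys = (xhat, x2), (y1, y2)
    p_y = Fraction(1)
    for v in (0, 1):
        p_y *= urn_prob([y for x, y in zip(xs, ys) if x == v], A_PSI, B_PSI)
    return p_x * p_y


def effect_given_other(x2, y2, xhat=1):
    """P(Y_1 = 1 | do(X_1 = xhat), X_2 = x2, Y_2 = y2)."""
    num = joint(1, x2, y2, xhat)
    return num / (num + joint(0, x2, y2, xhat))


for x2, y2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print((x2, y2), effect_given_other(x2, y2))
# (1, 1) 2/3
# (1, 0) 1/3
# (0, 1) 1/2
# (0, 0) 1/2
```

A second tuple with $X_2 = 1$ shifts the prediction for $Y_1$ because both $Y$'s share $\psi_1$; a tuple with $X_2 = 0$ only informs $\psi_0$ and leaves the prediction at the prior mean. Replacing the prior on $\psi$ with a point mass (the i.i.d. special case) would make every line print the same fixed value, whatever $(X_2, Y_2)$ is.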

Theorems & Definitions (27)

  • Definition 1: Exchangeable Sequence
  • Definition 2: ICM generative process
  • Definition 3: Causal Effect in ICM generative processes
  • Corollary 1: Identical marginal post-interventional distributions
  • Theorem 1: Truncated Factorization in ICM generative processes
  • Lemma 1: Intervention effect conditioned on other observations
  • Theorem 2: Causal effect identification in ICM generative processes
  • Definition 4: d-separation
  • Definition 5: I-map
  • Definition 6: Markovian and Faithful
  • ...and 17 more
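
For orientation, the bivariate case $\mathcal{G} = X \to Y$ shows the shape a truncated factorization takes for an ICM generative process. The display below is our own sketch, not a restatement of Theorem 1: it assumes a de Finetti representation with independent mechanism parameters $\theta$ (for $X$) and $\psi$ (for $Y$ given $X$), and applies the $\delta$-substitution rule from the Figure 3 caption to an intervention at position 1.

$$P(x_{1:N}, y_{1:N}) = \int p(\theta) \prod_{n=1}^{N} p(x_n \mid \theta)\, d\theta \int p(\psi) \prod_{n=1}^{N} p(y_n \mid x_n, \psi)\, d\psi$$

$$P(x_{2:N}, y_{1:N} \mid do(X_1 = \hat{x})) = \int p(\theta) \prod_{n=2}^{N} p(x_n \mid \theta)\, d\theta \int p(\psi)\, p(y_1 \mid \hat{x}, \psi) \prod_{n=2}^{N} p(y_n \mid x_n, \psi)\, d\psi$$

With $p(\psi) = \delta(\psi = \psi_0)$ the second integral factorizes and $P(y_1 \mid do(X_1 = \hat{x}), x_{2:N}, y_{2:N}) = p(y_1 \mid \hat{x}, \psi_0)$, the i.i.d. answer. For a non-degenerate $p(\psi)$, the shared $\psi$ couples $y_1$ to the other tuples, which is the effect shown in Block B of Figure 1.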