Estimating Joint interventional distributions from marginal interventional data

Sergio Hernan Garrido Mejia; Elke Kirschbaum; Armin Kekić; Atalanti Mastakouri

TL;DR

The paper addresses learning the joint interventional distribution from marginal interventional data by extending the Causal Maximum Entropy framework to i-CMAXENT. It proves that the resulting model remains in the exponential family and provides identifiability results for causal feature selection and for inferring joint interventional effects from single-variable interventions. Key contributions include a principled method to merge marginal data for joint interventional inference, identifiability conditions for adjusting sets, and empirical evidence that i-CMAXENT outperforms CMAXENT and competes with KCI when joint observations are unavailable. This work enables data fusion across disjoint experiments and offers a foundational tool for identifying joint causal effects from marginal information. The approach broadens causal discovery and effect estimation under the causal marginal problem by leveraging interventional data without requiring full joint observations.

Abstract

In this paper we show how to exploit interventional data to acquire the joint conditional distribution of all the variables using the Maximum Entropy principle. To this end, we extend the Causal Maximum Entropy method to make use of interventional data in addition to observational data. Using Lagrange duality, we prove that the solution to the Causal Maximum Entropy problem with interventional constraints lies in the exponential family, as in the Maximum Entropy solution. Our method allows us to perform two tasks of interest when marginal interventional distributions are provided for any subset of the variables. First, we show how to perform causal feature selection from a mixture of observational and single-variable interventional data and, second, how to infer joint interventional distributions. For the former task, we show on synthetically generated data that our proposed method outperforms the state-of-the-art method for merging datasets and yields results comparable to the KCI test, which requires access to joint observations of all variables.
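To illustrate why single-variable interventional constraints carry different information from single-variable conditional constraints, the following is a minimal sketch on a hypothetical toy model, not the paper's code. It assumes two binary causes `X1`, `X2` and a binary effect `Y`, and it assumes that `X2` is a valid adjustment set for the effect of `X1` on `Y`. All names and numbers are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy model (an assumption for illustration, not from the paper).
# p_x[x1, x2] is the joint P(X1, X2); the two causes are correlated.
p_x = np.array([[0.4, 0.1],
                [0.1, 0.4]])

# p_y_given_x[x1, x2, y] is P(Y | X1, X2). Here Y depends on X2 only,
# so X1 has no causal effect on Y.
p_y_given_x = np.array([[[0.9, 0.1], [0.1, 0.9]],
                        [[0.9, 0.1], [0.1, 0.9]]])


def interventional_marginal(p_y_given_x, p_x):
    """P(Y | do(X1)) = sum_x2 P(Y | X1, x2) P(x2), assuming X2 is a valid adjustment set."""
    p_x2 = p_x.sum(axis=0)  # marginal P(X2)
    return np.einsum("aby,b->ay", p_y_given_x, p_x2)


def conditional_marginal(p_y_given_x, p_x):
    """P(Y | X1) = sum_x2 P(Y | X1, x2) P(x2 | X1)."""
    p_x2_given_x1 = p_x / p_x.sum(axis=1, keepdims=True)
    return np.einsum("aby,ab->ay", p_y_given_x, p_x2_given_x1)


p_do = interventional_marginal(p_y_given_x, p_x)
p_cond = conditional_marginal(p_y_given_x, p_x)
print(p_do[:, 1])    # P(Y=1 | do(X1=x1)) is 0.5 for both values of x1
print(p_cond[:, 1])  # P(Y=1 | X1=x1) is 0.26 and 0.74
```

In this toy model the conditional distribution changes with `X1` only because `X1` is correlated with `X2`, while the interventional distribution is flat. This is the kind of distinction that causal feature selection from interventional constraints relies on.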

Paper Structure (17 sections, 4 theorems, 16 equations, 2 figures)

Introduction
Related work
Notation
Background: Maximum Conditional Entropy and CMAXENT
Interventional CMAXENT (i-CMAXENT)
i-CMAXENT for causal feature selection
Experiments
Causal feature selection
Joint interventional distributions from single interventions
Results
Causal feature selection
Joint interventional distributions from single interventions
Discussion
Proofs of the main results
Computation of Maxent through norm minimisation
...and 2 more sections

Key Result

Theorem 5.1

Using the Lagrange multiplier formalism, the solution of the CMAXENT optimisation problem (eq:cmaxentopt) with the additional interventional constraint (eq:interventionalMAXENT) belongs to an exponential family, where $\beta(\mathbf{x})$ is the normalising constant, as in the conditional case. The normalising constant can be computed by enumeration.
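As a rough illustration of computing a normalising constant by enumeration, the sketch below uses the standard conditional maximum entropy form $P(y \mid \mathbf{x}) = \exp\big(\sum_k \lambda_k f_k(\mathbf{x}, y) + \beta(\mathbf{x})\big)$ for a discrete $Y$. The feature functions `features`, the multipliers `lam`, and the sign convention for $\beta(\mathbf{x})$ are assumptions made for this example; the exact expression in Theorem 5.1 may differ.

```python
import numpy as np


def conditional_expfam(features, lam):
    """Conditional exponential family over a discrete Y (illustrative sketch).

    features: array of shape [n_x, n_y, K] holding f_k(x, y) (hypothetical features).
    lam:      array of shape [K] holding the Lagrange multipliers.
    Returns (p, beta) with p[x, y] = exp(sum_k lam_k f_k(x, y) + beta[x]).
    """
    scores = features @ lam  # shape [n_x, n_y]
    # beta(x) = -log sum_y exp(scores[x, y]), obtained by enumerating all y.
    # The maximum is subtracted first for numerical stability.
    m = scores.max(axis=1, keepdims=True)
    beta = -(m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True)))
    return np.exp(scores + beta), beta.squeeze(axis=1)


# Usage with random hypothetical features: 4 configurations of x, 3 values of y, 2 features.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3, 2))
lam = np.array([0.5, -1.0])
p, beta = conditional_expfam(features, lam)
print(p.sum(axis=1))  # each row sums to 1 up to floating-point error
```

Enumeration is feasible here because the sum runs over the values of a single discrete variable for each configuration of $\mathbf{x}$; the cost grows with the number of states of $Y$.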

Figures (2)

Figure 1: Results for causal feature selection. The graph panels (a), (b), and (c) show the graph structures used for our synthetic experiments. We randomise the presence of the edges in the lower part of the graphs (dashed arrows); the solid arrows are always present as shown. The corresponding ROC overlay panels show ROC curves for the identification of causal edges between the $X_i$'s and $Y$ in setting 1. For i-CMAXENT we use constraints on all five single-interventional distributions $P(Y \mid \mathrm{do}(X_i))$. For CMAXENT we use constraints on the five single-conditional distributions $P(Y \mid X_i)$. The KCI test has access to observations from the joint distribution $P(\mathbf{X}, Y)$. For i-CMAXENT and CMAXENT we further consider two cases: first, where the joint observational distribution of the causes $P(\mathbf{X})$ is known (blue and orange lines); second, where $P(\mathbf{X})$ is estimated from constraints on the five marginal distributions $P(X_i)$ (green and red lines). Although our approach only uses single-variable interventional constraints as input, it achieves performance similar to that of the KCI test, which uses the full generated dataset. The corresponding combined ROC panels show the performance of i-CMAXENT when using a combination of observational and interventional constraints. The number in the legend is the number of potential causes for which we use interventional constraints ($P(Y \mid \mathrm{do}(X_i))$). For the remaining potential causes we use observational constraints ($P(Y \mid X_i)$).
Figure 2: Residuals between true and estimated joint interventional distributions. The violin plots show the residuals between the true and the estimated joint interventional distributions $P(Y \mid \mathrm{do}(X_1, X_2))$ for five cases that differ in the constraints used in the estimation. The constraints are: (1) the joint conditional $P(Y \mid X_i, X_j)$ for a randomly chosen pair $X_i, X_j$, and single-variable interventionals $P(Y \mid \mathrm{do}(X_k))$ for the rest of the variables (blue); (2) only $P(Y \mid X_i, X_j)$ (orange); (3) single-variable interventionals $P(Y \mid \mathrm{do}(X_k))$ for all causes (green); (4) single-variable conditionals $P(Y \mid X_k)$ for all causes (red); and (5) no constraints at all (purple).

Theorems & Definitions (6)

Theorem 5.1: Exponential family of i-CMAXENT
Proposition 5.1: Identifiability and adjustment set of variables with only incoming arrows
Theorem 1.1: Exponential family of i-CMAXENT
proof
Proposition 1.0: Identifiability and adjustment set of variables with only incoming arrows
proof
