$χ$SPN: Characteristic Interventional Sum-Product Networks for Causal Inference in Hybrid Domains

Harsh Poonia, Moritz Willig, Zhongjie Yu, Matej Zečević, Kristian Kersting, Devendra Singh Dhami

TL;DR

This work addresses causal inference in hybrid domains with mixed discrete and continuous variables by proposing χSPN, a Characteristic Interventional Sum-Product Network. χSPN places univariate characteristic functions in its leaves, and a neural network conditioned on the intervention predicts the network's parameters, so that the root represents the characteristic function of the interventional distribution. This enables tractable inference of interventional distributions even when closed-form densities are unavailable. Training uses the Empirical Characteristic Function (ECF) of the interventional data and the characteristic function distance (CFD) as the objective for matching interventional distributions, and inversion techniques recover joint densities from the learned characteristic function. The paper shows that χSPN can generalize to multiple interventions after training on single-intervention data and reports promising results on three synthetic heterogeneous datasets, highlighting its potential for causal reasoning in realistic mixed-data settings.

Abstract

Causal inference in hybrid domains, characterized by a mixture of discrete and continuous variables, presents a formidable challenge. We take a step in this direction and propose the Characteristic Interventional Sum-Product Network ($χ$SPN), which is capable of estimating interventional distributions in the presence of random variables drawn from mixed distributions. $χ$SPN uses characteristic functions in the leaves of an interventional SPN (iSPN), thereby providing a unified view of discrete and continuous random variables through the Fourier-Stieltjes transform of the probability measures. A neural network is used to estimate the parameters of the learned iSPN using the intervened data. Our experiments on 3 synthetic heterogeneous datasets suggest that $χ$SPN can effectively capture the interventional distributions for both discrete and continuous variables while being expressive and causally adequate. We also show that $χ$SPN generalizes to multiple interventions while being trained only on data from a single intervention.
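To make the "unified view" concrete, the following is a minimal sketch, not the authors' implementation, of how an empirical characteristic function (ECF) treats discrete and continuous samples identically, and of a Monte Carlo estimate of a squared characteristic function distance (CFD). The helper names (`ecf`, `squared_cfd`), the Bernoulli and Gaussian test variables, and the standard normal frequency distribution are all assumptions made for illustration.

```python
import cmath
import random

def ecf(samples, t):
    """Empirical characteristic function: mean of exp(i * t * x) over the samples."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

def bernoulli_cf(p, t):
    """Analytic characteristic function of a Bernoulli(p) variable (discrete)."""
    return (1 - p) + p * cmath.exp(1j * t)

def gaussian_cf(mu, sigma, t):
    """Analytic characteristic function of a Normal(mu, sigma^2) variable (continuous)."""
    return cmath.exp(1j * mu * t - 0.5 * (sigma * t) ** 2)

def squared_cfd(cf_p, cf_q, freqs):
    """Monte Carlo estimate of the squared CFD: the average of
    |cf_p(t) - cf_q(t)|^2 over frequencies t drawn from a weighting distribution."""
    return sum(abs(cf_p(t) - cf_q(t)) ** 2 for t in freqs) / len(freqs)

rng = random.Random(0)
# Hypothetical data: one discrete and one continuous variable.
discrete = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(5000)]
continuous = [rng.gauss(1.0, 2.0) for _ in range(5000)]
# Assumption: frequencies are sampled from a standard normal distribution.
freqs = [rng.gauss(0.0, 1.0) for _ in range(64)]

# The same ECF code handles both variable types; no density is needed.
cfd_discrete = squared_cfd(lambda t: ecf(discrete, t),
                           lambda t: bernoulli_cf(0.3, t), freqs)
cfd_continuous = squared_cfd(lambda t: ecf(continuous, t),
                             lambda t: gaussian_cf(1.0, 2.0, t), freqs)
# Two clearly different distributions give a much larger distance.
cfd_mismatch = squared_cfd(lambda t: bernoulli_cf(0.3, t),
                           lambda t: bernoulli_cf(0.8, t), freqs)

print(cfd_discrete, cfd_continuous, cfd_mismatch)
```

Because both quantities live in the same spectral domain, the distance is computed the same way whether the underlying variable has a probability mass function or a density.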

Paper Structure (24 sections, 3 theorems, 19 equations, 11 figures, 3 tables)

Introduction
Preliminaries & Related Work
Sum-Product Networks
Causal Models
Characteristic Functions
$\chi$SPN
$\chi$SPN Structure
Product Nodes.
Sum Nodes.
Leaf Nodes.
Expressivity
Learning
Evaluation Metric.
Tractability of Inference
$\chi$SPN is a Universal Function Approximator
...and 9 more sections

Key Result

Theorem 2.2

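Let $X$ be a real-valued random variable, $\mu_X$ its probability measure, and $\varphi_X: \mathbb{R} \rightarrow \mathbb{C}$ its characteristic function. Then for any $a, b \in \mathbb{R}$ with $a<b$, we have that

$$\lim_{T \rightarrow \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it}\, \varphi_X(t)\, dt = \mu_X\big((a, b)\big) + \frac{1}{2}\big(\mu_X(\{a\}) + \mu_X(\{b\})\big)$$

and, hence, $\varphi_X$ uniquely determines $\mu_X$.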
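As a numerical illustration of Lévy's inversion theorem, the sketch below approximates the truncated integral $\frac{1}{2\pi}\int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it}\varphi_X(t)\,dt$ with a midpoint rule and compares it against known interval probabilities. It is a hedged example under stated assumptions: the function name `levy_inversion`, the truncation $T = 200$, the grid size, and the Bernoulli(0.3) and standard normal test variables are illustrative choices and are not taken from the paper.

```python
import cmath
import math

def levy_inversion(cf, a, b, T=200.0, n=20000):
    """Approximate (1/2pi) * integral_{-T}^{T} (e^{-ita} - e^{-itb}) / (it) * cf(t) dt
    with the midpoint rule. n is even, so no midpoint falls on t = 0,
    where the kernel has a removable singularity."""
    h = 2.0 * T / n
    total = 0.0 + 0.0j
    for k in range(n):
        t = -T + (k + 0.5) * h
        kernel = (cmath.exp(-1j * t * a) - cmath.exp(-1j * t * b)) / (1j * t)
        total += kernel * cf(t)
    return (total * h / (2.0 * math.pi)).real

def bernoulli_cf(t, p=0.3):
    """Characteristic function of a Bernoulli(p) variable."""
    return (1 - p) + p * cmath.exp(1j * t)

def std_normal_cf(t):
    """Characteristic function of a standard normal variable."""
    return math.exp(-0.5 * t * t)

# Interval (0.5, 1.5) contains only the atom at 1, so the mass is p = 0.3.
p_interval = levy_inversion(bernoulli_cf, 0.5, 1.5)

# With a = 0 and b = 1 both endpoints are atoms and each counts with weight 1/2:
# 0 + 0.5 * (0.7 + 0.3) = 0.5.
p_atoms = levy_inversion(bernoulli_cf, 0.0, 1.0)

# Continuous case: P(-1 < X < 1) for a standard normal equals erf(1 / sqrt(2)).
p_normal = levy_inversion(std_normal_cf, -1.0, 1.0)

print(p_interval, p_atoms, p_normal)
```

For the discrete variable the truncated integral converges only at rate $O(1/T)$, so the recovered masses are approximate. The Gaussian case converges quickly because its characteristic function decays rapidly.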

Figures (11)

Figure 1: Correct Mixing of Distributions via $\chi$SPN. Classical mixed SPNs naively multiply discrete probabilities and continuous densities, leading to an ill-defined probability measure. In practical applications, large density values can outweigh normalized discrete probabilities, biasing parameter estimation. $\chi$SPNs overcome this problem by transforming discrete and continuous variables into a shared spectral domain, where sum and product operations on the spectral representations are well defined. (Best viewed in color.)
Figure 2: $\chi$SPN parameters are provided by intervention information (Left). $\chi$SPN accounts for interventions that change the graph structure and, in consequence, the intervened probability distribution. The parameterization of the SPN leaves and weights ($\theta$) is predicted by a neural network conditioned on intervention information. Training Setup (Right). Parameters $\theta$ of the $\chi$SPN are trained by matching the predicted $\chi$ distribution at the root node against the $\chi$ distribution computed from interventional data. (Best viewed in color.)
Figure 3: Evaluated Mixed Datasets. All $\chi$SPNs are trained and evaluated on three mixed-type data sets. The Hiring and Student data sets contain a mix of continuous (black circles) and discrete (green squares) variables within an exemplary causal process. Causal Health Classification features the important special case of a categorization process in which three discrete diagnosis variables are derived from all-continuous observations. (Best viewed in color.)
Figure 4: Approximation of Interventional Densities. Plots feature the approximated densities of continuous variables for different interventional distributions. Marginalized ground truth distributions (plotted as bar diagrams) and $\chi$SPN approximations (red line) are shown. Modes of the distributions are generally well matched across most plots. Deviations from ground truth show at distribution boundaries as artifacts of the $\chi$ function discretization. (Best viewed in color.)
Figure 5: Accuracies of Discrete Variable Prediction. Tables contain the prediction accuracies over all discrete variables of the data sets. Results for observational and interventions on the remaining (unintervened) continuous variables are presented.
...and 6 more figures

Theorems & Definitions (5)

Definition 2.1: SCM
Theorem 2.2: Lévy's inversion theorem (Sasvári, 2013)
Corollary 2.3
Definition 3.1: $\chi$ Sum-Product Network
Lemma 3.2: Inversion
