On the Role of Priors in Bayesian Causal Learning

Bernhard C. Geiger, Roman Kern

TL;DR

The paper analyzes how priors shape Bayesian causal learning of independent causal mechanisms. It shows that unlabeled cause realizations do not directly improve learning of the mechanism parameter $\psi$, and that a factorized prior on the cause and mechanism parameters $(\theta,\psi)$ yields a factorized posterior. More precisely, it proves that unlabeled cause realizations enter the likelihood only through the cause parameter $\theta$, so any effect they have on the mechanism parameter must arise via the prior coupling between the two parameters. Simulations with Gaussian models show that correlated priors can slow learning in unsupervised, fully supervised, and semi-supervised settings, whereas factorized priors keep the posterior factorized and prevent unlabeled cause realizations from influencing the mechanism posterior. The work offers principled guidance for prior design in Bayesian causal learning and informs semi-supervised and Bayesian deep learning approaches that incorporate causal modules.
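The factorization argument can be written out in a few lines. This sketch assumes i.i.d. data, the causal factorization $p(x,y|\theta,\psi)=p(x|\theta)\,p(y|x,\psi)$, labeled pairs $\mathcal{D}'_N=\{(x_n,y_n)\}_{n=1}^{N}$, and unlabeled cause realizations $\mathcal{D}'_{x,M}=\{x_n\}_{n=N+1}^{N+M}$; this indexing is an illustrative convention, not necessarily the paper's.

```latex
\begin{align*}
p(\theta,\psi \mid \mathcal{D}'_N,\mathcal{D}'_{x,M})
  &\propto p(\theta,\psi)\prod_{n=1}^{N+M} p(x_n\mid\theta)\prod_{n=1}^{N} p(y_n\mid x_n,\psi)
\intertext{and, if the prior factorizes as $p(\theta,\psi)=p(\theta)\,p(\psi)$,}
  &= \underbrace{\Big[p(\theta)\prod_{n=1}^{N+M} p(x_n\mid\theta)\Big]}_{\propto\, p(\theta\mid x_1,\dots,x_{N+M})}
     \;\underbrace{\Big[p(\psi)\prod_{n=1}^{N} p(y_n\mid x_n,\psi)\Big]}_{\propto\, p(\psi\mid\mathcal{D}'_N)} .
\end{align*}
```

The second factor does not involve $x_{N+1},\dots,x_{N+M}$, so the marginal posterior of $\psi$ is unaffected by unlabeled cause realizations. With a correlated prior, $p(\theta,\psi)$ does not split, and integrating out $\theta$ lets the unlabeled causes reach $\psi$ through the prior coupling.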

Abstract

In this work, we investigate causal learning of independent causal mechanisms from a Bayesian perspective. Confirming previous claims from the literature, we show in a didactically accessible manner that unlabeled data (i.e., cause realizations) do not improve the estimation of the parameters defining the mechanism. Furthermore, we observe the importance of choosing an appropriate prior for the cause and mechanism parameters, respectively. Specifically, we show that a factorized prior results in a factorized posterior, which resonates with Janzing and Schölkopf's definition of independent causal mechanisms via the Kolmogorov complexity of the involved distributions and with the concept of parameter independence of Heckerman et al.
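To see the factorized-prior claim numerically, the following sketch uses a hypothetical conjugate Gaussian model that is not taken from the paper: $x\sim\mathcal{N}(\theta,1)$, $y\mid x\sim\mathcal{N}(\psi x,1)$, and a zero-mean, unit-variance Gaussian prior on $(\theta,\psi)$ with correlation coefficient $\rho$. The helper `gaussian_posterior` (a hypothetical name) computes the exact joint posterior with and without unlabeled cause realizations, so the resulting posteriors of $\psi$ can be compared directly.

```python
import numpy as np


def gaussian_posterior(rho, x_lab, y_lab, x_unlab):
    """Exact joint posterior over (theta, psi) in a hypothetical Gaussian model.

    Assumed model (illustrative, not taken from the paper):
      cause:     x ~ N(theta, 1)
      mechanism: y | x ~ N(psi * x, 1)
      prior:     (theta, psi) ~ N(0, [[1, rho], [rho, 1]])
    Returns the posterior mean vector and covariance matrix.
    """
    prior_prec = np.linalg.inv(np.array([[1.0, rho], [rho, 1.0]]))
    # Labeled and unlabeled cause realizations both inform theta.
    x_all = np.concatenate([x_lab, x_unlab])
    # Each log-likelihood is quadratic in its own parameter only, so the data
    # contribute a diagonal precision term and a linear (natural-parameter) term.
    data_prec = np.diag([float(len(x_all)), float(np.sum(x_lab ** 2))])
    data_lin = np.array([np.sum(x_all), np.sum(x_lab * y_lab)])
    post_cov = np.linalg.inv(prior_prec + data_prec)
    post_mean = post_cov @ data_lin  # the prior mean is zero
    return post_mean, post_cov


rng = np.random.default_rng(0)
theta_true, psi_true = 1.0, -3.0  # illustrative values
x_lab = rng.normal(theta_true, 1.0, size=5)
y_lab = rng.normal(psi_true * x_lab, 1.0)
x_unlab = rng.normal(theta_true, 1.0, size=200)

for rho in (0.0, 0.75):
    m_sup, c_sup = gaussian_posterior(rho, x_lab, y_lab, np.empty(0))
    m_semi, c_semi = gaussian_posterior(rho, x_lab, y_lab, x_unlab)
    print(f"rho={rho}: psi mean {m_sup[1]:+.4f} -> {m_semi[1]:+.4f}, "
          f"psi var {c_sup[1, 1]:.4f} -> {c_semi[1, 1]:.4f}")
```

With $\rho=0$ the posterior precision matrix is diagonal, so the mean and variance of $\psi$ are the same with and without unlabeled data; with $\rho=0.75$ they differ, because the sharper estimate of $\theta$ propagates to $\psi$ through the off-diagonal prior precision.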

Paper Structure

This paper contains 10 sections, 18 equations, and 3 figures.

Figures (3)

  • Figure 1: Unsupervised causal learning with infinitely many cause realizations ($N=0$ and $M\to\infty$). (Left) The level sets of the prior $p(\theta,\psi)$, illustrated as a contour plot for $\rho=0.75$. (Right) The prior and posterior distributions of the mechanism parameter $\psi$. The posterior distribution is obtained by evaluating the joint prior at the learned value $\theta=1$ and renormalizing, i.e., it equals the conditional prior $p(\psi|\theta=1)$.
  • Figure 2: Supervised causal learning ($M=0$) with randomly chosen cause and mechanism parameters. (Top) We display the log-likelihood $\log p(\psi^\bullet|\mathcal{D}'_N)$ of the true mechanism parameter as a function of the dataset size $N$, averaged over 10,000 random experiments. The log-likelihood increases with $N$, but more slowly if the correlation coefficient $\rho$ in the prior is larger. (Bottom) Average trajectories of the posterior means $[\theta_N,\psi_N]$ as a function of $N$. As can be seen, for a strongly correlated prior, the posterior means take a longer route to reach the true parameters $[\theta^\bullet,\psi^\bullet]=[1,-3]$.
  • Figure 3: Semi-supervised causal learning with randomly chosen cause and mechanism parameters. We display the log-likelihood $\log p(\psi^\bullet|\mathcal{D}'_N,\mathcal{D}'_{x,M})$ of the true mechanism parameter as a function of the supervised dataset size $N$ and for different fractions of unsupervised dataset sizes $M$, averaged over 10,000 random experiments. If the prior is correlated, providing additional cause realizations slows down causal learning.
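The limit shown in Figure 1 can be checked in closed form. This sketch assumes, beyond the caption, a zero-mean, unit-variance bivariate Gaussian prior and a cause model $x\sim\mathcal{N}(\theta,1)$; the helper names are hypothetical. For a bivariate Gaussian prior, $p(\psi\mid\theta)$ is Gaussian with mean $\mu_\psi+\rho\,(\sigma_\psi/\sigma_\theta)(\theta-\mu_\theta)$ and variance $\sigma_\psi^2(1-\rho^2)$, and the posterior of $\psi$ computed from cause data alone ($N=0$) should approach it as $M$ grows.

```python
import numpy as np


def conditional_prior_psi(rho, theta, mu=(0.0, 0.0), sd=(1.0, 1.0)):
    """Mean and variance of p(psi | theta) under a bivariate Gaussian prior."""
    mean = mu[1] + rho * sd[1] / sd[0] * (theta - mu[0])
    var = sd[1] ** 2 * (1.0 - rho ** 2)
    return mean, var


def unsupervised_posterior_psi(rho, x_causes):
    """Marginal posterior (mean, variance) of psi from cause data only (N = 0).

    Assumes x ~ N(theta, 1) and a zero-mean, unit-variance prior on
    (theta, psi) with correlation rho; psi receives no likelihood term.
    """
    prior_prec = np.linalg.inv(np.array([[1.0, rho], [rho, 1.0]]))
    post_prec = prior_prec + np.diag([float(len(x_causes)), 0.0])
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ np.array([np.sum(x_causes), 0.0])
    return post_mean[1], post_cov[1, 1]


rho = 0.75
print(conditional_prior_psi(rho, theta=1.0))  # → (0.75, 0.4375)
for M in (10, 1_000, 100_000):
    x = np.ones(M)  # idealized cause data whose sample mean is exactly 1
    print(M, unsupervised_posterior_psi(rho, x))
```

Under these assumptions the posterior mean and variance of $\psi$ work out to $\rho M/(1+M)$ and $(1+M(1-\rho^2))/(1+M)$, which tend to $\rho$ and $1-\rho^2$, the moments of $p(\psi\mid\theta=1)$.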