
The relative value of interventional and observational samples in Bayesian Causal Linear Gaussian Models

Valentinian Lungu, Anish Dhir, Mark van der Wilk, Ioannis Kontoyiannis

Abstract

We investigate the asymptotic properties of Bayesian bivariate causal discovery for Gaussian Linear Structural Equation Models (SEMs) with heteroscedastic noise. We demonstrate that with purely observational data, the posterior distribution over models fails to consistently identify the true causal structure, a consequence of the fundamental non-identifiability within the Markov equivalence class. Specifically, if the true generating mechanism corresponds to a connected graph (A -> B or B -> A), the asymptotic behavior of the posterior is governed by the ratio between the prior on the true model and the push-forward prior of the alternative. In contrast, for the independence model, we establish that the posterior concentrates at a stochastic polynomial rate of O_p(n^{-1/2}). To resolve this non-identifiability, we incorporate m interventional samples and characterize the concentration rates as a function of the observational-to-total sample ratio, η. We identify a sharp concentration dichotomy: while the independence graph maintains a polynomial O_p(N^{-1/2}) rate (where N = n + m), connected graphs undergo a phase transition to exponentially fast convergence. This highlights the exponential relative value of the two data types, as altering the amount of one data type directly changes the exponent governing the concentration speed. We derive explicit formulae for the exponential decay rates and provide precise conditions under which mixing observational and interventional data optimizes concentration speed. Finally, our theoretical findings are validated through empirical simulations with Bayesian Gaussian equivalent (BGe)-style prior specifications, offering a principled foundation for experimental design in Bayesian causal discovery.
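The role of interventional samples described above can be illustrated with a hedged sketch (not from the paper; all parameter values are hypothetical). Under the true direction A -> B, an intervention do(A = a) shifts the conditional of B, whereas under the reverse model B -> A the marginal of B is unchanged, so the interventional log-likelihood ratio between the two directions accumulates linearly in the number of interventional samples m:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true model S^1 (A -> B): A ~ N(0, 1), B = b*A + N(0, s2)
b, s2 = 0.8, 0.5

# Under do(A = a): S^1 predicts B ~ N(b*a, s2), while the reverse model
# S^2 (B -> A) predicts B from its unchanged marginal N(0, vB).
vB = b**2 * 1.0 + s2

def loglik_gauss(x, mean, var):
    """Elementwise Gaussian log-density."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

m = 200
a = rng.normal(size=m)                               # intervention values
B = b * a + rng.normal(scale=np.sqrt(s2), size=m)    # interventional responses

# Log-likelihood ratio of S^1 over S^2 on the interventional data;
# its expectation grows linearly in m, so the posterior odds separate
# the two directions exponentially fast.
llr = loglik_gauss(B, b * a, s2).sum() - loglik_gauss(B, 0.0, vB).sum()
```

This is only a likelihood-level sketch of why interventional samples break the observational tie; the paper's results concern the full Bayesian posterior with parameter priors marginalized out.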

Paper Structure

This paper contains 23 sections, 15 theorems, 110 equations, and 6 figures.

Key Result

Lemma 3.1

The two structures, $S^1$ and $S^2$, are non-identifiable; i.e., for any choice of parameters $\theta^1 \in \Theta$, there exists $\theta^2 \in \Theta$ such that the likelihoods of the two models are the same: $f(x \mid \theta^1, S^1) = f(x \mid \theta^2, S^2)$ for any $x = [x(1), x(2)] \in \mathbb{R}^2$.
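As a hedged illustration of the lemma (not taken from the paper; the parameter values are hypothetical), the matching reverse-direction parameters for a bivariate linear Gaussian SEM can be written in closed form: both causal directions induce the same bivariate Gaussian, so equating the implied covariance matrices yields the map from $\theta^1$ to $\theta^2$:

```python
import numpy as np

# Hypothetical parameters for S^1 (A -> B): A ~ N(0, s1), B = b*A + e, e ~ N(0, s2)
s1, b, s2 = 1.5, 0.8, 0.6

# Joint covariance of [A, B] implied by S^1
cov1 = np.array([[s1,     b * s1],
                 [b * s1, b**2 * s1 + s2]])

# Matching parameters for S^2 (B -> A): B ~ N(0, t2), A = c*B + f, f ~ N(0, t1),
# obtained by equating the implied covariances
t2 = b**2 * s1 + s2          # Var(B)
c = b * s1 / t2              # regression coefficient of A on B
t1 = s1 * s2 / t2            # residual variance of A given B

cov2 = np.array([[c**2 * t2 + t1, c * t2],
                 [c * t2,         t2]])

assert np.allclose(cov1, cov2)  # identical Gaussian likelihoods for S^1 and S^2
```

Since both models are zero-mean Gaussians, equal covariances imply equal likelihoods for every observation, which is exactly the non-identifiability the lemma states.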

Figures (6)

  • Figure 2: Concentration speed (in log scale) of the posterior $\pi(S^{3*} \mid \mathcal{D}_{\text{obs}})$ to 1 when the true generating model is $S^3$.
  • Figure 3: Augmented posterior odds showing convergence to a $\chi_1^2$ distribution when the true generating model is $S^{3*}$.
  • Figure 4: Concentration speed of the posterior to 1 when the true model is $S^1$ (left) or $S^2$ (right).
  • Figure 5: Concentration speed (in log scale) of the posterior $\pi(S^{3*} \mid \mathcal{D}_{\text{mix}})$ to 1 when the true generating model is $S^3$.
  • Figure 6: Posterior ratio $\frac{\pi(S^1 \mid \mathcal{D}_{\text{obs}})}{\pi(S^2 \mid \mathcal{D}_{\text{obs}})}$ as a function of sample size $n$ when the true model is $S^1$. The hyper-parameters in the right plot correspond to the BGe prior.
  • ...and 1 more figure

Theorems & Definitions (34)

  • Lemma 3.1
  • Theorem 3.1
  • Remark 1
  • Remark 2
  • Remark 3
  • Theorem 3.2
  • Remark 4
  • Remark 5
  • Theorem 4.1
  • Lemma 4.1
  • ...and 24 more