Bivariate Causal Discovery using Bayesian Model Selection

Anish Dhir, Samuel Power, Mark van der Wilk

TL;DR

This work reframes bivariate causal discovery as Bayesian model selection. Rather than imposing strict identifiability constraints, it embeds causal assumptions into priors and treats the two causal directions as competing Bayesian causal models (BCMs). By using a Gaussian process latent variable model (GPLVM) to flexibly model the joint distribution, the approach can distinguish X → Y from X ← Y even when the two models are distribution-equivalent, so that likelihoods alone cannot separate them. To do so it relies on the Independent Causal Mechanisms (ICM) principle and separable priors. The authors derive conditions under which marginal likelihoods discriminate causal directions, provide a statistical test for the asymmetry, and analyse performance under model misspecification. Empirically, the GPLVM-based method outperforms methods that restrict the model class to guarantee identifiability, as well as other flexible baselines, across real and synthetic datasets, highlighting the practical value of Bayesian model selection with expressive priors for causal discovery. The work also discusses robustness to misspecification and outlines extensions to deeper models.

Abstract

Much of the causal discovery literature prioritises guaranteeing the identifiability of causal direction in statistical models. For structures within a Markov equivalence class, this requires strong assumptions which may not hold in real-world datasets, ultimately limiting the usability of these methods. Building on previous attempts, we show how to incorporate causal assumptions within the Bayesian framework. Identifying causal direction then becomes a Bayesian model selection problem. This enables us to construct models with realistic assumptions, and consequently allows for the differentiation between Markov equivalent causal structures. We analyse why Bayesian model selection works in situations where methods based on maximum likelihood fail. To demonstrate our approach, we construct a Bayesian non-parametric model that can flexibly model the joint distribution. We then outperform previous methods on a wide range of benchmark datasets with varying data generating assumptions.
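
As a rough illustration of the model-selection idea described above, the sketch below scores each causal direction by a log marginal likelihood and prefers the larger score. It is not the authors' GPLVM. It assumes a fixed standard normal model for the cause and a zero-mean Gaussian process regression with an RBF kernel and hand-picked, fixed hyperparameters for the effect given the cause. All function names, hyperparameter values, and the toy data are hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_log_marginal(inputs, targets, noise=0.1, lengthscale=1.0, variance=1.0):
    """log N(targets | 0, K + noise^2 I): the GP function is integrated out.

    Hyperparameters are fixed here (an assumption); they are not learned.
    """
    n = len(inputs)
    cov = rbf_kernel(inputs, inputs, lengthscale, variance) + noise**2 * np.eye(n)
    chol = np.linalg.cholesky(cov)
    whitened = np.linalg.solve(chol, targets)  # chol^{-1} @ targets
    return float(
        -0.5 * whitened @ whitened
        - np.log(np.diag(chol)).sum()          # equals 0.5 * log det(cov)
        - 0.5 * n * np.log(2.0 * np.pi)
    )


def standard_normal_log_density(values):
    """Log-density of the cause under a fixed N(0, 1) model (an assumption)."""
    return float(-0.5 * np.sum(values**2) - 0.5 * len(values) * np.log(2.0 * np.pi))


def direction_scores(x, y):
    """Return (score for X -> Y, score for X <- Y) on standardised data."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    score_xy = standard_normal_log_density(x) + gp_log_marginal(x, y)
    score_yx = standard_normal_log_density(y) + gp_log_marginal(y, x)
    return score_xy, score_yx


# Toy data (hypothetical): a nonlinear effect of x with additive noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = np.tanh(2.0 * x) + 0.1 * rng.normal(size=100)

score_xy, score_yx = direction_scores(x, y)
print("preferred direction:", "X -> Y" if score_xy > score_yx else "X <- Y")
```

Because both variables are standardised, the two cause terms are equal, so in this simplified sketch the decision rests entirely on which conditional the GP prior explains better. The paper's model is more expressive than this and handles the cause and the noise differently.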

Paper Structure

This paper contains 49 sections, 6 theorems, 69 equations, 6 figures, and 3 tables.

Key Result

Proposition 4.4

Given two BCMs $(\mathcal{M}_{\textbf{\color{causalcolour}X} \!\rightarrow\! Y}, \pi_{\textbf{\color{causalcolour}X} \!\rightarrow\! Y})$, $(\mathcal{M}_{X \!\leftarrow\! \textbf{\color{causalcolour}Y}}, \pi_{X\!\leftarrow\!\textbf{\color{causalcolour}Y}})$, suppose that there exists a subset $\math

Figures (6)

  • Figure 1: Toy figure with datasets on the $x$ axis and density values on the $y$ axis. (a) With a sufficiently flexible model, maximising the likelihood for each dataset gives the same value for both causal models in \ref{sec:freq_causal_model}. (b) Previous methods resolve this by restricting the datasets that each model can represent. (c) Bayesian model selection retains the ability to identify causal direction while allowing flexibility, at the cost of some probability of error (overlap).
  • Figure 2: Graphical models for parametrised Bayesian causal models $\mathcal{M}_{\textbf{\color{causalcolour}X} \!\rightarrow\! Y}$ and $\mathcal{M}_{X \!\leftarrow\! \textbf{\color{causalcolour}Y}}$. The causal direction indicates the factorisation that encodes ICM.
  • Figure 3: Datasets sampled from our chosen GPLVM model. All panels have the variable $X$ on the x-axis and the variable $Y$ on the y-axis. (a) Six datasets sampled from the GPLVM with $\mathcal{M}_{\textbf{\color{causalcolour}X} \!\rightarrow\! Y}$. (b) Six datasets sampled from the GPLVM with $\mathcal{M}_{X \!\leftarrow\! \textbf{\color{causalcolour}Y}}$. The samples show that the data distribution differs between the two Bayesian causal models.
  • Figure 4: Graphical models for: (a) The linear Gaussian causal model $\mathcal{M}_{\textbf{\color{causalcolour}X} \!\rightarrow\! Y}$ in \ref{eq:app:linear_gauss_causalfact}. (b) The anti-causal factorisation of $\mathcal{M}_{\textbf{\color{causalcolour}X} \!\rightarrow\! Y}$ in \ref{eq:app:linear_gauss_anticausalfact}. (c) The causal model for $\mathcal{M}_{X \!\leftarrow\! \textbf{\color{causalcolour}Y}}$, where ICM holds in the factorisation $P(Y)P(X|Y)$.
  • Figure 5: Samples of joint distributions drawn from the same priors on the parameters of the cause and effect, but under different causal models. Red shows the joint of the causal model $\textbf{\color{causalcolour}X} \!\rightarrow\! Y$ and blue shows the joint of the causal model $X\!\leftarrow\!\textbf{\color{causalcolour}Y}$. The contours are plotted for the same draw of parameters, showing that the different causal models explain different joint distributions well.
  • ...and 1 more figure

Theorems & Definitions (20)

  • Definition 2.1
  • Definition 2.2
  • Definition 4.1
  • Definition 4.2
  • Definition 4.3
  • Proposition 4.4
  • Definition 4.5
  • Proposition 4.6
  • Corollary 4.7
  • Definition 1.1
  • ...and 10 more