
Scalable Variational Causal Discovery Unconstrained by Acyclicity

Nu Hoang, Bao Duong, Thin Nguyen

TL;DR

This work tackles scalable Bayesian causal discovery without explicitly enforcing acyclicity during learning. It introduces VCUDA, a differentiable DAG sampling framework that extends the No-Curl mapping to binary adjacency matrices by using a tempered sigmoid. This allows DAGs to be generated quickly from unconstrained distributions and integrated with variational inference via an ELBO objective. Experiments on synthetic linear and nonlinear Gaussian SEMs, high-dimensional graphs ($d=100$), and a real protein signaling dataset show that VCUDA achieves strong AUC-ROC and AUC-PR scores while reducing sampling time and scaling better than state-of-the-art baselines. The approach offers a practical, uncertainty-aware tool for causal structure learning from observational data.

Abstract

Bayesian causal discovery offers the power to quantify epistemic uncertainties among a broad range of structurally diverse causal theories potentially explaining the data, represented in the form of directed acyclic graphs (DAGs). However, existing methods struggle with efficient DAG sampling due to the complex acyclicity constraint. In this study, we propose a scalable Bayesian approach that learns the posterior distribution over causal graphs from observational data by generating DAGs without explicitly enforcing acyclicity. Specifically, we introduce a novel differentiable DAG sampling method that generates a valid acyclic causal graph by mapping an unconstrained distribution of implicit topological orders to a distribution over DAGs. Given this efficient DAG sampling scheme, we are able to model the posterior distribution over causal graphs using a simple variational distribution over a continuous domain, which can be learned via the variational inference framework. Extensive empirical experiments on both simulated and real datasets demonstrate the superior performance of the proposed model compared to several state-of-the-art baselines.

Paper Structure

This paper contains 14 sections, 2 theorems, 19 equations, 6 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Let $\mathbf{A}\in\{0,1\}^{d\times d}$ be the adjacency matrix of a graph with $d$ nodes. Then $\mathbf{A}$ is a DAG if and only if there exist a vector of priority scores $\mathbf{p}\in\mathbb{R}^{d}$ and a corresponding binary matrix $\mathbf{W}\in\{0,1\}^{d\times d}$ such that: where $t>0$ is a strictly positive temperature and $\mathbf{p}$ contains no duplicate elements, i.e., $p_{i}\n
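
The idea behind Theorem 1 can be illustrated with a small sketch. The equation itself is cut off in this excerpt, so the code below rests on an assumed reading rather than the authors' exact formula: the gradient of the priority scores is taken as $p_j - p_i$, passed through a sigmoid with temperature $t$, thresholded at 0.5, and used to mask $\mathbf{W}$. The function names `sample_dag` and `is_dag` are hypothetical helpers, not part of any VCUDA code release.

```python
import numpy as np

def sample_dag(p, W, t=0.1):
    """Hypothetical helper: mask a binary matrix W with an order matrix
    derived from priority scores p (assumed form, see the text above)."""
    p = np.asarray(p, dtype=float)
    grad = p[None, :] - p[:, None]                  # grad[i, j] = p[j] - p[i]
    soft = 0.5 * (1.0 + np.tanh(grad / (2.0 * t)))  # stable sigmoid(grad / t)
    order = (soft > 0.5).astype(int)                # 1 where p[j] > p[i], else 0
    return np.asarray(W, dtype=int) * order         # keep only order-consistent edges

def is_dag(A):
    """Check acyclicity by repeatedly peeling off nodes with no incoming edges."""
    A = np.asarray(A).astype(bool)
    remaining = np.ones(A.shape[0], dtype=bool)
    while remaining.any():
        in_degree = A[np.ix_(remaining, remaining)].sum(axis=0)
        sources = np.where(remaining)[0][in_degree == 0]
        if sources.size == 0:                       # every remaining node lies on a cycle path
            return False
        remaining[sources] = False
    return True

# Usage: unconstrained random scores and a random binary matrix still yield a DAG.
rng = np.random.default_rng(0)
d = 5
p = rng.normal(size=d)
W = rng.integers(0, 2, size=(d, d))
A = sample_dag(p, W)
print(A)
print(is_dag(A))
```

Under this reading, an edge $i \to j$ can survive only when $p_j > p_i$, so the scores act as an implicit topological order and no acyclicity penalty is needed. The sigmoid is kept in the sketch, even though a direct comparison would give the same hard mask, because the smooth version is what would make the mapping differentiable in a learning setting.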

Figures (6)

  • Figure 1: Performance on synthetic data generated from linear Gaussian models with $d=10$ and $d=50$ variables across different graph models. The reported values are aggregated from 10 independent runs. VCUDA achieves the best results across most metrics and outperforms the other Bayesian approaches (DiBS and DDS). $\downarrow$ denotes lower is better and $\uparrow$ denotes higher is better.
  • Figure 2: Performance on synthetic data generated from nonlinear Gaussian models with $d=10$ and $d=50$ variables across different graph models. The reported values are aggregated from 10 independent runs. VCUDA achieves the best results across most metrics and outperforms the other Bayesian approaches (DiBS and DDS). $\downarrow$ denotes lower is better and $\uparrow$ denotes higher is better.
  • Figure 3: Performance on high-dimensional data with $d=100$ for different graphs and causal functional models. The reported values are aggregated from 10 independent runs. VCUDA achieves the best results across most metrics. $\downarrow$ denotes lower is better and $\uparrow$ denotes higher is better.
  • Figure 4: Running time for causal discovery on synthetic datasets generated from a nonlinear model and ER graphs with $d\in\{10,30,50\}$. VCUDA runs faster than three of the four baselines, especially in high dimensions.
  • Figure 5: Performance of VCUDA for different values of the temperature $t$. The numerical results are obtained from 10 random datasets generated from a linear SEM with both ER and SF graph models.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Theorem 1
  • proof
  • Theorem 2
  • proof