ProDAG: Projected Variational Inference for Directed Acyclic Graphs

Ryan Thompson; Edwin V. Bonilla; Robert Kohn

TL;DR

ProDAG addresses DAG structure learning with uncertainty quantification through a Bayesian variational framework. Its prior and variational posterior are distributions over DAGs, obtained by projecting a continuous matrix $\tilde{W}$ onto the space of sparse, acyclic weighted adjacency matrices: $W=\operatorname{pro}_\lambda(\tilde{W})$. The projection enforces exact acyclicity and sparsity while still permitting GPU-accelerated continuous optimization, with analytic gradients obtained via the implicit function theorem. The method extends to nonlinear structural equation models (SEMs) and, on linear and nonlinear synthetic data and on real data (e.g., Sachs), is reported to improve on state-of-the-art baselines in both uncertainty quantification and structure recovery. The result is a scalable, uncertainty-aware DAG learning framework with open-source tooling and broad applicability to causal discovery tasks.
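To make the projection $W=\operatorname{pro}_\lambda(\tilde{W})$ concrete, here is a minimal brute-force sketch for very small graphs. It assumes the projection is the Frobenius-nearest matrix that is acyclic and has $\ell_1$ norm at most $\lambda$, which is how the summary describes it. This is not the paper's scalable algorithm: it enumerates all $p!$ node orderings, which is only feasible for a handful of nodes. The function names `project_l1_ball` and `brute_force_projection` are hypothetical and chosen for illustration.

```python
import itertools

import numpy as np


def project_l1_ball(v, radius):
    """Euclidean projection of a vector onto the l1 ball (sort-based method)."""
    if radius <= 0:
        return np.zeros_like(v)
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]            # magnitudes in decreasing order
    css = np.cumsum(u)
    ks = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * ks > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)  # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def brute_force_projection(w_tilde, lam):
    """Nearest acyclic matrix with l1 norm <= lam, by enumerating node orderings.

    Illustrative assumption: for a fixed ordering, edges may only point from
    earlier to later nodes, so the feasible set is convex and the projection
    reduces to masking followed by an l1-ball projection.
    """
    p = w_tilde.shape[0]
    best, best_dist = None, np.inf
    for order in itertools.permutations(range(p)):
        pos = np.empty(p, dtype=int)
        pos[list(order)] = np.arange(p)      # pos[j] = rank of node j in the ordering
        mask = pos[:, None] < pos[None, :]   # edge j -> k allowed only if j precedes k
        w = np.zeros_like(w_tilde)
        w[mask] = project_l1_ball(w_tilde[mask], lam)
        dist = np.sum((w - w_tilde) ** 2)
        if dist < best_dist:
            best, best_dist = w, dist
    return best


# A 2-cycle between nodes 0 and 1: the projection keeps the stronger edge
# and shrinks it to satisfy the l1 budget lam = 1.
w_tilde = np.array([[0.0, 2.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0]])
w = brute_force_projection(w_tilde, lam=1.0)
print(w)
```

Drawing $\tilde{W}$ from any continuous distribution and passing each draw through such a projection yields samples that are all DAGs, which is the idea behind the projected distributions described above.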

Abstract

Directed acyclic graph (DAG) learning is a central task in structure discovery and causal inference. Although the field has witnessed remarkable advances over the past few years, it remains statistically and computationally challenging to learn a single (point estimate) DAG from data, let alone provide uncertainty quantification. We address the difficult task of quantifying graph uncertainty by developing a Bayesian variational inference framework based on novel, provably valid distributions that have support directly on the space of sparse DAGs. These distributions, which we use to define our prior and variational posterior, are induced by a projection operation that maps an arbitrary continuous distribution onto the space of sparse weighted acyclic adjacency matrices. While this projection is combinatorial, it can be solved efficiently using recent continuous reformulations of acyclicity constraints. We empirically demonstrate that our method, ProDAG, can outperform state-of-the-art alternatives in both accuracy and uncertainty quantification.
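The abstract refers to continuous reformulations of the acyclicity constraint without naming one on this page. As a hedged illustration, the sketch below uses a truncated form of the well-known NOTEARS-style trace-exponential function, $h(W)=\operatorname{tr}(e^{W\circ W})-p$, which is zero exactly when $W$ is the weighted adjacency matrix of a DAG. Whether ProDAG uses this function or a different one is an assumption left open here, and the name `acyclicity_penalty` is hypothetical.

```python
import numpy as np


def acyclicity_penalty(w):
    """Sum of tr((w*w)^k) / k! for k = 1..p.

    Each term is nonnegative and counts weighted closed walks of length k,
    so the sum is zero if and only if the graph of w has no directed cycle
    (any cycle in a p-node graph has length at most p).
    """
    p = w.shape[0]
    a = w * w                 # elementwise square keeps entries nonnegative
    power = np.eye(p)
    total = 0.0
    for k in range(1, p + 1):
        power = power @ a / k  # now equals a^k / k!
        total += np.trace(power)
    return total


# A chain 0 -> 1 -> 2 is acyclic; adding the edge 2 -> 0 closes a cycle.
dag = np.array([[0.0, 1.5, 0.0],
                [0.0, 0.0, -2.0],
                [0.0, 0.0, 0.0]])
cyclic = dag.copy()
cyclic[2, 0] = 0.5

print(acyclicity_penalty(dag))
print(acyclicity_penalty(cyclic))
```

Because the penalty is a smooth function of the entries of `w`, it can be driven to zero with gradient-based optimizers, which is what makes a combinatorial acyclicity constraint amenable to continuous solvers.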

Paper Structure

This paper contains 41 sections, 3 theorems, 48 equations, 15 figures, 4 tables, and 3 algorithms.

Introduction
Related work
Projected distributions
Description
Properties
Scalable projections
Continuous reformulation
Algorithms
Gradients
Posterior learning
Prior and variational posterior
Evidence lower bound
Algorithm
Nonlinear DAGs
Structural equation model
...and 26 more sections

Key Result

Theorem 1

Let $\tilde{W}$ be endowed with a continuous probability measure. Then it holds:

Figures (15)

Figure 1: Illustration of ProDAG's projected distributions. Samples (blue dots) from an unconstrained continuous space (blue ellipse) are projected onto the nearest acyclic matrix within an $\ell_1$-constrained region (orange diamonds). Projected samples (orange dots) satisfy acyclicity and sparsity constraints. Theoretically, we show that for any continuous $\tilde{W}\sim\mathbb{P}$ the projection $\operatorname{pro}_\lambda(\tilde{W})$ is unique and measurable. This result implies that the projected distribution is a valid distribution over DAGs.
Figure 2: Computation times in seconds with a sample size $n=100$. The averages (solid points) and standard errors (error bars) are measured over 10 independently and identically generated datasets.
Figure 3: Performance on synthetic datasets generated from linear Erdős–Rényi DAGs with $p=20$ nodes, $s=40$ edges, and Gaussian noise. The averages (solid points) and standard errors (error bars) are measured over 10 independently and identically generated datasets.
Figure 4: Performance on synthetic datasets generated from linear Erdős–Rényi DAGs with $p=100$ nodes, $s=200$ edges, and Gaussian noise. The averages (solid points) and standard errors (error bars) are measured over 10 independently and identically generated datasets.
Figure 5: Performance on synthetic datasets generated from nonlinear Erdős–Rényi DAGs with $p=10$ nodes, $s=20$ edges, and Gaussian noise. The averages (solid points) and standard errors (error bars) are measured over 10 independently and identically generated datasets.
...and 10 more figures

Theorems & Definitions (6)

Theorem 1
Proposition 1
proof
Proposition 2
proof
proof
