How to Marginalize in Causal Structure Learning?

William Zhao, Guy Van den Broeck, Benjie Wang

TL;DR

This paper addresses a computational bottleneck in Bayesian structure learning: marginalizing over the enormous space of possible graphs. It introduces probabilistic circuits (PCs) as tractable surrogates that are trained to approximate Bayesian scores and then answer marginal/zero queries quickly and exactly, removing the usual restrictive limit on the number of candidate parents per node. The authors propose a two-phase PC learning procedure and validate it within the TRUST framework. The method performs comparably to exact dynamic programming (DP) when the DP can consider every candidate parent, and better when the DP's candidate sets must be restricted. The approach offers scalable marginalization for structure learning and could extend to denser graphs and other Bayesian network (BN) mechanisms, improving posterior quality and inference efficiency in graphical models.
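To make the bottleneck concrete: for a node with candidate parent set C, order-based structure learners typically need, for every subset U of C, the log of the sum of exp(local score) over all parent sets S contained in U. The standard DP computes all 2^|C| of these values at once with a subset-sum ("zeta") transform in O(|C| 2^|C|) operations, which is why |C| is usually capped. The following is a minimal sketch of that generic transform, not the paper's or TRUST's code. The toy score `toy_log_score` and its `WEIGHTS` are hypothetical stand-ins for a real Bayesian score.

```python
import math

def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    if m == -math.inf:
        return -math.inf
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def subset_log_sums(log_scores: list, k: int) -> list:
    """Fast zeta transform in log space.

    log_scores[mask] is the log-score of the parent set encoded by `mask`
    (bit b set = candidate parent b included). Returns out where
    out[mask] = log of the sum, over subsets S of mask, of exp(log_scores[S]).
    Uses O(k * 2^k) log-add operations instead of O(3^k) for naive enumeration.
    """
    n = 1 << k
    if len(log_scores) != n:
        raise ValueError("need one score per subset of the k candidates")
    out = list(log_scores)
    for bit in range(k):
        for mask in range(n):
            if mask & (1 << bit):
                # Fold in the sums of subsets that exclude this bit.
                out[mask] = log_add(out[mask], out[mask ^ (1 << bit)])
    return out

def brute_force_subset_log_sums(log_scores: list, k: int) -> list:
    """Reference implementation: enumerate every subset of every mask."""
    n = 1 << k
    result = []
    for mask in range(n):
        acc = -math.inf
        for s in range(n):
            if (s & mask) == s:  # s is a subset of mask
                acc = log_add(acc, log_scores[s])
        result.append(acc)
    return result

# Hypothetical toy local score: a per-parent weight minus a size penalty.
# A real learner would plug in a Bayesian score such as BDeu or BGe here.
WEIGHTS = [0.5, -1.0, 2.0, 0.3]
K = len(WEIGHTS)

def toy_log_score(mask: int) -> float:
    chosen = [w for b, w in enumerate(WEIGHTS) if mask & (1 << b)]
    return sum(chosen) - 1.5 * len(chosen)

scores = [toy_log_score(m) for m in range(1 << K)]
fast = subset_log_sums(scores, K)
slow = brute_force_subset_log_sums(scores, K)
full_total = fast[(1 << K) - 1]  # all 4 candidates allowed

# Restricting the candidate set to the first 2 parents (masks 0..3) drops every
# parent set that uses candidate 2 or 3, so the restricted sum is smaller.
restricted_total = subset_log_sums(scores[:4], 2)[3]
print(f"all candidates: {full_total:.4f}, restricted: {restricted_total:.4f}")
```

Comparing `full_total` with `restricted_total` shows what a capped candidate set discards: the mass of every parent set that contains an excluded candidate.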

Abstract

Bayesian networks (BNs) are a widely used class of probabilistic graphical models employed in numerous application domains. However, inferring a network's graphical structure from data remains challenging. Bayesian structure learners approach this problem by inferring a posterior distribution over the directed acyclic graphs (DAGs) that could underlie the BN. This inference often requires marginalizing over probability distributions, which is typically done with dynamic programming methods that restrict the set of possible parents of each node. Instead, we present a novel method that uses tractable probabilistic circuits to circumvent this restriction. The method relies on a new learning routine that trains these circuits on both the original distribution and marginal queries. The architecture of probabilistic circuits then inherently allows fast and exact marginalization on the learned distribution. We show empirically that using our method to answer marginal queries improves the performance of Bayesian structure learners compared to current methods.
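The property the abstract relies on is that, in a smooth and decomposable probabilistic circuit, summing out a variable only requires its leaves to evaluate to 1. Any marginal therefore costs a single bottom-up pass, the same as evaluating a full assignment. The sketch below illustrates this on a hand-built toy circuit over three binary variables. The `Leaf`/`Product`/`Sum` classes, the parameters, and the structure are all hypothetical, and this is not the authors' regression circuit or their two-phase learning routine.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class Leaf:
    var: int      # index of the binary variable this leaf covers
    p_one: float  # Bernoulli parameter P(X_var = 1)

@dataclass
class Product:
    children: List["Node"]  # children must have disjoint scopes (decomposability)

@dataclass
class Sum:
    weights: List[float]    # non-negative, summing to 1
    children: List["Node"]  # children must share the same scope (smoothness)

Node = Union[Leaf, Product, Sum]

def evaluate(node: Node, evidence: Dict[int, int]) -> float:
    """Bottom-up evaluation. Variables missing from `evidence` are summed out."""
    if isinstance(node, Leaf):
        if node.var not in evidence:
            return 1.0  # a normalized leaf sums to 1 over its values
        return node.p_one if evidence[node.var] == 1 else 1.0 - node.p_one
    if isinstance(node, Product):
        value = 1.0
        for child in node.children:
            value *= evaluate(child, evidence)
        return value
    return sum(w * evaluate(c, evidence) for w, c in zip(node.weights, node.children))

# Toy circuit: a mixture of two components, the second containing a nested
# mixture over X1 and X2.
circuit = Sum(
    [0.3, 0.7],
    [
        Product([Leaf(0, 0.9), Leaf(1, 0.2), Leaf(2, 0.5)]),
        Product([
            Leaf(0, 0.1),
            Sum([0.6, 0.4], [
                Product([Leaf(1, 0.8), Leaf(2, 0.3)]),
                Product([Leaf(1, 0.4), Leaf(2, 0.9)]),
            ]),
        ]),
    ],
)

# Marginal P(X0 = 1): one pass with X1 and X2 left out of the evidence.
fast_marginal = evaluate(circuit, {0: 1})

# Brute-force check: enumerate all completions of X1 and X2.
slow_marginal = sum(
    evaluate(circuit, {0: 1, 1: x1, 2: x2})
    for x1, x2 in itertools.product([0, 1], repeat=2)
)
total_mass = evaluate(circuit, {})  # 1 for a normalized circuit
print(fast_marginal, slow_marginal, total_mass)
```

The recursion above treats the circuit as a tree. Circuits that share sub-circuits would cache node values so that the cost stays linear in the number of circuit edges.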

Paper Structure

This paper contains 15 sections, 9 equations, 1 figure, and 1 algorithm.

Figures (1)

  • Figure 1: Evaluation of TRUST using exact DP marginalization versus regression circuits. Top: comparison at $d = 16$. Here the DP marginalization considers all 15 other nodes as candidate parents, so marginals are computed exactly. Our approximate regression circuit performs comparably to the DP on all metrics except E-SHD. Bottom: comparison at $d = 20$. The DP is now restricted to 8 candidate parents per node, as is standard practice. As expected, our PC now performs considerably better on all metrics except E-SHD.
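E-SHD, the one metric on which the circuit does not win, is the expected structural Hamming distance between graphs drawn from the learned posterior and the ground-truth DAG. Below is a minimal sketch of how it can be computed. It assumes equally weighted posterior samples and counts a reversed edge as a single error (some libraries count a reversal as two). The function names and example graphs are hypothetical and not taken from the paper.

```python
from typing import List, Sequence

Adjacency = List[List[int]]  # adj[i][j] == 1 means there is an edge i -> j

def shd(pred: Adjacency, true: Adjacency) -> int:
    """Structural Hamming distance: number of node pairs whose edge state
    (absent, i -> j, or j -> i) differs between the two graphs."""
    d = len(true)
    errors = 0
    for i in range(d):
        for j in range(i + 1, d):
            if (pred[i][j], pred[j][i]) != (true[i][j], true[j][i]):
                errors += 1  # missing, extra, or reversed edge: one error each
    return errors

def expected_shd(samples: Sequence[Adjacency], true: Adjacency) -> float:
    """Average SHD over equally weighted posterior samples."""
    return sum(shd(g, true) for g in samples) / len(samples)

# Ground truth: 0 -> 1 -> 2.
true_graph = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
samples = [
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],  # exact match
    [[0, 0, 0], [1, 0, 1], [0, 0, 0]],  # 0 -> 1 reversed
    [[0, 1, 1], [0, 0, 1], [0, 0, 0]],  # extra edge 0 -> 2
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],  # both edges missing
]
per_sample = [shd(g, true_graph) for g in samples]
e_shd = expected_shd(samples, true_graph)
print(per_sample, e_shd)
```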