Entropy and the Kullback-Leibler Divergence for Bayesian Networks: Computational Complexity and Efficient Implementation
Marco Scutari
TL;DR
This work addresses the computational bottlenecks in evaluating information‑theoretic quantities for Bayesian networks (BNs), notably Shannon entropy and the Kullback-Leibler (KL) divergence, under discrete, Gaussian, and conditional linear Gaussian assumptions. By exploiting the graphical structure and local distributions of BNs, it derives efficient exact expressions for entropy and KL in Gaussian BNs and conditional linear Gaussian BNs (CLGBNs), and analyzes the associated complexity across BN families. It shows that KL computation for Gaussian BNs can be performed approximately in $O(N^2)$ rather than $O(N^3)$, while discrete BNs remain challenging due to non‑orthogonal entropy decomposition and require junction‑tree based methods. The results provide practical, scalable tools for BN‑based analysis and interpretation, with step‑by‑step numeric examples and an implementation in the bnlearn R package, enabling exact or near‑exact information‑theoretic measures in real‑world applications.
Abstract
Bayesian networks (BNs) are a foundational model in machine learning and causal inference. Their graphical structure can handle high-dimensional problems and divide them into a sparse collection of smaller ones; it also underlies Judea Pearl's causality and determines their explainability and interpretability. Despite their popularity, there are almost no resources in the literature on how to compute Shannon's entropy and the Kullback-Leibler (KL) divergence for BNs under their most common distributional assumptions. In this paper, we provide computationally efficient algorithms for both by leveraging BNs' graphical structure, and we illustrate them with a complete set of numerical examples. In the process, we show it is possible to reduce the computational complexity of KL from cubic to quadratic for Gaussian BNs.
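
As a small illustration of how graphical structure can simplify such computations, the sketch below uses a standard identity for linear Gaussian BNs: the determinant of the global covariance matrix equals the product of the residual variances of the local distributions, so the entropy is a sum of per-node terms. This is a minimal sketch and not necessarily the formulation used in the paper. The two-node network, its parameters (`var1`, `var2`, `beta`), and the function names are hypothetical and chosen only for illustration.

```python
import math

def gbn_entropy(residual_variances):
    """Entropy (in nats) of a linear Gaussian BN, summed over its nodes.

    Each node contributes 0.5 * log(2 * pi * e * v), where v is the
    residual variance of that node given its parents.
    """
    return sum(0.5 * math.log(2 * math.pi * math.e * v)
               for v in residual_variances)

def bivariate_gaussian_entropy(cov):
    """Entropy (in nats) of a bivariate Gaussian from its 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    return 0.5 * math.log((2 * math.pi * math.e) ** 2 * det)

# Hypothetical network X1 -> X2 with X2 = beta * X1 + noise.
var1, var2, beta = 2.0, 0.5, 1.5

# Global covariance matrix implied by the two local distributions.
cov = [[var1, beta * var1],
       [beta * var1, beta ** 2 * var1 + var2]]

local = gbn_entropy([var1, var2])        # uses only the local distributions
global_ = bivariate_gaussian_entropy(cov)  # uses the global distribution

print(local, global_)
```

The two values agree up to floating-point error. The local version never forms or factorizes the covariance matrix, which is the kind of saving that matters as the number of nodes grows.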
