Entropy and the Kullback-Leibler Divergence for Bayesian Networks: Computational Complexity and Efficient Implementation

Marco Scutari

TL;DR

This work addresses the computational bottlenecks of computing information-theoretic quantities for Bayesian networks (BNs), notably Shannon entropy and the Kullback-Leibler (KL) divergence, under discrete, Gaussian, and conditional linear Gaussian assumptions. By exploiting the graphical structure and local distributions of BNs, it derives efficient exact expressions for entropy and KL in Gaussian BNs (GBNs) and conditional linear Gaussian BNs (CLGBNs), and analyzes the associated computational complexity across BN families. It shows that KL computation for GBNs can be performed approximately in $O(N^2)$ rather than $O(N^3)$ time, while discrete BNs remain challenging due to non-orthogonal entropy decomposition and require junction-tree-based methods. The results provide practical, scalable tools for BN-based analysis and interpretation, with step-by-step numeric examples and an implementation in the bnlearn R package, enabling exact or near-exact information-theoretic measures in real-world applications.
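To make the entropy decomposition for discrete BNs concrete, here is a minimal Python sketch under stated assumptions. It uses a hypothetical encoding of the network (nodes listed in a topological order, each with a tuple of parent indices and a conditional probability table keyed by parent configuration) and writes the entropy as the sum of the conditional entropies $H(X_i \mid \Pi_{X_i})$. That sum needs the marginal distribution of each node's parents; here it is obtained by brute-force enumeration of the joint distribution, a stand-in for the junction-tree inference mentioned above, so the sketch only scales to toy networks. It is not the paper's or bnlearn's implementation.

```python
import itertools
import math

# Hypothetical encoding (not bnlearn's): nodes are listed in a topological order,
# node i is (parents, cpt), where parents is a tuple of earlier node indices and
# cpt maps a tuple of parent values to the probabilities of X_i's states.

def joint_prob(bn, x):
    """P(X = x) as the product of the local distributions."""
    p = 1.0
    for i, (parents, cpt) in enumerate(bn):
        p *= cpt[tuple(x[k] for k in parents)][x[i]]
    return p

def configurations(levels):
    """All joint states; their number is exponential in the number of nodes."""
    return itertools.product(*(range(n) for n in levels))

def entropy_joint(bn, levels):
    """Reference value: -sum_x P(x) log P(x) over the full joint distribution."""
    h = 0.0
    for x in configurations(levels):
        p = joint_prob(bn, x)
        if p > 0:
            h -= p * math.log(p)
    return h

def parent_marginal(bn, levels, i):
    """P(parents of X_i), by summing the joint (a stand-in for junction-tree inference)."""
    marginal = {}
    parents = bn[i][0]
    for x in configurations(levels):
        key = tuple(x[k] for k in parents)
        marginal[key] = marginal.get(key, 0.0) + joint_prob(bn, x)
    return marginal

def entropy_decomposed(bn, levels):
    """Entropy as sum_i H(X_i | parents) = -sum_i sum_pi P(pi) sum_x P(x | pi) log P(x | pi)."""
    h = 0.0
    for i, (_, cpt) in enumerate(bn):
        for pi, p_pi in parent_marginal(bn, levels, i).items():
            for p in cpt[pi]:
                if p > 0:
                    h -= p_pi * p * math.log(p)
    return h

# Two binary nodes, A -> B.
bn = [
    ((), {(): [0.3, 0.7]}),
    ((0,), {(0,): [0.9, 0.1], (1,): [0.2, 0.8]}),
]
levels = [2, 2]
print(entropy_joint(bn, levels), entropy_decomposed(bn, levels))
```

The two functions agree up to rounding; in a practical implementation, `parent_marginal` is the step that would be replaced by exact inference on a junction tree rather than enumeration.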

Abstract

Bayesian networks (BNs) are a foundational model in machine learning and causal inference. Their graphical structure can handle high-dimensional problems by dividing them into a sparse collection of smaller ones; it also underlies Judea Pearl's causality and determines their explainability and interpretability. Despite their popularity, there are almost no resources in the literature on how to compute Shannon's entropy and the Kullback-Leibler (KL) divergence for BNs under their most common distributional assumptions. In this paper, we provide computationally efficient algorithms for both by leveraging BNs' graphical structure, and we illustrate them with a complete set of numerical examples. In the process, we show that it is possible to reduce the computational complexity of KL from cubic to quadratic for Gaussian BNs.
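For Gaussian BNs, the entropy decomposes over the local distributions as well: each conditional entropy $H(X_i \mid \Pi_{X_i})$ equals $\frac{1}{2}\log(2\pi e \sigma_i^2)$, where $\sigma_i^2$ is the residual variance of $X_i$, so the entropy needs no matrix computations. The minimal Python sketch below assumes a hypothetical encoding of a GBN (nodes in a topological order, each with an intercept, a dictionary of regression coefficients on its parents, and a residual variance), and assumes node $i$ denotes the same variable in both networks. It computes $\mathrm{KL}(\mathcal{B} \,\|\, \mathcal{B}')$ as the cross-entropy minus the entropy of $\mathcal{B}$, taking expectations node by node from the means and covariances of $\mathcal{B}$'s global normal distribution, which are built recursively in topological order without inverting any matrix. It illustrates working with local distributions only; it is not claimed to be the paper's quadratic-time algorithm.

```python
import math

# Hypothetical encoding (not bnlearn's): a GBN is a list of local distributions in a
# topological order; node i is (intercept, {parent_index: coefficient}, residual_variance).

def gbn_entropy(bn):
    """Entropy as the sum of the conditional entropies H(X_i | parents)."""
    return sum(0.5 * math.log(2 * math.pi * math.e * var) for _, _, var in bn)

def gbn_moments(bn):
    """Mean vector and covariance matrix of the global normal, filled in topological order."""
    n = len(bn)
    mu = [0.0] * n
    cov = [[0.0] * n for _ in range(n)]
    for i, (alpha, parents, var) in enumerate(bn):
        mu[i] = alpha + sum(b * mu[k] for k, b in parents.items())
        # Cov(X_i, X_j) = sum_k b_k Cov(X_k, X_j) for every earlier node j.
        for j in range(i):
            cov[i][j] = cov[j][i] = sum(b * cov[k][j] for k, b in parents.items())
        # Var(X_i) = sum_k b_k Cov(X_k, X_i) + residual variance.
        cov[i][i] = var + sum(b * cov[k][i] for k, b in parents.items())
    return mu, cov

def gbn_kl(bn, bn_other):
    """KL(bn || bn_other), with expectations taken under bn's global normal."""
    mu, cov = gbn_moments(bn)
    cross = 0.0  # cross-entropy -E_bn[log p_other(x)], accumulated node by node
    for i, (alpha, parents, var) in enumerate(bn_other):
        # Residual r = x_i - alpha - sum_k b_k x_k of bn_other's regression, under bn.
        mean_r = mu[i] - alpha - sum(b * mu[k] for k, b in parents.items())
        var_r = cov[i][i]
        var_r -= 2 * sum(b * cov[i][k] for k, b in parents.items())
        var_r += sum(b * c * cov[k][m]
                     for k, b in parents.items() for m, c in parents.items())
        cross += 0.5 * math.log(2 * math.pi * var) + (mean_r ** 2 + var_r) / (2 * var)
    return cross - gbn_entropy(bn)

# B: X1 ~ N(0, 1), X2 = X1 + e with e ~ N(0, 1).  B': X1 ~ N(0, 1), X2 ~ N(0, 2).
b = [(0.0, {}, 1.0), (0.0, {0: 1.0}, 1.0)]
b_other = [(0.0, {}, 1.0), (0.0, {}, 2.0)]
print(gbn_entropy(b), gbn_kl(b, b_other))
```

In this example both networks share the same marginal for $X_2$, so the KL reduces to the mutual information between $X_1$ and $X_2$ under $\mathcal{B}$, that is $\frac{1}{2}\log 2$.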

Paper Structure

This paper contains 9 sections, 29 equations, and 1 figure.

Introduction
Bayesian Networks
Common Distributional Assumptions for Bayesian Networks
Discrete BNs
Gaussian BNs
Conditional Linear Gaussian BNs
Inference
Shannon Entropy and Kullback-Leibler Divergence
Discrete BNs

Figures (1)

Figure S1: DAGs and local distributions for the GBNs $\mathcal{B}$ (top) and $\mathcal{B}'$ (bottom) used in Examples \ref{ex:gbn}, \ref{ex:gbn-h}, \ref{ex:mvnorm-kl}, \ref{ex:gbn-kl} and \ref{ex:gbn-approx-kl}.

Theorems & Definitions (4)

Example 1: Composing and decomposing a GBN
Example 2: Composing and decomposing a CLGBN
Example 3: Entropy of a discrete BN
Example 4: KL between two discrete BNs
