Structural Refinement of Bayesian Networks for Efficient Model Parameterisation
Kieran Drury, Martine J. Barons, Jim Q. Smith
TL;DR
This paper tackles the data-scarcity challenge in Bayesian network (BN) parameterisation by surveying and empirically evaluating structural methods for approximating conditional probability tables (CPTs). It compares edge pruning, divorcing, Simple Canonical Models (SCMs), ICI, and SICI on a cardiovascular BN example, using total variation distance to measure CPT fidelity while tracking parameter savings. The key finding is that SICI gives the best fit because it captures interactions between parents, at the cost of higher structural complexity and a larger parameter count. Pruning and divorcing deliver substantial, practical parameter reductions with acceptable fidelity, while SCMs are the least flexible and often the least faithful. The work offers practical guidance for BN practitioners facing limited data, indicating when to prune, divorce, or apply causal-influence-based refinements, and it underscores the value of transparency and explainability in modified BN structures.
Abstract
Many Bayesian network modelling applications suffer from data scarcity, so expert judgement is often needed to determine the parameters of the conditional probability tables (CPTs) throughout the network. The number of these parameters is usually prohibitively large, even when any available data are complemented with expert judgements. To address this challenge, a number of CPT approximation methods have been developed that reduce the quantity and complexity of the parameters needed to fully parameterise a Bayesian network. This paper reviews a variety of structural refinement methods that can be used in practice to approximate a CPT efficiently within a Bayesian network. We not only introduce and discuss the intrinsic properties and requirements of each method, but also evaluate each method through a worked example on a Bayesian network model of cardiovascular risk assessment. We conclude with practical guidance to help Bayesian network practitioners choose an alternative approach when direct parameterisation of a CPT is infeasible.
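To make the scale of the parameterisation problem concrete, the following minimal Python sketch counts the free parameters of a full CPT and computes a total variation distance between two distributions over the same child states. The function names and the numbers used are illustrative assumptions, not taken from the paper, and the sketch does not reproduce how the paper aggregates distances across parent configurations.

```python
def full_cpt_parameter_count(child_states, parent_states):
    """Free parameters in a full CPT.

    Each parent configuration needs (child_states - 1) free probabilities,
    because the probabilities in each conditional distribution sum to one.
    """
    configurations = 1
    for states in parent_states:
        configurations *= states
    return (child_states - 1) * configurations


def total_variation_distance(p, q):
    """Total variation distance between two discrete distributions
    defined over the same ordered set of states."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of states")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical example: a binary child with five binary parents.
full = full_cpt_parameter_count(2, [2, 2, 2, 2, 2])

# Pruning one incoming edge leaves four binary parents.
pruned = full_cpt_parameter_count(2, [2, 2, 2, 2])

# Distance between an illustrative "true" and "approximate" conditional
# distribution for a single parent configuration.
distance = total_variation_distance([0.7, 0.3], [0.6, 0.4])

print(full, pruned, round(distance, 3))
```

In this hypothetical case, removing a single binary parent halves the number of probabilities to elicit, which is the kind of trade-off between parameter savings and CPT fidelity that the review examines.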
