A Note on Bayesian Networks with Latent Root Variables
Marco Zaffalon, Alessandro Antonucci
TL;DR
This note addresses learning in Bayesian networks (BNs) with latent root variables, where incomplete observations make the likelihood non-concave and expose learning to local maxima. It shows that marginalising the latent roots yields an empirical BN over the manifest variables, with $P(\bm{z})=\prod_{Z\in\bm{Z}} P(z|\bm{w}_Z)$, and constructs an auxiliary-root transformation that connects the original and empirical models. The main result proves that the global maximum of the original log-likelihood is bounded above by the empirical maximum $\lambda^*$, with equality precisely when the data are compatible with the original BN; this provides a principled compatibility test for EM-based learning. The findings offer a practical criterion for certifying global optimality in the presence of latent roots and indicate directions for extending the framework to continuous variables.
Abstract
We characterise the likelihood function computed from a Bayesian network with latent variables as root nodes. We show that the marginal distribution over the remaining, manifest, variables also factorises as a Bayesian network, which we call empirical. A dataset of observations of the manifest variables allows us to quantify the parameters of the empirical Bayesian network. We prove that (i) the likelihood of such a dataset from the original Bayesian network is dominated by the global maximum of the likelihood from the empirical one; and that (ii) such a maximum is attained if and only if the parameters of the Bayesian network are consistent with those of the empirical model.
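The two claims can be checked numerically on a toy case. The sketch below is an illustration only, not the construction used in the note: it assumes a single binary latent root U with manifest children X and Y and an extra edge from X to Y, so that the empirical model over (X, Y) is taken to be the saturated distribution given by the relative frequencies. The dataset and all function names are hypothetical. It computes $\lambda^*$ from the counts, samples random parameters for the latent model to show that their log-likelihood never exceeds $\lambda^*$, and evaluates one hand-picked parameter set whose manifest marginal matches the frequencies.

```python
import math
import random
from collections import Counter

def empirical_max_loglik(data):
    """lambda*: log-likelihood of the data under their own relative frequencies."""
    counts = Counter(data)
    n = len(data)
    return sum(c * math.log(c / n) for c in counts.values())

def random_dist(k, rng):
    """Draw a strictly positive probability vector of length k."""
    w = [rng.random() + 1e-9 for _ in range(k)]
    s = sum(w)
    return [v / s for v in w]

def latent_loglik(data, p_u, p_x_u, p_y_ux):
    """Log-likelihood of (x, y) pairs after summing out the latent root U."""
    total = 0.0
    for x, y in data:
        marg = sum(p_u[u] * p_x_u[u][x] * p_y_ux[u][x][y]
                   for u in range(len(p_u)))
        total += math.log(marg)
    return total

rng = random.Random(0)

# Hypothetical dataset of binary (x, y) observations (assumption, not from the note).
data = [(0, 0)] * 4 + [(0, 1)] * 1 + [(1, 0)] * 2 + [(1, 1)] * 3
lam_star = empirical_max_loglik(data)

# Claim (i): no parametrisation of the latent model beats lambda*.
best = -math.inf
for _ in range(2000):
    p_u = random_dist(2, rng)
    p_x_u = [random_dist(2, rng) for _ in range(2)]
    p_y_ux = [[random_dist(2, rng) for _ in range(2)] for _ in range(2)]
    best = max(best, latent_loglik(data, p_u, p_x_u, p_y_ux))

# Claim (ii): parameters whose manifest marginal equals the empirical
# frequencies attain lambda*. Here U is simply ignored by X and Y.
p_x = [0.5, 0.5]
p_y_x = [[0.8, 0.2], [0.4, 0.6]]
compatible = latent_loglik(data, [0.5, 0.5], [p_x, p_x], [p_y_x, p_y_x])

print(f"lambda*    = {lam_star:.4f}")
print(f"best random = {best:.4f}")
print(f"compatible  = {compatible:.4f}")
```

In this saturated toy case the bound reduces to Gibbs' inequality; the general statement of the note concerns empirical networks that need not be saturated, which this sketch does not cover.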
