Top-Down Bayesian Posterior Sampling for Sum-Product Networks
Soma Yokoi, Issei Sato
TL;DR
This work tackles the computational bottleneck of Bayesian learning in sum-product networks (SPNs) in two ways. First, it derives a marginalized posterior that integrates out the sum-node weights $\mathbf{W}$ and the leaf parameters $\bm{\Theta}$. Second, it introduces a top-down Metropolis–Hastings algorithm for sampling the latent variables $\mathbf{Z}$. The key idea is to sample $\mathbf{Z}$ efficiently through a network-proposal step followed by a leaf-acceptance step. This reduces learning-time complexity to $O(C_s^{\log_{C_p}D})$ and yields speedups of tens to more than one hundred times, while preserving or improving predictive performance. To handle the large number of hyperparameters, an empirical Bayesian hyperparameter-tuning scheme subsamples the leaf data for each dimension. This reduces the tuning complexity from scaling with the number of leaves to $O(D)$. Across more than 20 datasets, the method achieves substantial speedups and competitive or superior predictive accuracy, which makes Bayesian SPN learning scalable to large applications.
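Rewriting the complexity bound makes its dependence on $D$ explicit. The sketch below assumes that $C_s$ and $C_p$ denote the number of children of each sum node and each product node, since they are not defined in this section. Under that assumption, the identity $a^{\log_b c} = c^{\log_b a}$ gives

$$C_s^{\log_{C_p} D} = \exp\left(\frac{\ln C_s \, \ln D}{\ln C_p}\right) = D^{\log_{C_p} C_s},$$

so the bound is polynomial in $D$. For example, $C_s = C_p$ gives $O(D)$, and $C_s = 4$, $C_p = 2$ gives $O(D^2)$.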
Abstract
Sum-product networks (SPNs) are probabilistic models characterized by exact and fast evaluation of fundamental probabilistic operations. Their superior computational tractability has led to applications in many fields, such as machine learning under time constraints or accuracy requirements, and real-time systems. However, the same structural constraints that make SPN inference fast also increase learning-time complexity, which can be an obstacle to building highly expressive SPNs. This study aimed to develop a Bayesian learning approach that can be implemented efficiently on large-scale SPNs. We derived a new full conditional probability for Gibbs sampling by marginalizing out multiple random variables, which allows the posterior distribution to be obtained quickly. Our complexity analysis showed that the sampling algorithm remains efficient even for the largest possible SPN. Furthermore, we proposed a hyperparameter tuning method that balances the diversity of the prior distribution against optimization efficiency in large-scale SPNs. Our method improves learning-time complexity. In numerical experiments on more than 20 datasets, it ran tens to more than one hundred times faster and achieved superior predictive performance.
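The smallest possible case can illustrate how marginalizing variables combines with a propose-then-accept update: resampling which child a single sum node assigns to one datum. The sketch below is a generic independence Metropolis–Hastings update, not the authors' top-down algorithm. It assumes a symmetric Dirichlet(α) prior on the sum-node weights. Integrating out those weights gives the Dirichlet-multinomial predictive, which the sketch uses as the proposal. The leaf likelihoods are fixed toy numbers. Because the proposal equals the collapsed prior, the prior terms cancel in the Hastings ratio, so acceptance depends only on the leaf likelihood ratio. The helper names (`collapsed_prior`, `mh_step`, `run_chain`) and all numeric values are hypothetical.

```python
import numpy as np


def collapsed_prior(counts, alpha):
    """Dirichlet-multinomial predictive p(z = k | other assignments).

    Assumption: the sum-node weights have a symmetric Dirichlet(alpha) prior
    and are integrated out, leaving (n_k + alpha) / (N + K * alpha).
    """
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * counts.size)


def mh_step(z, prior, leaf_lik, rng):
    """One independence Metropolis-Hastings update of a single assignment z.

    Proposing from the collapsed prior makes the prior terms cancel in the
    Hastings ratio, so only the leaf likelihood ratio decides acceptance.
    """
    z_new = rng.choice(prior.size, p=prior)
    accept_prob = min(1.0, leaf_lik[z_new] / leaf_lik[z])
    return z_new if rng.random() < accept_prob else z


def run_chain(counts, alpha, leaf_lik, n_steps, seed=0):
    """Run the chain and return the fraction of steps spent in each child."""
    rng = np.random.default_rng(seed)
    prior = collapsed_prior(counts, alpha)
    leaf_lik = np.asarray(leaf_lik, dtype=float)
    z = 0
    visits = np.zeros(prior.size)
    for _ in range(n_steps):
        z = mh_step(z, prior, leaf_lik, rng)
        visits[z] += 1
    return visits / n_steps


# Hypothetical toy values: other data already assigned to each child,
# and the likelihood of the current datum under each child's leaf.
counts = [5, 1, 2]
leaf_lik = [0.2, 0.9, 0.4]
alpha = 1.0

freq = run_chain(counts, alpha, leaf_lik, n_steps=50_000)

# Exact target: collapsed prior times leaf likelihood, normalized.
exact = collapsed_prior(counts, alpha) * np.asarray(leaf_lik)
exact /= exact.sum()
print(np.round(freq, 3), np.round(exact, 3))
```

The visit frequencies should approach `exact`. The sketch proposes from the cheap collapsed prior and consults the leaves only in the acceptance test. This loosely mirrors the split between a network proposal and a leaf acceptance described above, although the actual method samples over whole networks rather than a single assignment.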
