Top-Down Bayesian Posterior Sampling for Sum-Product Networks

Soma Yokoi, Issei Sato

TL;DR

This work addresses the computational bottleneck of Bayesian learning in sum-product networks (SPNs). It derives a marginalized posterior that integrates out $\mathbf{W}$ and $\bm{\Theta}$, and it introduces a top-down Metropolis–Hastings sampling algorithm for $\mathbf{Z}$. The key idea is to sample the latent structure $\mathbf{Z}$ efficiently through a network-proposal step followed by a leaf-acceptance step. This reduces learning-time complexity to $O(C_s^{\log_{C_p}D})$ and yields speedups of tens to hundreds of times while preserving or improving predictive performance. An empirical Bayesian hyperparameter-tuning scheme handles the large number of hyperparameters by subsampling per-dimension leaf data, which reduces the complexity from the number of leaves to $O(D)$. On more than 20 datasets, the method shows substantial speedups and competitive or superior predictive accuracy, making Bayesian SPN learning practical for large-scale applications.
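To give a feel for the learning-time bound $O(C_s^{\log_{C_p}D})$, the sketch below evaluates the term inside the $O(\cdot)$ for concrete values. It relies only on the identity $C_s^{\log_{C_p}D} = D^{\log_{C_p}C_s}$, which shows that the bound is polynomial in $D$ when $C_s$ and $C_p$ are fixed. The function names are illustrative, and reading $C_s$ and $C_p$ as the number of children per sum node and per product node, with $D$ as the input dimension, is an assumption based on the figure captions rather than something stated in this summary.

```python
import math


def exponential_form(c_s: int, c_p: int, d: int) -> float:
    """Evaluate C_s ** log_{C_p}(D), the term inside the O(.) bound.

    Assumed meaning of the symbols (not defined in this summary):
    c_s = children per sum node, c_p = children per product node,
    d = input dimension.
    """
    return c_s ** math.log(d, c_p)


def polynomial_form(c_s: int, c_p: int, d: int) -> float:
    """Evaluate the algebraically equivalent form D ** log_{C_p}(C_s)."""
    return d ** math.log(c_s, c_p)


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    for d in (4, 16, 64):
        a = exponential_form(4, 2, d)
        b = polynomial_form(4, 2, d)
        print(f"D={d:3d}  C_s^log_Cp(D)={a:10.1f}  D^log_Cp(C_s)={b:10.1f}")
```

With $C_s = 4$ and $C_p = 2$ the exponent $\log_{C_p}C_s$ equals 2, so the term grows quadratically in $D$ for these particular values.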

Abstract

Sum-product networks (SPNs) are probabilistic models characterized by exact and fast evaluation of fundamental probabilistic operations. Their superior computational tractability has led to applications in many fields, such as machine learning under time constraints or accuracy requirements and real-time systems. The structural constraints that give SPNs fast inference, however, increase learning-time complexity and can be an obstacle to building highly expressive SPNs. This study aimed to develop a Bayesian learning approach that can be implemented efficiently on large-scale SPNs. We derived a new full conditional probability for Gibbs sampling by marginalizing multiple random variables, so that the posterior distribution can be obtained quickly. The complexity analysis revealed that our sampling algorithm works efficiently even for the largest possible SPN. Furthermore, we proposed a hyperparameter tuning method that balances the diversity of the prior distribution against optimization efficiency in large-scale SPNs. In numerical experiments on more than 20 datasets, our method improved learning-time complexity, ran tens to more than one hundred times faster, and achieved superior predictive performance.

Paper Structure

This paper contains 24 sections, 15 equations, 5 figures, 12 tables, 2 algorithms.

Figures (5)

  • Figure 1: Overview of SPNs. Left: computational graph representing the structure and weights of the SPN. Center: evaluation of the density of input data ${\bm{x}}$. Right: structural constraints ensuring tractability.
  • Figure 2: Bayesian SPNs. Left: plate notation showing the conditional dependencies of random variables based on the Bayesian interpretation of SPNs. Center: entire graph traversal for ancestral sampling in the bottom-up approach. Right: subgraph access for rejection sampling in the proposed top-down approach.
  • Figure 3: Example of the SPN for $D=2$ input data. Left: the largest computational graph with $C_s$ children for each sum node. Right: graph breadth and expressivity on non-linearly correlated examples.
  • Figure 4: Temporal evolution in predictive performance for each dataset. The average test-set log-likelihood over $10$ trials is shown with the lines, and the standard deviation is indicated by the shaded regions.
  • Figure 5: Temporal evolution in predictive performance for each dataset. The average test-set log-likelihood over $10$ trials is shown with the lines, and the standard deviation is indicated by the shaded regions.
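The density evaluation that the Figure 1 caption refers to can be illustrated with a toy SPN for $D=2$. This is a minimal sketch, not the paper's model: the structure (one sum node over two product nodes, each with one Gaussian leaf per dimension), the weights, and the leaf parameters are all hypothetical values chosen for illustration. A single bottom-up pass gives the exact density of an input ${\bm{x}}$.

```python
import math


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a univariate Gaussian leaf."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def product_node(x, leaf_params) -> float:
    """Multiply leaf densities; each leaf covers a different dimension of x."""
    density = 1.0
    for x_d, (mean, std) in zip(x, leaf_params):
        density *= gaussian_pdf(x_d, mean, std)
    return density


def sum_node(x, weights, children) -> float:
    """Weighted mixture of child product nodes that share the same scope."""
    return sum(w * product_node(x, child) for w, child in zip(weights, children))


# Hypothetical toy parameters: mixture weights sum to one, and each child
# lists one (mean, std) pair per input dimension.
WEIGHTS = [0.3, 0.7]
CHILDREN = [
    [(0.0, 1.0), (0.0, 1.0)],
    [(2.0, 0.5), (-1.0, 2.0)],
]


def spn_density(x) -> float:
    """Evaluate the toy SPN at a 2-dimensional input x."""
    return sum_node(x, WEIGHTS, CHILDREN)


if __name__ == "__main__":
    print(spn_density((0.0, 0.0)))
```

Because every child of the sum node covers both dimensions and the leaves under each product node cover different dimensions, the result is a valid mixture density, and the cost of one evaluation is linear in the number of edges.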