Neural Network Approximators for Marginal MAP in Probabilistic Circuits

Shivvrat Arya; Tahrima Rahman; Vibhav Gogate

TL;DR

This work proposes a neural network approach to approximate MMAP inference in PCs, trained with a continuous multilinear function as its loss, and shows that it outperforms three competing linear-time approximations that are used in practice to solve MMAP tasks in PCs.

Abstract

Probabilistic circuits (PCs) such as sum-product networks efficiently represent large multivariate probability distributions. They are preferred in practice over other probabilistic representations such as Bayesian and Markov networks because PCs can solve marginal inference (MAR) tasks in time that scales linearly in the size of the network. Unfortunately, the maximum-a-posteriori (MAP) and marginal MAP (MMAP) tasks remain NP-hard in these models. Inspired by recent work on using neural networks to generate near-optimal solutions to optimization problems such as integer linear programming, we propose an approach that uses neural networks to approximate (M)MAP inference in PCs. The key idea in our approach is to approximate the cost of an assignment to the query variables using a continuous multilinear function, and then use the latter as a loss function. The two main benefits of our new method are that it is self-supervised and that, once the neural network is trained, it requires only linear time to output a solution. We evaluate our new approach on several benchmark datasets and show that it outperforms three competing linear-time approximations (max-product inference, max-marginal inference, and sequential estimation) that are used in practice to solve MMAP tasks in PCs.
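
The key idea in the abstract can be illustrated with a minimal sketch. The toy circuit below is an assumption made for illustration, not a circuit from the paper: it has one query variable Q and one hidden variable H, and all weights and the function names `pc_value` and `loss` are hypothetical. Replacing the 0/1 indicator leaves of Q with a continuous value `q_c` and `1 - q_c` makes the root value a multilinear function of `q_c`, and its negative logarithm can serve as a loss that needs no labeled solutions. The neural network that would produce `q_c` from the evidence is omitted here.

```python
import math

def pc_value(q_pos, q_neg, h_pos=1.0, h_neg=1.0):
    """Root value of a toy smooth, decomposable PC (hypothetical weights).

    q_pos / q_neg are the leaf inputs for Q = 1 and Q = 0.
    Leaving h_pos = h_neg = 1 marginalizes the hidden variable H.
    """
    # Univariate distributions written as weighted sums of leaves.
    q_a = 0.9 * q_pos + 0.1 * q_neg
    h_a = 0.2 * h_pos + 0.8 * h_neg
    q_b = 0.1 * q_pos + 0.9 * q_neg
    h_b = 0.7 * h_pos + 0.3 * h_neg
    # Root sum node over two product nodes.
    return 0.6 * (q_a * h_a) + 0.4 * (q_b * h_b)

def loss(q_c):
    """Negative log of the circuit value with continuous query leaves."""
    return -math.log(pc_value(q_c, 1.0 - q_c))

# With 0/1 leaves the circuit returns the marginals of Q:
# p(Q = 1) is approximately 0.58 and p(Q = 0) approximately 0.42.
p_q1 = pc_value(1.0, 0.0)
p_q0 = pc_value(0.0, 1.0)

# Over a coarse grid, the continuous loss is smallest at the MMAP assignment.
best_q_c = min([0.0, 0.25, 0.5, 0.75, 1.0], key=loss)
```

In this sketch the loss decreases as `q_c` moves toward the assignment with the larger marginal, which is the behavior a self-supervised training signal of this kind relies on.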

Paper Structure (29 sections, 1 theorem, 9 equations, 8 figures, 6 tables)

Introduction
Preliminaries
Probabilistic Circuits
Marginal Inference in PCs
Marginal Maximum-a-Posteriori (MMAP) Inference in PCs
A Neural Optimizer for MMAP in PCs
A Self-Supervised Loss Function for PCs
Tractable Gradient Computation
Improving the Loss Function
Experiments
Competing Methods
Evaluation Criteria
Datasets and Probabilistic Circuits
Neural Network Optimizers
Results on the TPM Datasets
...and 14 more sections

Key Result

Proposition 1

The gradient of the loss function $-\ln v'(r,(\mathbf{e},\mathbf{q}^c))$ w.r.t. $\mathbf{q}_j^c$ can be computed in time and space that scales linearly with the size of $\mathcal{M}$.
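
One way to see why a linear-cost gradient is plausible is a standard forward and backward pass over the circuit, sketched below. This is an illustrative assumption rather than the paper's proof or implementation: the toy circuit, its weights, and the names `NODES`, `forward`, `backward`, and `loss_and_grad` are all hypothetical. Each pass visits every edge once; the product-node rule used here is written for products with two children, as in this toy circuit.

```python
import math

# Nodes are listed children before parents; the last node is the root.
# ("leaf", name) | ("sum", [(weight, child), ...]) | ("prod", [child, ...])
NODES = [
    ("leaf", "q_pos"),               # 0: continuous leaf for Q = 1
    ("leaf", "q_neg"),               # 1: continuous leaf for Q = 0
    ("sum", [(0.9, 0), (0.1, 1)]),   # 2
    ("sum", [(0.1, 0), (0.9, 1)]),   # 3
    ("leaf", "h_pos"),               # 4: hidden variable H = 1
    ("leaf", "h_neg"),               # 5: hidden variable H = 0
    ("sum", [(0.2, 4), (0.8, 5)]),   # 6
    ("sum", [(0.7, 4), (0.3, 5)]),   # 7
    ("prod", [2, 6]),                # 8
    ("prod", [3, 7]),                # 9
    ("sum", [(0.6, 8), (0.4, 9)]),   # 10: root
]

def forward(leaves):
    """Bottom-up value computation, one visit per edge."""
    vals = []
    for kind, arg in NODES:
        if kind == "leaf":
            vals.append(leaves[arg])
        elif kind == "sum":
            vals.append(sum(w * vals[c] for w, c in arg))
        else:
            v = 1.0
            for c in arg:
                v *= vals[c]
            vals.append(v)
    return vals

def backward(vals):
    """Top-down pass: grads[i] is d(root) / d(node i)."""
    grads = [0.0] * len(NODES)
    grads[-1] = 1.0
    for i in range(len(NODES) - 1, -1, -1):
        kind, arg = NODES[i]
        if kind == "sum":
            for w, c in arg:
                grads[c] += w * grads[i]
        elif kind == "prod":
            for c in arg:
                other = 1.0
                for d in arg:
                    if d != c:
                        other *= vals[d]
                grads[c] += grads[i] * other
    return grads

def loss_and_grad(q_c):
    """Return -ln(root) and its derivative w.r.t. the continuous value q_c."""
    leaves = {"q_pos": q_c, "q_neg": 1.0 - q_c, "h_pos": 1.0, "h_neg": 1.0}
    vals = forward(leaves)
    grads = backward(vals)
    root = vals[-1]
    # Chain rule: q_neg = 1 - q_c contributes with a negative sign.
    d_root = grads[0] - grads[1]
    return -math.log(root), -d_root / root

loss_value, grad_value = loss_and_grad(0.5)
```

The analytic gradient from the backward pass can be checked against a central finite difference of the loss, which is a useful sanity test before plugging such a loss into a neural network trainer.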

Figures (8)

Figure 1: (a) An example smooth and decomposable PC. The figure also shows value computation for answering the query $\text{p}_\mathcal{M}(X_3=1,X_4=0)$. The values of the leaf, sum, and product nodes are given in parentheses on their bottom, top, and left, respectively. The value of the root node is the answer to the query. (b) QPC obtained from the PC given in (a) for query variables $\{X_3,X_4\}$. For simplicity, here, we use an MMAP problem without any evidence. This is because a given evidence can be incorporated into the PC by appropriately setting the leaf nodes. We also show value computations for the following leaf initialization: $X_3^c=0.99,\neg X_3^c=0.01, X_4^c=0.05,\neg X_4^c=0.95$ and all other leaves are set to $1$.
Figure 2: Heat map showing the % difference in log-likelihood scores between SSMP and Max approximation. Each row denotes a distinct dataset, with the color gradient depicting the % Difference. The gradient extends from dark blue to light blue, indicating areas where Max is superior (negative values), and from light red to dark red, highlighting regions where SSMP outperforms (positive values).
Figure 3: (a) Value computations for partial derivative of the QPC given in Figure 1 in the main paper w.r.t. $X_3^c$ and (b) Value computations for partial derivative of the QPC given in Figure 1 in the main paper w.r.t. $X_4^c$. The values of the leaf, sum and product nodes are given in brackets on their bottom, top and left respectively. The value of the root node equals the partial derivative.
Figure 4: Heatmap illustrating inference time for the ML, Seq, Max, and SSMP methods on a logarithmic microsecond scale. Green indicates shorter (more favorable) times.
Figure 5: Heatmap showing the percentage difference in log-likelihood scores between SSMP and Supervised Learning method. Blue color represents the supervised method's superiority (negative values), while red color represents SSMP's superiority (positive values). The datasets are arranged in ascending order of their number of variables.
...and 3 more figures

Theorems & Definitions (7)

Definition 1
Example 1
Example 2
Example 3
Proposition 1
Example 4
Example 5
