Learning to Solve the Constrained Most Probable Explanation Task in Probabilistic Graphical Models

Shivvrat Arya; Tahrima Rahman; Vibhav Gogate

TL;DR

The key idea in the approach is to use first principles and approximate inference methods for CMPE to derive novel loss functions that seek to push infeasible solutions towards feasible ones and feasible solutions towards optimal ones.

Abstract

We propose a self-supervised learning approach for solving the following constrained optimization task in log-linear models or Markov networks. Let $f$ and $g$ be two log-linear models defined over the sets $\mathbf{X}$ and $\mathbf{Y}$ of random variables respectively. Given an assignment $\mathbf{x}$ to all variables in $\mathbf{X}$ (evidence) and a real number $q$, the constrained most-probable explanation (CMPE) task seeks to find an assignment $\mathbf{y}$ to all variables in $\mathbf{Y}$ such that $f(\mathbf{x}, \mathbf{y})$ is maximized and $g(\mathbf{x}, \mathbf{y})\leq q$. In our proposed self-supervised approach, given assignments $\mathbf{x}$ to $\mathbf{X}$ (data), we train a deep neural network that learns to output near-optimal solutions to the CMPE problem without requiring access to any pre-computed solutions. The key idea in our approach is to use first principles and approximate inference methods for CMPE to derive novel loss functions that seek to push infeasible solutions towards feasible ones and feasible solutions towards optimal ones. We analyze the properties of our proposed method and experimentally demonstrate its efficacy on several benchmark problems.
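Stated compactly, and using only the symbols defined in the abstract, the CMPE task for a given evidence assignment $\mathbf{x}$ and threshold $q$ can be written as the constrained program below. This is a restatement of the prose definition, not an additional result.

$$
\begin{aligned}
\max_{\mathbf{y}} \quad & f(\mathbf{x}, \mathbf{y}) \\
\text{subject to} \quad & g(\mathbf{x}, \mathbf{y}) \leq q
\end{aligned}
$$

An assignment $\mathbf{y}$ is feasible when it satisfies the constraint, and optimal when it is feasible and attains the largest value of $f$ among all feasible assignments.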

Paper Structure (28 sections, 1 theorem, 26 equations, 5 figures, 17 tables)

INTRODUCTION
Notation and Background
Constrained Most Probable Explanation
Specialized Lower Bounding Algorithms
Solving CMPE using Methods from the Learning to Optimize Literature
Supervised Methods
Self-Supervised Methods
Drawbacks of the Penalty and ALM Methods
A NOVEL SELF-SUPERVISED CMPE SOLVER
Making The Loss Function Smooth and Continuous
EXPERIMENTAL EVALUATION
The Loss Functions: Competing Methods
Datasets and Benchmarks
High Tree-Width Markov Networks and Tractable Probabilistic Circuits
Adversarial Modification on the MNIST Dataset
...and 13 more sections

Key Result

Proposition 4.1

If $\mathcal{L}_\mathbf{x}(\hat{\mathbf{y}})$ is consistent, i.e., $\alpha_\mathbf{x} > \frac{p^*_\mathbf{x}}{q^*_\mathbf{x}}$, then $\min_{\hat{\mathbf{y}}}\mathcal{L}_\mathbf{x}(\hat{\mathbf{y}})=p_\mathbf{x}^*$; that is, $\mathcal{L}_\mathbf{x}(\hat{\mathbf{y}})$ is an optimal loss function.

Figures (5)

Figure 1: Two Markov networks $\mathcal{M}_1$ and $\mathcal{M}_2$ having the same chain-like structure and defined over the same set $\{X_1,X_2,Y_1,Y_2\}$ of variables. $\mathcal{M}_1$ is defined by the set of log-potentials $\{h_1,h_2,h_3\}$ and $\mathcal{M}_2$ is defined by the set of log-potentials $\{t_1,t_2,t_3\}$. Each log-potential can be expressed as a local multilinear polynomial function. The global multilinear functions representing $\mathcal{M}_1$ and $\mathcal{M}_2$ are $h(x_1,x_2,y_1,y_2) = 18-3x_1+x_2-7y_1-y_2+5x_1y_1-4x_2y_2-y_1y_2$ and $t(x_1,x_2,y_1,y_2) = 28+2x_1-7x_2-4y_1-2y_2+2x_1y_1-x_2y_2$, respectively. Each is obtained by adding the local functions associated with the respective model and then simplifying, i.e., $h(x_1,x_2,y_1,y_2) = h_1(x_1,y_1)+h_2(y_1,y_2)+h_3(x_2,y_2)$; $t(x_1,x_2,y_1,y_2)$ is obtained similarly.
Figure 2: Optimality Gap (avg %) and Average Violations for Self-Supervised methods
Figure 3: Qualitative results on the adversarially generated MNIST digits. Each row represents an original image followed by a corresponding image generated adversarially by 8 different methods: ILP, MSE, SL+Penalty, MAE, MAE+Penalty, SSL$_{pen}$, PDL, and SS-CMPE.
Figure SF4: Illustration of the optimality gap for self-supervised methods (on feasible examples only) for all approaches. Lower is better.
Figure SF5: Visualization of Optimality Gap (average %) and Average Violations for Self-Supervised Methods across different q values. Points closer to the origin indicate better performance.
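
To make the multilinear functions $h$ and $t$ from the Figure 1 caption concrete, the sketch below evaluates them and solves a tiny CMPE instance by exhaustive enumeration. Several choices here are assumptions made only for illustration and are not stated in the text above: all variables are taken to be binary, $h$ plays the role of the objective $f$, $t$ plays the role of the constraint $g$, and the evidence `x = (1, 1)` and threshold `q = 21` are hypothetical values. The helper name `solve_cmpe_bruteforce` is likewise hypothetical. Enumeration is exponential in the number of $\mathbf{Y}$ variables and serves only as a reference point, not as the learning-based method the paper proposes.

```python
from itertools import product


def h(x1, x2, y1, y2):
    # Global multilinear function of M_1, copied from the Figure 1 caption.
    return 18 - 3*x1 + x2 - 7*y1 - y2 + 5*x1*y1 - 4*x2*y2 - y1*y2


def t(x1, x2, y1, y2):
    # Global multilinear function of M_2, copied from the Figure 1 caption.
    return 28 + 2*x1 - 7*x2 - 4*y1 - 2*y2 + 2*x1*y1 - x2*y2


def solve_cmpe_bruteforce(f, g, x, q, num_y=2):
    """Return (y, f(x, y)) maximizing f subject to g(x, y) <= q, or None.

    Assumes binary variables (an illustrative assumption) and enumerates
    all 2**num_y assignments to Y.
    """
    best = None
    for y in product((0, 1), repeat=num_y):
        if g(*x, *y) > q:
            continue  # infeasible: violates the constraint g <= q
        value = f(*x, *y)
        if best is None or value > best[1]:
            best = (y, value)
    return best


# Hypothetical evidence and threshold, chosen so the constraint is active.
x = (1, 1)
q = 21

unconstrained = max(product((0, 1), repeat=2), key=lambda y: h(*x, *y))
result = solve_cmpe_bruteforce(h, t, x, q)

print("unconstrained argmax of h:", unconstrained, "t =", t(*x, *unconstrained))
print("CMPE solution (y, h):", result)
```

Under these assumed values, the unconstrained maximizer of $h$ violates the constraint on $t$, so the CMPE solution differs from the plain MPE solution. That gap between the two is what the constraint introduces.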

Theorems & Definitions (4)

Example 1
Example 2
Proposition 4.1
proof
