Learning to Solve the Constrained Most Probable Explanation Task in Probabilistic Graphical Models

Shivvrat Arya, Tahrima Rahman, Vibhav Gogate

TL;DR

The key idea in the approach is to use first principles and approximate inference methods for CMPE to derive novel loss functions that seek to push infeasible solutions towards feasible ones and feasible solutions towards optimal ones.

Abstract

We propose a self-supervised learning approach for solving the following constrained optimization task in log-linear models or Markov networks. Let $f$ and $g$ be two log-linear models defined over the sets $\mathbf{X}$ and $\mathbf{Y}$ of random variables respectively. Given an assignment $\mathbf{x}$ to all variables in $\mathbf{X}$ (evidence) and a real number $q$, the constrained most-probable explanation (CMPE) task seeks to find an assignment $\mathbf{y}$ to all variables in $\mathbf{Y}$ such that $f(\mathbf{x}, \mathbf{y})$ is maximized and $g(\mathbf{x}, \mathbf{y})\leq q$. In our proposed self-supervised approach, given assignments $\mathbf{x}$ to $\mathbf{X}$ (data), we train a deep neural network that learns to output near-optimal solutions to the CMPE problem without requiring access to any pre-computed solutions. The key idea in our approach is to use first principles and approximate inference methods for CMPE to derive novel loss functions that seek to push infeasible solutions towards feasible ones and feasible solutions towards optimal ones. We analyze the properties of our proposed method and experimentally demonstrate its efficacy on several benchmark problems.
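
For reference, the CMPE task described in the abstract can be written as a single constrained optimization problem. This is only a restatement of the sentence above in display form, using the abstract's own symbols ($f$, $g$, $\mathbf{x}$, $\mathbf{y}$, $q$); it adds no detail beyond what the abstract states.

$$
\max_{\mathbf{y}} \; f(\mathbf{x}, \mathbf{y}) \quad \text{subject to} \quad g(\mathbf{x}, \mathbf{y}) \leq q
$$

Here $\mathbf{x}$ and $q$ are given, and the maximization ranges over all assignments $\mathbf{y}$ to the variables in $\mathbf{Y}$.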

Paper Structure (28 sections, 1 theorem, 26 equations, 5 figures, 17 tables)

This paper contains 28 sections, 1 theorem, 26 equations, 5 figures, 17 tables.

Key Result

Proposition 4.1

If $\mathcal{L}_\mathbf{x}(\hat{\mathbf{y}})$ is consistent, i.e., $\alpha_\mathbf{x} > \frac{p^*_\mathbf{x}}{q^*_\mathbf{x}}$, then $\min_{\hat{\mathbf{y}}}\mathcal{L}_\mathbf{x}(\hat{\mathbf{y}})=p_\mathbf{x}^*$; that is, $\mathcal{L}_\mathbf{x}(\hat{\mathbf{y}})$ is an optimal loss function.

Figures (5)

  • Figure 1: Two Markov networks $\mathcal{M}_1$ and $\mathcal{M}_2$ having the same chain-like structure and defined over the same set $\{X_1,X_2,Y_1,Y_2\}$ of variables. $\mathcal{M}_1$ is defined by the set of log-potentials $\{h_1,h_2,h_3\}$ and $\mathcal{M}_2$ is defined by the set of log-potentials $\{t_1,t_2,t_3\}$. Each log-potential can be expressed as a local multilinear polynomial function. The global multilinear functions representing $\mathcal{M}_1$ and $\mathcal{M}_2$ are $h(x_1,x_2,y_1,y_2) = 18-3x_1+x_2-7y_1-y_2+5x_1y_1-4x_2y_2-y_1y_2$ and $t(x_1,x_2,y_1,y_2) = 28+2x_1-7x_2-4y_1-2y_2+2x_1y_1-x_2y_2$ respectively. Each is obtained by adding the local functions associated with the respective model and then simplifying, e.g., $h(x_1,x_2,y_1,y_2) = h_1(x_1,y_1)+h_2(y_1,y_2)+h_3(x_2,y_2)$; $t(x_1,x_2,y_1,y_2)$ is obtained similarly.
  • Figure 2: Optimality Gap (avg %) and Average Violations for Self-Supervised methods
  • Figure 3: Qualitative results on the adversarially generated MNIST digits. Each row represents an original image followed by a corresponding image generated adversarially by 8 different methods: ILP, MSE, SL+Penalty, MAE, MAE+Penalty, SSL$_{pen}$, PDL, and SS-CMPE.
  • Figure SF4: Optimality gap of all self-supervised methods, computed on feasible examples only. Lower is better.
  • Figure SF5: Visualization of Optimality Gap (average %) and Average Violations for Self-Supervised Methods across different q values. Points closer to the origin indicate better performance.
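
To make the CMPE task concrete, the two global functions from the Figure 1 caption can be solved by exhaustive enumeration. The sketch below rests on several assumptions that the text above does not state: all variables are taken to be binary (0/1), $h$ plays the role of the objective $f$ and $t$ the role of the constraint function $g$, and the evidence $(x_1,x_2)=(1,1)$ and thresholds $q$ are illustrative values chosen here. The helper name `solve_cmpe` is hypothetical. This brute-force search is not the paper's learned method; it only shows what a correct answer to a tiny CMPE instance looks like.

```python
from itertools import product


def h(x1, x2, y1, y2):
    # Global multilinear function of M1, copied from the Figure 1 caption.
    return 18 - 3*x1 + x2 - 7*y1 - y2 + 5*x1*y1 - 4*x2*y2 - y1*y2


def t(x1, x2, y1, y2):
    # Global multilinear function of M2, copied from the Figure 1 caption.
    return 28 + 2*x1 - 7*x2 - 4*y1 - 2*y2 + 2*x1*y1 - x2*y2


def solve_cmpe(x, q):
    """Brute-force CMPE: maximize h(x, y) subject to t(x, y) <= q.

    Assumes binary variables. Returns (y, h_value) for the best feasible
    assignment, or None when no assignment satisfies the constraint.
    """
    best = None
    for y in product((0, 1), repeat=2):
        if t(*x, *y) <= q:  # keep only feasible assignments
            value = h(*x, *y)
            if best is None or value > best[1]:
                best = (y, value)
    return best


# Illustrative evidence and threshold (assumed values, not from the paper).
print(solve_cmpe((1, 1), 21))  # → ((1, 0), 14)
print(solve_cmpe((1, 1), 17))  # → None (no feasible assignment)
```

With this evidence the unconstrained maximizer of $h$ is $(y_1,y_2)=(0,0)$ with value 16, but $t=23$ there, so the constraint $t \leq 21$ rules it out and the constrained optimum moves to $(1,0)$. Enumeration costs $2^{|\mathbf{Y}|}$ evaluations, which is why it is only practical for toy instances like this one.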

Theorems & Definitions (4)

  • Example 1
  • Example 2
  • Proposition 4.1
  • Proof