Maximum a Posteriori Inference for Factor Graphs via Benders' Decomposition

Harsh Vardhan Dubey, Ji Ah Lee, Patrick Flaherty

TL;DR

This work presents a method for maximum a posteriori (MAP) inference in general Bayesian factor models that sequentially adds constraints to the fully relaxed dual problem using Benders' decomposition. On standard data sets it attains a higher optimal posterior value than Gibbs sampling and variational Bayes methods, and it provides a certificate of convergence.

Abstract

Many Bayesian statistical inference problems come down to computing a maximum a posteriori (MAP) assignment of latent variables. Yet standard methods for estimating the MAP assignment offer no finite-time guarantee that the algorithm has converged to a fixed point. Previous research has found that MAP inference can be represented in dual form as a linear programming problem with a non-polynomial number of constraints. A Lagrangian relaxation of this dual yields a statistical inference algorithm in the form of a linear programming problem; however, the decision of which constraints to remove in the relaxation is often heuristic. We present a method for MAP inference in general Bayesian factor models that sequentially adds constraints to the fully relaxed dual problem using Benders' decomposition. Our method enables the incorporation of expressive integer and logical constraints in clustering problems, such as must-link, cannot-link, and a minimum number of whole samples allocated to each cluster. Using this approach, we derive MAP estimation algorithms for the Bayesian Gaussian mixture model and latent Dirichlet allocation. Empirical results show that our method attains a higher optimal posterior value than Gibbs sampling and variational Bayes methods on standard data sets and provides a certificate of convergence.
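The loop the abstract describes (solve a relaxed master problem, solve a subproblem at the master's solution, add the resulting constraint, and stop when the bounds meet) can be illustrated with a generic Benders' decomposition. This is a minimal sketch, not the paper's MAP formulation: it uses a hypothetical two-stage problem with binary decisions y, costs c, coverage matrix M, demands d, and shortfall penalties q (all made-up names and data), chosen so the subproblem's LP dual has a closed form and the tiny master problem can be solved by enumeration. The gap between the master's lower bound and the best feasible upper bound plays the role of the convergence certificate.

```python
from itertools import product

# Hypothetical toy problem (not from the paper):
#   minimize  c.y + Q(y)  over binary y,
#   Q(y) = min { q.x : x >= d - M y, x >= 0 } = sum_i q_i * max(0, d_i - (M y)_i).
# The LP dual of Q(y) is max { u.(d - M y) : 0 <= u <= q }, so every dual
# solution u yields a valid Benders cut  eta >= u.(d - M y).


def residual(d, M, y):
    """Unmet demand d - M y (entries may be negative)."""
    return [di - sum(mij * yj for mij, yj in zip(row, y)) for di, row in zip(d, M)]


def benders(c, q, d, M, tol=1e-9, max_iter=100):
    cuts = []  # dual vectors u, each defining the cut eta >= u.(d - M y)
    best_ub, best_y, lb = float("inf"), None, -float("inf")
    for it in range(1, max_iter + 1):
        # Master problem: Q(y) is replaced by the cuts found so far (eta >= 0 is
        # always valid since Q(y) >= 0). Solved by enumeration because y is tiny.
        lb, y_star = float("inf"), None
        for y in product((0, 1), repeat=len(c)):
            r = residual(d, M, y)
            eta = max([0.0] + [sum(ui * ri for ui, ri in zip(u, r)) for u in cuts])
            val = sum(ci * yi for ci, yi in zip(c, y)) + eta
            if val < lb:
                lb, y_star = val, y
        # Subproblem at y_star: the optimal dual is u_i = q_i where demand is unmet.
        r = residual(d, M, y_star)
        u = [qi if ri > 0 else 0.0 for qi, ri in zip(q, r)]
        ub = sum(ci * yi for ci, yi in zip(c, y_star)) + sum(ui * ri for ui, ri in zip(u, r))
        if ub < best_ub:
            best_ub, best_y = ub, y_star
        # Certificate: the master value is a lower bound on the optimum and
        # best_ub is attained by a feasible y, so a zero gap proves optimality.
        if best_ub - lb <= tol:
            return best_y, best_ub, lb, it
        cuts.append(u)  # add the new constraint to the master and repeat
    return best_y, best_ub, lb, max_iter


c = [3, 2, 4]               # decision costs (made-up data)
M = [[3, 0, 2], [0, 2, 2]]  # how much decision j covers demand i
d = [4, 3]                  # demands
q = [2, 2]                  # per-unit shortfall penalties
print(benders(c, q, d, M))  # ((1, 0, 1), 9.0, 9.0, 4)
```

Each iteration adds one constraint to the master, and the run stops only when the lower and upper bounds coincide, which is the same kind of finite stopping certificate the abstract contrasts with Gibbs sampling and variational Bayes.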

Paper Structure

This paper contains 40 sections, 4 theorems, 46 equations, 8 figures, 6 tables.

Key Result

Theorem 1

If the posterior is convex in the latent variables, maximum a-posteriori inference for Bayesian models in factor graph form satisfies the conditions of generalized Benders' decomposition, and thus global optimality can be achieved.
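As a hedged illustration of the kind of structure generalized Benders' decomposition relies on, consider a simple Bayesian Gaussian mixture: once the discrete cluster assignments z (the complicating variables) are fixed, the negative log posterior over the cluster means is a strictly convex quadratic, so the subproblem has a unique closed-form solution. The model is an assumption made for this sketch (one dimension, known variance sigma2, Gaussian prior N(m0, tau2) on each mean, equal mixture weights), and the function and parameter names are hypothetical.

```python
# Hypothetical 1-D Bayesian Gaussian mixture (illustrative assumptions, not
# the paper's exact model): x_i | z_i = k ~ N(mu_k, sigma2), mu_k ~ N(m0, tau2),
# equal mixture weights, sigma2 and tau2 known.


def map_means_given_assignments(x, z, K, sigma2=1.0, tau2=10.0, m0=0.0):
    """MAP cluster means for fixed hard assignments z (the continuous subproblem)."""
    means, curvatures = [], []
    for k in range(K):
        members = [xi for xi, zi in zip(x, z) if zi == k]
        # Second derivative of -log p(x, mu | z) with respect to mu_k; it is
        # positive for every z, so the subproblem is strictly convex.
        precision = 1.0 / tau2 + len(members) / sigma2
        # Setting the first derivative to zero gives the closed-form minimizer.
        means.append((m0 / tau2 + sum(members) / sigma2) / precision)
        curvatures.append(precision)
    return means, curvatures


def log_joint(x, z, means, sigma2=1.0, tau2=10.0, m0=0.0):
    """log p(x, mu | z) up to an additive constant."""
    log_prior = sum(-((m - m0) ** 2) / (2.0 * tau2) for m in means)
    log_lik = sum(-((xi - means[zi]) ** 2) / (2.0 * sigma2) for xi, zi in zip(x, z))
    return log_prior + log_lik


x = [0.1, -0.2, 5.0, 5.3]
z = [0, 0, 1, 1]
means, curvatures = map_means_given_assignments(x, z, K=2)
print(means)  # approximately [-0.0476, 4.9048]
```

In a Benders-style scheme, this convexity is what lets a master problem search over the discrete assignments while the continuous parameters are handled exactly by the subproblem.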

Figures (8)

  • Figure 1: Graphical model representation of the BGMM model.
  • Figure 2: Factor graph representation of the BGMM model.
  • Figure 3: Graphical model representation of the smoothed LDA model.
  • Figure 4: Factor graph representation of the smoothed LDA model.
  • Figure 5: Trace plots of sampled mixture weights from the Gibbs sampler for all three data sets.
  • ...and 3 more figures

Theorems & Definitions (5)

  • Theorem 1
  • Proof
  • Corollary 1
  • Lemma 1
  • Lemma 2