On the Convexity and Reliability of the Bethe Free Energy Approximation

Harald Leisenberger, Christian Knoll, Franz Pernkopf

Abstract

The Bethe free energy approximation provides an effective way to relax NP-hard problems of probabilistic inference. However, its accuracy depends on the model parameters and degrades in particular when a phase transition occurs in the model. In this work, we analyze when the Bethe approximation is reliable and how this can be verified. We argue and show by experiment that it is mostly accurate if it is convex on a submanifold of its domain, the 'Bethe box'. To verify its convexity, we derive two sufficient conditions that are based on the definiteness properties of the Bethe Hessian matrix: the first uses the concept of diagonal dominance, and the second decomposes the Bethe Hessian matrix into a sum of sparse matrices and characterizes the definiteness properties of the individual matrices in that sum. These theoretical results provide a simple way to estimate the critical phase transition temperature of a model. As a practical contribution, we propose $\texttt{BETHE-MIN}$, a projected quasi-Newton method to efficiently find a minimum of the Bethe free energy.
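The first sufficient condition mentioned in the abstract relies on diagonal dominance. As a rough illustration of that generic idea only (not the paper's actual condition, which concerns the Bethe Hessian on the Bethe box), the sketch below checks the standard linear-algebra criterion: a symmetric matrix with positive diagonal entries that is strictly diagonally dominant in every row is positive definite. The function name and the example matrices are hypothetical.

```python
def certifies_positive_definite(matrix, tol=1e-12):
    """Return True if strict diagonal dominance certifies positive definiteness.

    Assumption: `matrix` is a square list of lists of floats. The test is
    sufficient but not necessary: False means "not certified", not "indefinite".
    """
    n = len(matrix)
    for i in range(n):
        # Symmetry check (a Hessian is symmetric, so this should hold).
        for j in range(i + 1, n):
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                return False
        # Positive diagonal that strictly exceeds the off-diagonal row sum.
        off_diagonal_sum = sum(abs(matrix[i][j]) for j in range(n) if j != i)
        if matrix[i][i] <= off_diagonal_sum:
            return False
    return True


# Hypothetical example matrices, chosen only to illustrate the criterion.
dominant = [[3.0, -1.0, 0.0], [-1.0, 3.0, -1.0], [0.0, -1.0, 3.0]]
borderline = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]

print(certifies_positive_definite(dominant))
print(certifies_positive_definite(borderline))
```

The second matrix is positive definite, yet its middle row is only weakly dominant, so the check does not certify it. This is the usual price of a sufficient condition: it is cheap to evaluate but can be conservative.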

Paper Structure

This paper contains 18 sections, 23 theorems, 189 equations, 14 figures, 2 tables, 2 algorithms.

Key Result

Proposition 1

$\mathcal{F}_B$ is convex on $\mathbb{L}$, if and only if the graph $\mathbf{G}$ is either a tree or contains precisely one cycle.
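The graph condition in Proposition 1 is purely structural, so it can be checked without touching the model parameters. The following is a minimal sketch, assuming a simple undirected graph (no self-loops, no repeated edges) given as a node count and an edge list. It counts cycle-closing edges per connected component with a union-find structure. Treating a disconnected graph component by component is an assumption made here, and the function name is hypothetical.

```python
def has_at_most_one_cycle(num_nodes, edges):
    """Return True if every connected component is a tree or has exactly one cycle.

    Assumption: nodes are labeled 0..num_nodes-1 and `edges` is a list of
    (u, v) pairs describing a simple undirected graph.
    """
    parent = list(range(num_nodes))
    # For each union-find root: number of edges that closed a cycle
    # inside its component (the component's cycle rank).
    cycle_edges = [0] * num_nodes

    def find(x):
        # Find the root of x with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            # Both endpoints are already connected: this edge closes a cycle.
            cycle_edges[root_u] += 1
        else:
            # Merge the two components and carry the cycle count along.
            parent[root_u] = root_v
            cycle_edges[root_v] += cycle_edges[root_u]

    return all(cycle_edges[find(i)] <= 1 for i in range(num_nodes))


# Hypothetical example graphs.
path = [(0, 1), (1, 2), (2, 3)]                                  # tree
triangle = [(0, 1), (1, 2), (2, 0)]                              # one cycle
complete4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]     # several cycles

print(has_at_most_one_cycle(4, path))
print(has_at_most_one_cycle(3, triangle))
print(has_at_most_one_cycle(4, complete4))
```

For a connected graph this reduces to comparing the number of edges with the number of nodes: a tree has one edge fewer than nodes, and a graph with precisely one cycle has equally many.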

Figures (14)

  • Figure 1: Summary of the current state of knowledge about the relationships between the Bethe free energy and loopy belief propagation (LBP). The white rectangles contain various properties of the Bethe free energy and LBP. The main results of this work are contained in the blue ellipse and serve as sufficient conditions for the convexity of the Bethe free energy. Sufficient conditions from previous research are included in the green ellipses. Other important results are shown as red labels next to the logical arrows between properties.
  • Figure 2: Local polytope $\mathbb{L}$ (left) on a single-edge graph (right).
  • Figure 3: Bethe box $\mathbb{B} \subset \mathbb{L}$ for different values of coupling $J_{ij}$ and inverse temperature $\beta$ on a single-edge graph. For a ferromagnetic edge ($J_{ij}> 0$; first row), $\mathbb{B}$ approaches the upper boundary of $\mathbb{L}$ as $\beta$ increases; for an antiferromagnetic edge ($J_{ij}< 0$; second row), $\mathbb{B}$ approaches the lower boundary of $\mathbb{L}$ as $\beta$ increases.
  • Figure 4: Dependencies and components of the function $R_i(\bm{q})$ associated with a node $i$. Ferromagnetic edges are drawn in green and antiferromagnetic edges are drawn in red.
  • Figure 5: $r_{ij}^{+}(q_i,q_j)$ as a function of $q_j \in (0,1)$ and for different fixed values of $q_i$. The green line $q_j = 1-q_i$ represents the space of all stationary points of $r_{ij}^{+}(q_i,q_j)$ (exactly one for each $q_i \in (0,1)$, Theorem \ref{thm:unique_stationary_rijp}), which are always maxima (drawn as red points, Theorem \ref{thm:maximum_rijp}). The blue points at the boundary represent the infima of $r_{ij}^{+}(q_i,q_j)$ with respect to $q_j$, which are never taken by $r_{ij}^{+}(q_i,q_j)$ (Lemma \ref{lemma:rijp_critical_values} and Corollary \ref{cor:solution_inf_rijp_qj}). Note that $q_i=0.5$ represents the only scenario where the infimum exists at both boundary points $0$ and $1$.
  • ...and 9 more figures

Theorems & Definitions (25)

  • Proposition 1: Corollary 2 in watanabe2009graphzeta
  • Theorem 2: Adopted from welling2001belief
  • Corollary 3
  • Lemma 4: (a): welling2001belief; (b): Lemma 9 in weller2013bethebounds
  • Corollary 5
  • Lemma 6: Lemma 2 in weller2013bethebounds
  • Proposition 7
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • ...and 15 more