Variational Inference on the Boolean Hypercube with the Quantum Entropy

Eliot Beyler, Francis Bach

TL;DR

This work tackles the problem of bounding the log-partition function of pairwise MRFs on the Boolean hypercube by leveraging quantum entropy relaxations of the KL divergence. It derives tractable upper bounds from moment-matrix relaxations, computes them with a primal-dual Chambolle–Pock algorithm, and introduces hierarchies of relaxations through SoS-like feature augmentation, with a greedy scheme to select the features. The approach is competitive with state-of-the-art log-determinant and tree-reweighted (TRW) bounds across several graph structures and parameter regimes, and it suggests routes to tighter bounds and to extensions to other domains. Together, quantum entropy relaxations, structured hierarchies, and scalable optimization form a flexible framework for variational inference in high-dimensional discrete graphical models, with practical implications for inference and optimization tasks.
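The simplest instance of the quantum-entropy relaxation summarized above can be checked numerically. This is a minimal sketch under our own assumptions, not the authors' implementation: we take $f(x) = b^\top x + \tfrac12 x^\top A x$ and the hypothetical feature vector $\varphi(x) = (1, x)/\sqrt{d+1}$, which has unit norm on $\{-1,1\}^d$. Then $\Sigma_p = \mathbb{E}_p[\varphi\varphi^\top]$ is a density matrix, the uniform distribution maps to $I/(d+1)$, and the data-processing inequality for the quantum relative entropy gives $H(p) \le d\log 2 - \log(d+1) + S(\Sigma_p)$, with $S$ the von Neumann entropy. Substituting into $\log Z = \max_p \mathbb{E}_p[f] + H(p)$ and keeping only $\Sigma \succeq 0$, $\operatorname{tr}\Sigma = 1$ gives the closed form $\log Z \le d\log 2 - \log(d+1) + \log\operatorname{tr}\exp(F)$, where $\varphi(x)^\top F \varphi(x) = f(x)$. Dropping the remaining moment constraints makes this looser than the relaxations optimized in the paper; the function names are hypothetical.

```python
import itertools
import numpy as np


def log_partition_brute_force(A, b):
    """Exact log Z = log sum_x exp(b^T x + x^T A x / 2) over x in {-1, 1}^d, in O(2^d)."""
    d = len(b)
    xs = np.array(list(itertools.product([-1.0, 1.0], repeat=d)))  # all 2^d vertices
    energies = xs @ b + 0.5 * np.einsum("ni,ij,nj->n", xs, A, xs)
    top = energies.max()
    return top + np.log(np.exp(energies - top).sum())  # stable log-sum-exp


def quantum_entropy_bound(A, b):
    """Unconstrained quantum-entropy upper bound on log Z.

    Assumption (ours, not the paper's code): phi(x) = (1, x) / sqrt(d + 1), and
    F = (d + 1) * [[0, b^T / 2], [b / 2, A / 2]] so that phi(x)^T F phi(x) = f(x).
    """
    d = len(b)
    F = np.zeros((d + 1, d + 1))
    F[0, 1:] = b / 2
    F[1:, 0] = b / 2
    F[1:, 1:] = A / 2
    F *= d + 1
    eigvals = np.linalg.eigvalsh(F)
    top = eigvals.max()
    log_trace_exp = top + np.log(np.exp(eigvals - top).sum())  # log tr exp(F)
    return d * np.log(2.0) - np.log(d + 1) + log_trace_exp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 6
    A = rng.normal(size=(d, d))
    A = (A + A.T) / 2
    np.fill_diagonal(A, 0.0)  # pairwise interactions only
    b = rng.normal(size=d)
    print("exact log Z:", log_partition_brute_force(A, b))
    print("upper bound:", quantum_entropy_bound(A, b))
```

For $d = 1$ and $A = 0$ the bound is exact: $F$ has eigenvalues $\pm b$, so both functions return $\log(2\cosh b)$. With $A = 0$ and $b = 0$ both return $d\log 2$, since the inequality is tight at the uniform distribution.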

Abstract

In this paper, we derive variational inference upper bounds on the log-partition function of pairwise Markov random fields on the Boolean hypercube, based on quantum relaxations of the Kullback-Leibler divergence. We then propose an efficient algorithm to compute these bounds based on primal-dual optimization. We propose to improve these bounds through the use of "hierarchies," similar to sum-of-squares (SoS) hierarchies, and we present a greedy algorithm to select among these relaxations. We carry out extensive numerical experiments and compare with state-of-the-art methods for this inference problem.
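The hierarchy and the greedy selection described in the abstract can be sketched in the same spirit, again under our own assumptions rather than as the authors' method. Features are monomials $x_S = \prod_{i\in S} x_i$ (always including $1$ and every $x_i$), $\varphi(x) = (x_S)_S/\sqrt{m}$, and the relaxation keeps $\Sigma \succeq 0$ together with the equality pattern of $\varphi\varphi^\top$: entries $(S_a, S_c)$ with the same symmetric difference $S_a \triangle S_c$ must coincide, and the diagonal equals $1/m$. Its dual reads $\min_N \, d\log 2 - \log m + \log\operatorname{tr}\exp(F+N)$ over symmetric $N$ whose entries sum to zero within each group, and every feasible $N$ already yields a valid upper bound. Appending products $x_ix_j$ enlarges $\varphi$ and adds consistency constraints, in the spirit of SoS hierarchies. For simplicity we swap in plain projected gradient descent (step $0.5$, below the gradient Lipschitz constant $1$ of $\log\operatorname{tr}\exp$) for the Chambolle–Pock primal-dual scheme; the greedy rule (add the pair monomial that lowers the bound most) and all function names are hypothetical.

```python
import itertools
import numpy as np


def log_trace_exp_and_grad(G):
    """Stable log tr exp(G) and its gradient exp(G) / tr exp(G), for symmetric G."""
    w, V = np.linalg.eigh(G)
    e = np.exp(w - w.max())
    rho = (V * (e / e.sum())) @ V.T  # density matrix proportional to exp(G)
    return w.max() + np.log(e.sum()), rho


def relaxation_bound(A, b, features, n_iter=300, step=0.5):
    """Upper bound on log Z for f(x) = b^T x + x^T A x / 2 (sketch, assumptions in text).

    `features` is a list of distinct frozensets S (monomial x_S = prod_{i in S} x_i)
    that must contain frozenset() and every singleton {i}; phi(x) = (x_S)_S / sqrt(m).
    """
    d, m = len(b), len(features)
    idx = {S: k for k, S in enumerate(features)}
    one = idx[frozenset()]
    # F satisfies phi(x)^T F phi(x) = f(x); it only touches the (1, x) block.
    F = np.zeros((m, m))
    for i in range(d):
        xi = idx[frozenset([i])]
        F[one, xi] += m * b[i] / 2
        F[xi, one] += m * b[i] / 2
        for j in range(d):
            F[xi, idx[frozenset([j])]] += m * A[i, j] / 2
    # Entries (a, c) of phi phi^T coincide whenever S_a ^ S_c is the same monomial.
    groups = {}
    for a, c in itertools.product(range(m), repeat=2):
        groups.setdefault(features[a] ^ features[c], []).append((a, c))
    masks = [tuple(np.array(pairs).T) for pairs in groups.values()]

    def project(M):
        # Orthogonal projection onto {N : entries of N sum to zero within each group}.
        P = M.copy()
        for rows, cols in masks:
            P[rows, cols] -= M[rows, cols].mean()
        return P

    const = d * np.log(2.0) - np.log(m)
    N = np.zeros((m, m))
    best = np.inf
    for _ in range(n_iter):
        value, rho = log_trace_exp_and_grad(F + N)
        best = min(best, const + value)  # every feasible N gives a valid bound
        N -= step * project(rho)  # projected gradient step (not Chambolle-Pock)
    return best


def greedy_features(A, b, n_extra, n_iter=300):
    """Hypothetical greedy rule: add the pair monomial x_i x_j that lowers the bound most."""
    d = len(b)
    features = [frozenset()] + [frozenset([i]) for i in range(d)]
    candidates = [frozenset(p) for p in itertools.combinations(range(d), 2)]
    history = [relaxation_bound(A, b, features, n_iter)]
    for _ in range(n_extra):
        if not candidates:
            break
        scores = [relaxation_bound(A, b, features + [c], n_iter) for c in candidates]
        k = int(np.argmin(scores))
        features.append(candidates.pop(k))
        history.append(scores[k])
    return features, history
```

The first iterate ($N = 0$) is the unconstrained bound $d\log 2 - \log m + \log\operatorname{tr}\exp(F)$, so the returned value is never larger than it. Each candidate evaluation costs `n_iter` eigendecompositions of an $m \times m$ matrix, so this brute-force greedy loop only suits small $d$.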

Paper Structure

This paper contains 48 sections, 62 equations, 15 figures, 2 algorithms.

Figures (15)

  • Figure 1: Comparison of the different quantum relaxations of the KL divergence. $d=5$, Gaussian parameters. x-axis corresponds to the number of features in the feature vector. Mean over 10 draws of parameters, $\pm1$ standard deviation.
  • Figure 2: Comparison with competing algorithms for log-det parameters and varying coupling strength. $d = 5$, complete graph. Mean over 10 draws of parameters, $\pm1$ standard deviation.
  • Figure 3: Comparison with competing algorithms for log-det parameters and varying coupling strength. $d = 16$, complete graph. Mean over 10 draws of parameters, $\pm1$ standard deviation.
  • Figure 4: Comparison of the different quantum relaxations of the KL divergence. $d=5$, Gaussian parameters. x-axis corresponds to the number of features in the feature vector. Mean over 10 draws of parameters, $\pm1$ standard deviation.
  • Figure 5: Comparison with competing algorithms for TRW parameters and varying coupling strength. $d=5$, complete graph. Mean over 10 draws of parameters, $\pm1$ standard deviation.
  • ...and 10 more figures