Mean field initialization of the Annealed Importance Sampling algorithm for an efficient evaluation of the Partition Function of Restricted Boltzmann Machines

A. Prat Pou, E. Romero, J. Martí, F. Mazzanti

TL;DR

This work addresses the challenge of estimating the partition function $Z$ for RBMs using Annealed Importance Sampling (AIS) and shows that the convergence and accuracy of AIS can be significantly improved by selecting an appropriate mean-field starting distribution $p_0(x)$. It derives the KL-optimal starting biases $B^*$ and presents two practical, low-cost approximations, Pinv and Signs_h, that approximate $\langle x \rangle_n$ without training data. Through extensive experiments on Gaussian-weighted RBMs, MNIST-based RBMs, and Ising/Spin Glass models, the authors demonstrate that these mean-field initializations yield more reliable $\log(Z)$ estimates than the conventional uniform start, often with substantially reduced computational cost. The results support using AIS with mean-field initializations as a robust tool for partition-function evaluation in large-scale RBMs and related binary-state systems.

Abstract

Probabilistic models in physics often require the evaluation of normalized Boltzmann factors, which in turn requires the computation of the partition function Z. Obtaining the exact value of Z, though, becomes prohibitively expensive as the system size increases. This problem is also present in probabilistic learning models such as the Restricted Boltzmann Machine (RBM), where the situation is even worse because the exact learning rules imply the computation of Z at each iteration. A possible way to tackle this problem is to use the Annealed Importance Sampling (AIS) algorithm, which provides a tool to stochastically estimate the partition function of the system. So far, the standard application of the AIS algorithm starts from the uniform probability distribution and uses a large number of Monte Carlo steps to obtain reliable estimates of Z following an annealing process. In this work we show that both the quality of the estimation and the cost of the computation can be significantly improved by using a properly selected mean-field starting probability distribution. We perform a systematic analysis of AIS in both small- and large-sized problems, and compare the results to exact values in problems where these are known. As a result of this analysis, we propose two successful strategies that work well in all the problems analyzed. We conclude that these are good starting points to estimate the partition function with AIS at a relatively low computational cost.
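To make the role of the starting distribution concrete, the following is a minimal sketch of AIS for a small binary RBM, not the authors' implementation. It assumes 0/1 units, the energy $E(x,h) = -b\cdot x - c\cdot h - x^{T} W h$, a linear schedule of inverse temperatures, and one Gibbs sweep per intermediate distribution. The function names (`log_p_star`, `exact_log_z`, `ais_log_z`) are illustrative. The starting biases `B` are a free parameter: `B = 0` gives the uniform start, and any other vector defines a factorized (mean-field) start. The paper's Pinv and Signs_h strategies are not reproduced here.

```python
import itertools
import numpy as np


def softplus(z):
    # log(1 + exp(z)), computed stably.
    return np.logaddexp(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_p_star(x, beta, W, b, c, B):
    """Unnormalized log-probability of visible states x at inverse temperature beta.

    Interpolates between the factorized start exp(B.x) (beta = 0)
    and the RBM marginal over the visible units (beta = 1).
    """
    return ((1.0 - beta) * (x @ B) + beta * (x @ b)
            + softplus(beta * (c + x @ W)).sum(axis=1))


def exact_log_z(W, b, c):
    """Brute-force log Z by enumerating all visible states (small models only)."""
    states = np.array(list(itertools.product([0.0, 1.0], repeat=W.shape[0])))
    lp = log_p_star(states, 1.0, W, b, c, np.zeros_like(b))
    m = lp.max()
    return m + np.log(np.exp(lp - m).sum())


def ais_log_z(W, b, c, B, n_betas=1000, n_chains=500, seed=0):
    """AIS estimate of log Z starting from p_0(x) proportional to exp(B.x)."""
    rng = np.random.default_rng(seed)
    n_v, n_h = W.shape
    betas = np.linspace(0.0, 1.0, n_betas + 1)
    # Partition function of the beta = 0 distribution (hidden units contribute 2^n_h).
    log_z0 = n_h * np.log(2.0) + softplus(B).sum()
    # Independent samples from the factorized starting distribution.
    x = (rng.random((n_chains, n_v)) < sigmoid(B)).astype(float)
    log_w = np.zeros(n_chains)
    for beta_prev, beta in zip(betas[:-1], betas[1:]):
        # Accumulate the importance weight, then move x with one Gibbs sweep at beta.
        log_w += log_p_star(x, beta, W, b, c, B) - log_p_star(x, beta_prev, W, b, c, B)
        h = (rng.random((n_chains, n_h)) < sigmoid(beta * (c + x @ W))).astype(float)
        p_x = sigmoid((1.0 - beta) * B + beta * (b + h @ W.T))
        x = (rng.random((n_chains, n_v)) < p_x).astype(float)
    m = log_w.max()
    return log_z0 + m + np.log(np.exp(log_w - m).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = 0.5 * rng.standard_normal((6, 4))
    b = 0.1 * rng.standard_normal(6)
    c = 0.1 * rng.standard_normal(4)
    print("exact log Z      :", exact_log_z(W, b, c))
    print("AIS, uniform B=0 :", ais_log_z(W, b, c, np.zeros(6)))
    # B = b is only a placeholder for a nonuniform start, not a strategy from the paper.
    print("AIS, B = b       :", ais_log_z(W, b, c, b))
```

On a model this small, both starts should land close to the enumerated value. The differences the paper reports concern larger or harder models, where the number of intermediate distributions needed for a reliable estimate depends strongly on the choice of `B`.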

Paper Structure (7 sections, 27 equations, 7 figures)

This paper contains 7 sections, 27 equations, 7 figures.

Figures (7)

  • Figure 1: AIS estimation of $\log(Z)$ starting from ${\bf B}=0$ for the MNIST-20h (left) and ten different GWGM datasets (right) as a function of the number $N_\beta$ of intermediate distributions. The left panel shows both the exact value (in blue) and the AIS estimations, while on the right the ratio of these two quantities is plotted.
  • Figure 2: Percentage of AIS samples producing an estimation of $\log(Z)$ with a relative error of less than $5\%$ with respect to the exact result, obtained starting from ${\bf B}=0$. The results have been averaged over all models of each tested dataset.
  • Figure 3: Percentage of AIS samples producing a relative error lower than or equal to $5\%$ with respect to the exact $\log(Z)$ value, as a function of the number of hidden units and inverse temperature. The left and right panels show the results starting from ${\bf B}=0$ and ${\bf B}={\bf B}^*$, respectively.
  • Figure 4: Relative error of all models in the transposed and non-transposed GWGM datasets, computed as in Eq. (\ref{relative_error}). For the sake of clarity, the models have been sorted according to the relative error of the non-transposed results.
  • Figure 5: Percentage of AIS samples with a relative error lower than $0.05\%$ with respect to the exact $\log(Z)$ for the different datasets analyzed. The left, middle and right bars with different gray levels correspond to the predictions starting from ${\bf B}=0$, ${\bf B}={\bf B}_{\rm Pinv}$ and ${\bf B}={\bf B}_{\rm Signs\_h}$, respectively.
  • ...and 2 more figures