Unlocking the Power of Boltzmann Machines by Parallelizable Sampler and Efficient Temperature Estimation

Kentaro Kubo, Hayato Goto

TL;DR

The paper tackles the training bottleneck of energy-based Boltzmann machines (BMs) by introducing sampler-adaptive learning (SAL), which combines Langevin simulated bifurcation (LSB) for fast parallel sampling with conditional expectation matching (CEM) for estimating the effective inverse temperature β_eff during learning. This enables training of semi-restricted Boltzmann machines (SRBMs) and other BMs that are more expressive than restricted Boltzmann machines (RBMs). Training performs gradient-based updates on the KL divergence, with the negative phase computed from LSB samples and the positive phase computed using β_eff estimated by CEM. Empirical results on a spin-glass model, Bars-and-Stripes images, and OptDigits demonstrate that SAL improves learning efficiency and generative, reconstruction, and classification performance, while also enabling conditional generation. The framework broadens the practical applicability of energy-based models and points to future extensions with deeper architectures and alternative parallel samplers, along with open theoretical questions about LSB’s probabilistic guarantees and hyperparameter selection.
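
The phase decomposition mentioned above can be written out as a short worked equation. This is the standard gradient of the KL divergence for a Boltzmann machine at inverse temperature $\beta$, given here as a sketch; the energy function $E(\mathbf{v},\mathbf{h};\theta)$, the generic parameter $\theta$, and the normalization convention are assumptions for illustration, and the paper's own expressions may differ in notation.

```latex
% Assumed convention: Q_\beta(\mathbf{v}) = \sum_{\mathbf{h}} e^{-\beta E(\mathbf{v},\mathbf{h};\theta)} / Z_\beta
\frac{\partial}{\partial\theta} D_{\rm KL}(P_D \,\|\, Q_\beta)
  = \beta \Big(
      \underbrace{\big\langle \partial_\theta E \big\rangle_{P_D(\mathbf{v})\,Q_\beta(\mathbf{h}|\mathbf{v})}}_{\text{positive phase}}
    - \underbrace{\big\langle \partial_\theta E \big\rangle_{Q_\beta(\mathbf{v},\mathbf{h})}}_{\text{negative phase}}
    \Big)
```

For example, if the energy contains a term $-W_{ij} v_i h_j$, then $\partial E / \partial W_{ij} = -v_i h_j$, and a gradient-descent step with learning rate $\eta$ becomes $W_{ij} \leftarrow W_{ij} + \eta\,\beta\,(\langle v_i h_j\rangle_{\rm pos} - \langle v_i h_j\rangle_{\rm neg})$. Both expectations depend on $\beta$, which is why a sampler with an unknown output temperature needs an estimate of it.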

Abstract

Boltzmann machines (BMs) are powerful energy-based generative models, but their heavy training cost has largely confined practical use to Restricted BMs (RBMs) trained with an efficient learning method called contrastive divergence. More accurate learning typically requires Markov chain Monte Carlo (MCMC) Boltzmann sampling, but it is time-consuming due to the difficulty of parallelization for more expressive models. To address this limitation, we first propose a new Boltzmann sampler inspired by a quantum-inspired combinatorial optimization called simulated bifurcation (SB). This SB-inspired approach, which we name Langevin SB (LSB), enables parallelized sampling while maintaining accuracy comparable to MCMC. Furthermore, this is applicable not only to RBMs but also to BMs with general couplings. However, LSB cannot control the inverse temperature of the output Boltzmann distribution, which hinders learning and degrades performance. To overcome this limitation, we also developed an efficient method for estimating the inverse temperature during the learning process, which we call conditional expectation matching (CEM). By combining LSB and CEM, we establish an efficient learning framework for BMs with greater expressive power than RBMs. We refer to this framework as sampler-adaptive learning (SAL). SAL opens new avenues for energy-based generative modeling beyond RBMs.
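
To make the temperature issue concrete, the following minimal sketch estimates an effective inverse temperature for a sampler whose output temperature is unknown. It uses a grid search that minimizes the KL divergence against exact Boltzmann distributions on a tiny fully visible model. This is not CEM and not the authors' code: the energy convention, the ±1 spins, the model size, the grid, and all function names are assumptions chosen for illustration, and the "sampler output" is idealised as an exact Boltzmann distribution at a hidden temperature.

```python
import itertools
import numpy as np


def all_states(n):
    """Enumerate all 2**n spin configurations with entries in {-1, +1}."""
    return np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)


def energies(states, J, b):
    """Ising-type energy E(s) = -1/2 s^T J s - b^T s (assumed convention)."""
    return -0.5 * np.einsum("ki,ij,kj->k", states, J, states) - states @ b


def boltzmann(E, beta):
    """Exact Boltzmann distribution exp(-beta * E) / Z over enumerated states."""
    logits = -beta * E
    logits = logits - logits.max()  # stabilise the exponentials
    p = np.exp(logits)
    return p / p.sum()


def kl(p, q):
    """D_KL(p || q) for discrete distributions; terms with p = 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def estimate_beta_eff(p_sampler, E, grid):
    """Grid search for the beta whose Boltzmann distribution is closest in KL."""
    scores = [kl(p_sampler, boltzmann(E, beta)) for beta in grid]
    return float(grid[int(np.argmin(scores))])


# Random small model (hypothetical parameters, fixed seed for repeatability).
rng = np.random.default_rng(0)
n = 6
J = rng.normal(size=(n, n))
J = (J + J.T) / 2.0          # symmetric couplings
np.fill_diagonal(J, 0.0)     # no self-interaction
b = rng.normal(size=n)

E = energies(all_states(n), J, b)
true_beta = 0.7                      # stand-in for a sampler's uncontrolled temperature
p_sampler = boltzmann(E, true_beta)  # idealised sampler output
grid = np.linspace(0.1, 2.0, 191)    # candidate betas in steps of 0.01
beta_eff = estimate_beta_eff(p_sampler, E, grid)
```

Exact enumeration costs $2^n$ evaluations per candidate, so this approach only works for very small models; the abstract's point is that an efficient estimator is needed during learning on realistic sizes.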

Paper Structure

This paper contains 23 sections, 21 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: Overview of the sampler-adaptive learning (SAL) framework and model structures. (a) Schematic of the SAL framework, which combines Langevin simulated bifurcation (LSB, yellow arrows) for fast parallel sampling and conditional expectation matching (CEM, red arrow) for efficient estimation of the effective inverse temperature $\beta_{\rm eff}$. The parameter update is performed by gradient descent on the Kullback–Leibler (KL) divergence $D_{\rm KL}(P_{D}||Q_{\beta_{\rm eff}})$, where the gradient is given by the difference between the positive phase and the negative phase (see Sec. \ref{sec:DetailsOfSAL}). LSB generates standard samples from the Boltzmann distribution $B_{\beta_{\rm eff}}(\mathbf{s}|\mathbf{u})$ for the negative phase and conditional samples from the conditional Boltzmann distribution $A_{\beta_{\rm eff}}(\mathbf{h}|\mathbf{v},\mathbf{u})$ (see Sec. \ref{sec:ConditionalSampling}) for the positive phase. Here, sampled variables are highlighted in yellow. The negative phase is computed using only standard samples from $B_{\beta_{\rm eff}}(\mathbf{s}|\mathbf{u})$, while the positive phase uses the data distribution $P_{D}$ together with $\beta_{\rm eff}$ estimated via CEM from conditional samples drawn from $A_{\beta_{\rm eff}}(\mathbf{h}|\mathbf{v},\mathbf{u})$. (b) Comparison of learning methods: SAL (LSB + CEM) versus conventional MCMC-based learning. (c)–(e) Model structures: each model consists of visible nodes $v_i$ (blue circles, arranged in the lower row) and, if present, hidden nodes $h_j$ (orange circles, arranged in the upper row). Edges represent pairwise interactions: blue lines between visible nodes indicate visible–visible interactions $V_{ij}$, and gray lines between visible and hidden nodes indicate visible–hidden interactions $W_{ij}$. The bias vectors for the visible and hidden nodes are denoted by $\mathbf{b}$ and $\mathbf{c}$, respectively. (c) Fully visible Boltzmann machine (FBM): only visible nodes are present, and all visible nodes are mutually connected. (d) Restricted Boltzmann machine (RBM): both visible and hidden nodes are present; every visible node is connected to every hidden node, but there are no connections among visible nodes or among hidden nodes. (e) Semi-restricted Boltzmann machine (SRBM): both visible and hidden nodes are present; every visible node is connected to every hidden node, and all visible nodes are also mutually connected, but there are no connections among hidden nodes. (f) Examples of applications of BMs trained with SAL: image generation, classification, and reconstruction.
  • Figure 2: Sampling accuracy and effective inverse temperature for LSB and Gibbs sampling: (a) Sampling accuracy $D_{\mathrm{KL}}(P_S||B_{\beta_{\rm eff}})$ for LSB and Gibbs sampling across 10 random instances of SRBM with $N_v=10$ and $N_h=5$. For each instance, $\Delta$ was fixed at 1 and $\sigma$ was optimized from the candidate set $\sigma^{-2}\in\{0.5, 0.6, \dots, 2.0\}$ to maximize LSB sampling accuracy. Horizontal lines and shaded regions indicate the mean KL divergences and their standard errors: $0.09 \pm 0.02$ for Gibbs and $0.07 \pm 0.01$ for LSB. (b) The effective inverse temperature $\beta_{\rm eff}$ for the output distribution produced by each sampler. For Gibbs sampling, $\beta_{\rm eff}$ was estimated by KL-divergence minimization. For LSB, $\beta_{\rm eff}$ was estimated by both KL-divergence minimization and CEM.
  • Figure 3: Learning performance on the 3-spin model: The vertical axis shows the cost function $D_{\rm KL}(P_D||Q_\beta)$, and the horizontal axis indicates training epochs. Four models are compared: FBMs, RBMs, and SRBMs trained using SAL; RBMs with $\beta = 1$ trained using CD-100. The number of hidden variables $N_h$ for both RBMs and SRBMs was set to 5. Each point represents the mean over 10 independent instances, and error bars denote the standard error. Lower $D_{\rm KL}(P_D||Q_\beta)$ values indicate closer agreement between $Q_\beta$ and $P_D$.
  • Figure 4: Generative performance on the BAS dataset: (a) 36 randomly selected training samples. (b)–(g) 36 samples generated by the trained SRBM with LSB after 1, 10, 500, 1000, 3000, and 6000 training epochs, respectively.
  • Figure 5: Reconstruction performance on the BAS dataset: (a) 36 randomly selected test samples. (b) The same samples with a central $5 \times 4$ block masked (47.6% of pixels missing, shown in orange). (c)–(e) Reconstructions of (b) using the trained SRBM at 1, 500, and 1000 epochs, respectively, obtained by conditional LSB sampling. (f) The ratio of incorrect pixels per reconstruction as a function of training epoch. Each data point represents the mean over 10 independent runs with different random initializations of $\mathbf{u}$; error bars indicate the standard deviation.
  • ...and 9 more figures
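
The model structures described in the Figure 1(c)–(e) caption can be sketched as a single energy function. The snippet below is an illustrative sketch only: the sign convention, the use of ±1 spins, the function names, and the example parameter values are assumptions not stated in this section. It uses the caption's symbols ($V$ for visible–visible couplings, $W$ for visible–hidden couplings, $\mathbf{b}$ and $\mathbf{c}$ for biases) and shows that an RBM is the special case $V = 0$.

```python
import numpy as np


def srbm_energy(v, h, V, W, b, c):
    """E(v, h) = -sum_{i<j} V_ij v_i v_j - sum_{ij} W_ij v_i h_j - b.v - c.h

    Sign convention and +/-1 spins are assumptions for this sketch.
    V is symmetric with zero diagonal, so 0.5 * v^T V v counts each pair once.
    """
    return float(-0.5 * v @ V @ v - v @ W @ h - b @ v - c @ h)


def hidden_conditional_mean(v, W, c, beta):
    """E[h_j | v] = tanh(beta * (c_j + sum_i W_ij v_i)) for +/-1 hidden spins.

    This closed form holds because hidden nodes are not connected to each other,
    so they are conditionally independent given the visible nodes.
    """
    return np.tanh(beta * (c + v @ W))


# Hypothetical example with 3 visible and 2 hidden nodes.
v = np.array([1.0, -1.0, 1.0])
h = np.array([1.0, -1.0])
V = np.array([[0.0, 0.5, -1.0],
              [0.5, 0.0, 0.25],
              [-1.0, 0.25, 0.0]])   # visible-visible couplings (symmetric)
W = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [-1.0, 1.0]])         # visible-hidden couplings
b = np.array([0.1, 0.2, 0.3])       # visible biases
c = np.array([-0.5, 0.5])           # hidden biases

e_srbm = srbm_energy(v, h, V, W, b, c)
e_rbm = srbm_energy(v, h, np.zeros_like(V), W, b, c)  # RBM: no visible-visible edges
h_mean = hidden_conditional_mean(v, W, c, beta=1.0)
```

Under these assumptions the conditional mean of the hidden nodes depends explicitly on the inverse temperature, which is one way to see why a data-driven positive phase needs a value for $\beta_{\rm eff}$.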