Diabatic quantum annealing for training energy-based generative models

Gilhan Kim, Ju-Yeon Gyhm, Daniel K. Park

TL;DR

RBMs require unbiased Boltzmann samples, but classical sampling is slow and produces correlated data. The authors apply diabatic quantum annealing (DQA), using its analytic relation between the annealing schedule and the effective inverse temperature $\beta_{\mathrm{integral}}$ to generate calibrated Boltzmann samples for RBM training, which enables principled sampling without post hoc fitting. On a D-Wave device, DQA-based RBM training converges faster and reaches lower validation error than persistent contrastive divergence (PCD). The experiments also expose a hardware-induced temperature misalignment, which is corrected by an analytic rescaling factor $\alpha$. This calibration improves sampling fidelity and supports the practicality and scalability of quantum-assisted Boltzmann sampling for energy-based models. Extensions to fully connected Boltzmann machines and gate-based implementations are discussed as future work.

Abstract

Energy-based generative models, such as restricted Boltzmann machines (RBMs), require unbiased Boltzmann samples for effective training. Classical Markov chain Monte Carlo methods, however, converge slowly and yield correlated samples, making large-scale training difficult. We address this bottleneck by applying the analytic relation between annealing schedules and effective inverse temperature in diabatic quantum annealing. By implementing this prescription on a quantum annealer, we obtain temperature-controlled Boltzmann samples that enable RBM training with faster convergence and lower validation error than classical sampling. We further identify a systematic temperature misalignment intrinsic to analog quantum computers and propose an analytical rescaling method that mitigates this hardware noise, thereby enhancing the practicality of quantum annealers as Boltzmann samplers. In our method, the model's connectivity is set directly by the qubit connectivity, transforming the computational complexity inherent in classical sampling into a requirement on quantum hardware. This shift allows the approach to extend naturally from RBMs to fully connected Boltzmann machines, opening opportunities inaccessible to classical training methods.
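To make concrete why training depends on the quality of the Boltzmann samples, here is a minimal sketch of the standard RBM weight-gradient estimate: the data-driven correlation minus the model-driven correlation, evaluated only on the edges the model actually has. The function name `weight_gradient`, the `(v, h)` pair representation, and the `edges` list are illustrative assumptions, not the authors' code. The model samples could come from any sampler, such as PCD chains or annealer readouts.

```python
def weight_gradient(data_pairs, model_pairs, edges):
    """Estimate d(log-likelihood)/dW_ij for each edge (i, j).

    data_pairs:  list of (v, h) with v a data vector and h hidden states
                 drawn given v (positive phase).
    model_pairs: list of (v, h) drawn from the model's Boltzmann
                 distribution by some sampler (negative phase).
    edges:       list of (visible_index, hidden_index) pairs; a sparse
                 list mirrors a model whose connectivity is restricted.
    """
    grad = {}
    for i, j in edges:
        # Average correlation <v_i h_j> under the data.
        positive = sum(v[i] * h[j] for v, h in data_pairs) / len(data_pairs)
        # Average correlation <v_i h_j> under the model samples.
        negative = sum(v[i] * h[j] for v, h in model_pairs) / len(model_pairs)
        grad[(i, j)] = positive - negative
    return grad


# Toy example with two visible units and one hidden unit.
data_pairs = [([1, 0], [1]), ([1, 1], [0])]
model_pairs = [([0, 0], [1]), ([1, 1], [1])]
gradient = weight_gradient(data_pairs, model_pairs, edges=[(0, 0), (1, 0)])
```

If the model samples are biased or strongly correlated, the negative term is estimated poorly, and the gradient inherits that error.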

Paper Structure

This paper contains 10 sections, 17 equations, 9 figures, 1 algorithm.

Figures (9)

  • Figure 1: Schematic comparison of RBM training using classical persistent contrastive divergence (PCD) and diabatic quantum annealing (DQA)–based sampling. In both cases, the same RBM architecture and training procedure are used; the only difference lies in the sampling method employed to estimate model expectations during learning. The RBM consists of 784 visible units and 1200 hidden units, with each node connected on average to 18.17 others (standard deviation 2.37). The right panels show representative samples generated from the trained RBM (left: MNIST, right: Fashion-MNIST), illustrating the quality of the learned generative model under each sampling method.
  • Figure 2: Validation errors during RBM training on (a) MNIST and (b) Fashion-MNIST. Curves show mean Hamming reconstruction error over 10 independently trained RBMs, with error bars indicating one standard deviation. The red curves correspond to classical PCD training, while the green and blue curves show DQA-based training before and after applying the parameter-rescaling method that corrects for temperature misalignment, respectively. The rescaled DQA samples yield consistently lower reconstruction error, demonstrating the importance of correcting hardware-induced temperature misalignment.
  • Figure 3: Inverse temperature obtained from unitary simulation (red circles), from the integral expression in Eq. (\ref{eq:DQAintegral}) (blue squares), and empirically estimated from the D-Wave device (green triangles). The rescaling factor $\alpha$ (yellow diamonds) is defined in Eq. (\ref{eq:alpha}). The purple curve shows a discretized, Trotterized simulation of the annealing dynamics performed using Qiskit. While $\beta_{\mathrm{dwave}}$ deviates from the theoretical predictions, $\alpha$ remains within the range $5$–$7$ over the annealing-time window examined.
  • Figure 4: Minimum validation error as a function of the hidden-layer size $N_H$ for MNIST and Fashion-MNIST. Mean minimum validation error across 10 independent runs is shown for PCD and DQA. Dashed lines indicate fits to the exponential form $y = a e^{-b N_H} + c$. While both methods exhibit exponential decay with increasing model capacity, DQA consistently attains a larger decay rate $b$, indicating better scalability.
  • Figure 5: Relative improvement in the minimum validation error as a function of the hidden layer size $N_H$, defined as $(\mathrm{PCD} - \mathrm{DQA})/\mathrm{PCD}$, for MNIST and Fashion-MNIST. The increasing ratio with system size indicates that the performance advantage of DQA over PCD becomes more pronounced as the model dimension grows.
  • ...and 4 more figures
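
The quantities named in the captions for Figures 2, 4, and 5 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and normalizing the Hamming error by the vector length is an assumption, since the captions do not say whether the error is normalized. The fit form $y = a e^{-b N_H} + c$ and the ratio $(\mathrm{PCD} - \mathrm{DQA})/\mathrm{PCD}$ are taken directly from the captions.

```python
import math


def hamming_error(original, reconstruction):
    """Fraction of binary entries that differ (normalization is assumed)."""
    if len(original) != len(reconstruction):
        raise ValueError("vectors must have the same length")
    mismatches = sum(a != b for a, b in zip(original, reconstruction))
    return mismatches / len(original)


def relative_improvement(pcd_error, dqa_error):
    """(PCD - DQA) / PCD, the ratio defined in the Figure 5 caption."""
    return (pcd_error - dqa_error) / pcd_error


def exp_decay(n_hidden, a, b, c):
    """Fit form y = a * exp(-b * N_H) + c from the Figure 4 caption."""
    return a * math.exp(-b * n_hidden) + c


# Toy values chosen for illustration; they are not results from the paper.
example_error = hamming_error([1, 0, 1, 1], [1, 1, 1, 0])
example_ratio = relative_improvement(0.10, 0.08)
example_fit = exp_decay(0, a=2.0, b=0.5, c=1.0)
```

A larger decay rate `b` in `exp_decay` means the error falls off faster as the hidden layer grows, which is how the Figure 4 caption frames the comparison between DQA and PCD.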