On the Sample Complexity of Quantum Boltzmann Machine Learning

Luuk Coopmans, Marcello Benedetti

TL;DR

The authors show that QBMs can be trained with polynomial sample complexity, and that the sample complexity bounds can be lowered further by pre-training strategies based on mean-field, Gaussian Fermionic, and geometrically local Hamiltonians.

Abstract

Quantum Boltzmann machines (QBMs) are machine-learning models for both classical and quantum data. We give an operational definition of QBM learning in terms of the difference in expectation values between the model and target, taking into account the polynomial size of the data set. By using the relative entropy as a loss function this problem can be solved without encountering barren plateaus. We prove that a solution can be obtained with stochastic gradient descent using at most a polynomial number of Gibbs states. We also prove that pre-training on a subset of the QBM parameters can only lower the sample complexity bounds. In particular, we give pre-training strategies based on mean-field, Gaussian Fermionic, and geometrically local Hamiltonians. We verify these models and our theoretical findings numerically on a quantum and a classical data set. Our results establish that QBMs are promising machine learning models.
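
The training loop described in the abstract can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation. It assumes the parametrization $\rho_\theta = e^{\sum_i \theta_i H_i}/Z$ (the sign convention suggested by the Gibbs state $e^{\mathcal{H}_{\mathrm{XXZ}}}/Z$ in the Figure 2 caption), under which the gradient of $S(\eta \| \rho_\theta)$ with respect to $\theta_i$ is the difference $\langle H_i \rangle_{\rho_\theta} - \langle H_i \rangle_\eta$. The 2-qubit system, the random target, the step size $1/m$, the iteration count, and all function names are assumptions chosen for illustration, and Gibbs states are computed by exact diagonalization rather than prepared on a quantum device.

```python
import itertools
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def pauli_strings(n):
    """Return all 4**n - 1 non-identity n-qubit Pauli strings as dense matrices."""
    ops = []
    for combo in itertools.product(PAULIS, repeat=n):
        op = combo[0]
        for p in combo[1:]:
            op = np.kron(op, p)
        ops.append(op)
    return ops[1:]  # the first string is the all-identity one

def gibbs_state(theta, ops):
    """rho_theta = exp(sum_i theta_i H_i) / Z, via exact diagonalization."""
    h = sum(t * op for t, op in zip(theta, ops))
    evals, evecs = np.linalg.eigh(h)
    weights = np.exp(evals - evals.max())  # shift for numerical stability
    weights /= weights.sum()
    return (evecs * weights) @ evecs.conj().T

def expectations(rho, ops):
    """Vector of expectation values Tr[rho H_i]."""
    return np.array([np.trace(rho @ op).real for op in ops])

def relative_entropy(eta, rho):
    """S(eta || rho) = Tr[eta log eta] - Tr[eta log rho], for full-rank rho."""
    p = np.linalg.eigvalsh(eta)
    p = p[p > 1e-12]  # convention: 0 log 0 = 0
    q, v = np.linalg.eigh(rho)
    log_rho = (v * np.log(q)) @ v.conj().T
    return float(np.sum(p * np.log(p)) - np.trace(eta @ log_rho).real)

rng = np.random.default_rng(0)
ops = pauli_strings(2)
m = len(ops)

# Hypothetical target: a Gibbs state with random parameters (assumption).
eta = gibbs_state(rng.uniform(-0.3, 0.3, size=m), ops)
target_exp = expectations(eta, ops)

theta = np.zeros(m)  # start from the maximally mixed state
s_init = relative_entropy(eta, gibbs_state(theta, ops))
gamma = 1.0 / m      # conservative step size (assumption, not the paper's rate)
for _ in range(2000):
    # Gradient = model expectations minus target expectations.
    grad = expectations(gibbs_state(theta, ops), ops) - target_exp
    theta -= gamma * grad

rho = gibbs_state(theta, ops)
s_final = relative_entropy(eta, rho)
max_err = np.max(np.abs(expectations(rho, ops) - target_exp))
print(f"S before: {s_init:.4f}, S after: {s_final:.2e}, max error: {max_err:.2e}")
```

The quantity `max_err` corresponds to the operational criterion in the abstract: the largest difference in expectation values between model and target.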

Paper Structure

This paper contains 32 sections, 17 theorems, 87 equations, 5 figures.

Key Result

Theorem 1

Given a QBM defined by a set of $n$-qubit Pauli operators $\{H_i\}_{i=1}^m$, a precision $\kappa$ for the QBM expectations, a precision $\xi$ for the data expectations, and a target precision $\epsilon$ such that $\kappa^2 + \xi^2 \geq \frac{\epsilon}{2m}$. After iterations of stochastic gradient descent on the relative entropy $S(\eta \|\rho_\theta)$ with constant learning rate $\gamma^t=\frac{\

Figures (5)

  • Figure 1: Summary of the results. The quantum Boltzmann machine (QBM) learning algorithm takes as input a data set of size polynomial in the number of features/qubits, and an ansatz with parameters $\theta$ and a set of $m$ Hermitian operators $\{H_i\}$. In Definition 1 we provide an operational definition of the QBM learning problem where the model and target expectations must be close to within polynomial precision $\epsilon$. A solution $\theta^\mathrm{opt}$ is guaranteed to exist by Jaynes' principle. With Theorems 1 and 2 we establish that QBM learning can be solved by minimizing the quantum relative entropy $S(\eta \| \rho_\theta)$ with respect to $\theta$ using stochastic gradient descent (SGD). This requires a polynomial number of iterations $T$, each using a polynomial number of Gibbs state preparations, i.e., the sample complexity is polynomial. With Theorem 3 we prove that pre-training strategies that optimize a subset $\theta^\mathrm{pre}$ of the QBM parameters are guaranteed to lower the initial quantum relative entropy. After training, the QBM can be used to generate new synthetic data. Icons by https://www.svgrepo.com/, used under https://creativecommons.org/publicdomain/zero/1.0/ and adapted for our purpose.
  • Figure 1: Minimum eigenvalue of the Hessian, as a function of the number of qubits. We show the median of $25$ random instances for the 1D nearest-neighbor Hamiltonian (a), and fully-connected Hamiltonian (b). The scale parameter $\mu$ determines the maximum size of random parameters. We observe that in all cases the smallest eigenvalues shrink with the number of qubits, but appear to plateau to a positive value.
  • Figure 2: Pre-training and training quantum Boltzmann machines. (a) Quantum relative entropy $S(\eta \| \rho_{\theta^\mathrm{pre}})$ obtained after various pre-training strategies. We compare a mean-field (MF) model, a one-dimensional and two-dimensional geometrically local (GL) model, and a Gaussian Fermionic (GF) model to no pre-training/maximally mixed state. For the GL models, we stop the pre-training after the pre-training gradient is smaller than $0.01$. We consider an $8$-qubit target $\eta$ as the Gibbs state $e^{\mathcal{H}_{\mathrm{XXZ}}}/Z$ of a one-dimensional XXZ model (Quantum Data), and a target $\eta$ which coherently encodes the binary salamander retina data set (Classical Data). (b) Quantum relative entropy versus number of iterations for Quantum Data. The $t < 0$ iterations (gray area) show the reduction in relative entropy for GL 2D pre-training (red line). The $t=0$ iteration corresponds to the pre-training results in panel (a). The $t>0$ iterations show the training results in the absence of gradient noise, i.e., $\kappa=\xi=0$.
  • Figure 2: (a) Number of SGD iterations required to solve the QBM learning problem for precision $\epsilon=0.1$ versus system size. The learning rate from Theorem \ref{appthm:training} is compared to a constant learning rate. The inset shows the log-log plot of the data in the main panel. (b) Number of SGD iterations to solve the QBM learning problem for system size $n=6$ versus the target precision $\epsilon^{-1}$. We compare a fully-connected QBM (blue and orange lines) to a 1D nearest-neighbor QBM (green line).
  • Figure 3: The maximum error in the expectation values versus the number of iterations of stochastic gradient descent (SGD). The target $\eta$ is made of $8$ features from the classical salamander retina data set, the model $\rho_{\theta^t}$ is a quantum Boltzmann machine (QBM). When computing expectation values for the gradient, we allow precision $\xi$ for the target and precision $\kappa$ for the QBM. We compare the combined noise strength of $0.01$ (blue line) to $0.05$ (orange line). We aim for a maximum error of $\epsilon=0.1$ (red dashed line) and use a learning rate of $\gamma = \frac{\epsilon}{4m^2(\kappa^2+\xi^2)}$. SGD converges within a number of iterations consistent with Theorem 1.
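
The noisy-gradient setting of the Figure 3 caption can be imitated on a small example. The sketch below uses the learning rate $\gamma = \frac{\epsilon}{4m^2(\kappa^2+\xi^2)}$ quoted in that caption and checks the condition $\kappa^2 + \xi^2 \geq \frac{\epsilon}{2m}$ from Theorem 1. Everything else is an assumption made for illustration: a 2-qubit random Gibbs-state target instead of the salamander retina data, Gaussian noise with standard deviations $\kappa$ and $\xi$ as a stand-in for finite-precision expectation estimates, 2000 iterations (the iteration count of the theorem is not reproduced here), and the helper names.

```python
import itertools
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def pauli_strings(n):
    """All non-identity n-qubit Pauli strings as dense matrices."""
    ops = []
    for combo in itertools.product(PAULIS, repeat=n):
        op = combo[0]
        for p in combo[1:]:
            op = np.kron(op, p)
        ops.append(op)
    return ops[1:]  # drop the all-identity string

def gibbs_state(theta, ops):
    """rho_theta = exp(sum_i theta_i H_i) / Z, via exact diagonalization."""
    h = sum(t * op for t, op in zip(theta, ops))
    evals, evecs = np.linalg.eigh(h)
    weights = np.exp(evals - evals.max())
    weights /= weights.sum()
    return (evecs * weights) @ evecs.conj().T

def expectations(rho, ops):
    return np.array([np.trace(rho @ op).real for op in ops])

rng = np.random.default_rng(1)
ops = pauli_strings(2)
m = len(ops)

eps, kappa, xi = 0.1, 0.05, 0.05
# Precondition on the precisions as stated in Theorem 1.
assert kappa**2 + xi**2 >= eps / (2 * m)
# Learning rate as quoted in the Figure 3 caption.
gamma = eps / (4 * m**2 * (kappa**2 + xi**2))

# Hypothetical target: Gibbs state with random parameters (assumption).
eta = gibbs_state(rng.uniform(-0.3, 0.3, size=m), ops)
exact_target = expectations(eta, ops)

theta = np.zeros(m)
errors = []
for t in range(2000):  # iteration budget chosen by hand (assumption)
    exact_model = expectations(gibbs_state(theta, ops), ops)
    # Track the noise-free maximum error in the expectation values.
    errors.append(np.max(np.abs(exact_model - exact_target)))
    # Assumed noise model: additive Gaussian errors on both estimates.
    noisy_model = exact_model + rng.normal(0.0, kappa, size=m)
    noisy_target = exact_target + rng.normal(0.0, xi, size=m)
    theta -= gamma * (noisy_model - noisy_target)

best_err = min(errors)
print(f"gamma = {gamma:.4f}, initial error = {errors[0]:.3f}, best error = {best_err:.3f}")
```

With these settings the step size is small enough that the gradient noise is averaged out over many iterations, which is the qualitative behavior the caption describes. The toy run does not verify the theorem's iteration bound.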

Theorems & Definitions (28)

  • Definition 1: QBM learning problem
  • Theorem 1: QBM training
  • Theorem 2: $\alpha$-strongly convex QBM training
  • Theorem 3: QBM pre-training
  • Theorem 4: Theorem 1.1 in Aaronson_2007
  • Definition 2: Convexity
  • Lemma 1
  • Definition 3: $\alpha$-Polyak-Łojasiewicz
  • Definition 4: $\alpha$-strong convexity
  • Lemma 2
  • ...and 18 more
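
Theorem 3 (QBM pre-training) in the list above concerns optimizing only a subset of the parameters before full training. A rough numerical illustration, under assumptions, is given below: the subset is taken to be the single-qubit Pauli terms (a mean-field-style choice), the remaining parameters are held at zero, and the relative entropy after pre-training is compared with that of the maximally mixed starting point. The 2-qubit random target, the step sizes, the iteration counts, and all helper names are hypothetical and not taken from the paper.

```python
import itertools
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def pauli_strings(n):
    """Labels and matrices of all non-identity n-qubit Pauli strings."""
    labels, ops = [], []
    for label in itertools.product(range(4), repeat=n):
        if not any(label):
            continue  # skip the all-identity string
        op = PAULIS[label[0]]
        for k in label[1:]:
            op = np.kron(op, PAULIS[k])
        labels.append(label)
        ops.append(op)
    return labels, ops

def gibbs_state(theta, ops):
    h = sum(t * op for t, op in zip(theta, ops))
    evals, evecs = np.linalg.eigh(h)
    weights = np.exp(evals - evals.max())
    weights /= weights.sum()
    return (evecs * weights) @ evecs.conj().T

def expectations(rho, ops):
    return np.array([np.trace(rho @ op).real for op in ops])

def relative_entropy(eta, rho):
    """S(eta || rho) for full-rank rho, using 0 log 0 = 0."""
    p = np.linalg.eigvalsh(eta)
    p = p[p > 1e-12]
    q, v = np.linalg.eigh(rho)
    log_rho = (v * np.log(q)) @ v.conj().T
    return float(np.sum(p * np.log(p)) - np.trace(eta @ log_rho).real)

rng = np.random.default_rng(2)
labels, ops = pauli_strings(2)
m = len(ops)
eta = gibbs_state(rng.uniform(-0.3, 0.3, size=m), ops)  # hypothetical target
target_exp = expectations(eta, ops)

def train(theta, indices, steps):
    """Gradient descent that updates only the parameters listed in `indices`."""
    theta = theta.copy()
    gamma = 1.0 / len(indices)  # conservative step size (assumption)
    for _ in range(steps):
        grad = expectations(gibbs_state(theta, ops), ops) - target_exp
        theta[indices] -= gamma * grad[indices]
    return theta

# Mean-field-style subset: strings acting non-trivially on exactly one qubit.
mean_field = [i for i, lab in enumerate(labels) if sum(k != 0 for k in lab) == 1]

theta0 = np.zeros(m)
theta_pre = train(theta0, mean_field, steps=200)            # pre-training
theta_full = train(theta_pre, list(range(m)), steps=2000)   # full training

s_init = relative_entropy(eta, gibbs_state(theta0, ops))
s_pre = relative_entropy(eta, gibbs_state(theta_pre, ops))
s_full = relative_entropy(eta, gibbs_state(theta_full, ops))
print(f"S initial: {s_init:.4f}, after pre-training: {s_pre:.4f}, after training: {s_full:.2e}")
```

In this toy run the pre-trained parameters give a lower starting relative entropy than the maximally mixed state, in line with the statement of Theorem 3 as summarized in the Figure 1 caption.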