Quantum Gaussian Process Regression for Bayesian Optimization

Frederic Rapp, Marco Roth

TL;DR

This paper introduces quantum Gaussian process regression (QGPR) using quantum kernels derived from parameterized quantum circuits to enable uncertainty quantification in regression and Bayesian optimization. A Gram-matrix regularization strategy preserves the GP variance despite finite-sample and hardware noise, with kernel learning driven by marginal log-likelihood. The authors demonstrate QGPR as a surrogate model for Bayesian optimization (QBO) on both synthetic one-dimensional regression and multidimensional hyperparameter optimization tasks, achieving performance comparable to classical BO in simulations and on real quantum hardware. They discuss design choices for quantum feature maps, potential fully quantum GP implementations, and future directions toward quantum advantages in optimization problems with expensive evaluations or quantum data.

Abstract

Gaussian process regression is a well-established Bayesian machine learning method. We propose a new approach to Gaussian process regression using quantum kernels based on parameterized quantum circuits. By employing a hardware-efficient feature map and careful regularization of the Gram matrix, we demonstrate that the variance information of the resulting quantum Gaussian process can be preserved. We also show that quantum Gaussian processes can be used as a surrogate model for Bayesian optimization, a task that critically relies on the variance of the surrogate model. To demonstrate the performance of this quantum Bayesian optimization algorithm, we apply it to the hyperparameter optimization of a machine learning model which performs regression on a real-world dataset. We benchmark the quantum Bayesian optimization against its classical counterpart and show that the quantum version can match its performance.
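
To make the role of the Gram matrix concrete, the following is a minimal sketch of how a precomputed kernel matrix can be plugged into standard GP formulas for the posterior mean, the posterior variance, and the negative log marginal likelihood. It is not the authors' implementation. The classical RBF kernel is only a stand-in for a quantum kernel, and the diagonal shift in `regularize` is one common regularization choice assumed here for illustration; the paper's own strategy may differ. All function names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Classical RBF kernel, used here only as a stand-in for a quantum kernel."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def regularize(K, reg):
    """Symmetrize the Gram matrix and shift its diagonal (assumed strategy)."""
    return 0.5 * (K + K.T) + reg * np.eye(K.shape[0])

def gp_posterior(K_train, K_cross, k_test_diag, y, reg=1e-6):
    """Posterior mean and variance of a zero-mean GP from precomputed kernels.

    K_train: (n, n) Gram matrix of the training points
    K_cross: (m, n) kernel values between test and training points
    k_test_diag: (m,) kernel values k(x*, x*) of the test points
    """
    L = np.linalg.cholesky(regularize(K_train, reg))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_cross @ alpha
    v = np.linalg.solve(L, K_cross.T)
    var = k_test_diag - np.sum(v**2, axis=0)
    # Clip tiny negative values caused by round-off.
    return mean, np.maximum(var, 0.0)

def negative_log_marginal_likelihood(K_train, y, reg=1e-6):
    """Negative log marginal likelihood of the training targets."""
    n = len(y)
    L = np.linalg.cholesky(regularize(K_train, reg))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2 * np.pi)
```

With a Gram matrix estimated from a finite number of measurements, the matrix may fail to be positive semidefinite, in which case the Cholesky factorization raises an error unless `reg` is large enough. The same regularization parameter directly affects the posterior variance, which is why its choice matters for Bayesian optimization.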

Paper Structure (9 sections, 13 equations, 7 figures, 2 tables)

This paper contains 9 sections, 13 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Conceptual layout of the workflow used in this work. (a) The QGP model is constructed by calculating a quantum kernel and substituting the corresponding Gram matrix as the covariance matrix into a classical GP. If the feature map used for the quantum kernel contains variational parameters, they can be optimized using maximum likelihood estimation [Eq. \ref{eq: NLL}]. (b) By using a QGP model as a surrogate model for Bayesian optimization, a QBO can be obtained. (c) In Sec. \ref{sec: qbo_results}, the QBO algorithm is used to optimize the hyperparameters $\xi$ of a gradient boosting model $h(\boldsymbol{x},\xi)$ which performs regression on a dataset for remaining value estimation of industrial machines.
  • Figure 2: Example of the hardware-efficient feature map with $q=4$ qubits and $l=1$ layers, inspired by a Chebyshev quantum feature map design (PhysRevA.103.052416). The trainable parameters are denoted by $\theta_i$ and the data points by $x$. For the results in this work, various values of $q$ and $l$ are used.
  • Figure 3: QGP regression on a dataset created using Eq. \ref{eq: sin} (black line). The results are obtained using the feature map in Fig. \ref{fig: cheb_map} with $q=4$ qubits and $l=2$ layers for the encoding and $n_{\text{training}} = 23$ training points, shown as blue crosses. The test points are marked by red dots. The posterior mean of the QGP is shown as the red line and the standard deviation as the shaded area. (a) shows the result of the statevector simulation with optimized parameters, obtaining an $R^2$ score of $0.996$ and an $\text{MSE} = 0.022$. (b) shows the result of the sample-based simulation. We use the optimal parameters obtained in the previous ideal run, resulting in an $R^2$ score of $0.996$ and an $\text{MSE} = 0.024$. (c) shows the result of the real hardware run, using the ibmq_montreal backend, leading to an $R^2$ score of $0.978$ and an $\text{MSE} = 0.114$. All runs use the same parameters.
  • Figure 4: Convergence plot of the log-likelihood loss function [cf. Eq. \ref{eq: NLL}]. The loss is evaluated entirely on the training data. The variational parameters of the optimization are the angles $\boldsymbol{\theta}$ in the feature map.
  • Figure 5: BO results averaged over independent runs with the mean shown as solid lines and the variance as shaded areas. The expected improvement [Eq. \ref{eq: EI}] is used as acquisition function with an exploration-exploitation parameter of $\lambda = 0.1$. The classical BO uses a GP surrogate model with an optimized RBF kernel (black line). The QBO results are obtained with the feature map in Fig. \ref{fig: cheb_map} using statevector (red line) and sample-based simulations (blue line). The initial samples for each individual run are the same for the quantum and classical BO for better comparison. At each iteration, only the best current result is shown. (a) shows the result for the minimization of Eq. \ref{eq: branin}, where the parameters are fixed at $a=1\text{, }b=5.1/(4\pi^2)\text{, }c = 5/\pi \text{, }r=6\text{, }s=10 \text{ and } t=1/(8\pi)$. The feature map for the QBO uses $q=4$ qubits and $l=2$ layers. The results are averaged over $25$ runs. (b) shows the result of the hyperparameter optimization of a gradient boosting model on an industrial dataset. The average result of ten iterations of random search runs is shown (green, solid). The kernel is calculated using $q=10$ qubits and $l=2$ layers.
  • ...and 2 more figures
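
The benchmark and acquisition function named in the Figure 5 caption can be sketched as follows. The equations themselves are not reproduced in this summary, so the code assumes the standard textbook form of the Branin function (with the parameter values quoted in the caption as defaults) and a common form of expected improvement for minimization with an exploration offset $\lambda$. The paper's exact definitions may differ, and the function names are hypothetical.

```python
import math
import numpy as np

def branin(x1, x2, a=1.0, b=5.1 / (4 * np.pi**2), c=5 / np.pi,
           r=6.0, s=10.0, t=1 / (8 * np.pi)):
    """Standard Branin function (assumed to match the benchmark in the caption)."""
    return a * (x2 - b * x1**2 + c * x1 - r) ** 2 + s * (1 - t) * np.cos(x1) + s

def expected_improvement(mu, sigma, f_best, lam=0.1):
    """Expected improvement for minimization (common form, assumed here).

    mu, sigma: posterior mean and standard deviation of the surrogate model
    f_best: best (lowest) objective value observed so far
    lam: exploration-exploitation offset
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    improvement = f_best - mu - lam
    # Avoid division by zero; those entries are overwritten below.
    safe_sigma = np.where(sigma > 0, sigma, 1.0)
    z = improvement / safe_sigma
    erf = np.vectorize(math.erf, otypes=[float])
    cdf = 0.5 * (1.0 + erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    ei = improvement * cdf + sigma * pdf
    # Without uncertainty, only a certain improvement counts.
    return np.where(sigma > 0, ei, np.maximum(improvement, 0.0))
```

In a Bayesian optimization loop, `mu` and `sigma` would come from the surrogate model's posterior at candidate points, and the next evaluation would be placed where `expected_improvement` is largest. Because the second term scales with `sigma`, an unreliable variance estimate distorts the acquisition directly.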