The Inductive Bias of Quantum Kernels

Jonas M. Kübler, Simon Buchholz, Bernhard Schölkopf

TL;DR

The paper analyzes when quantum kernel methods can outperform classical approaches by examining the inductive bias encoded in the kernel's spectrum. By modeling data embeddings into quantum density matrices, it shows that generalization is feasible only when the RKHS remains effectively low-dimensional or when a problem-specific bias is applied via biased (projected) kernels; otherwise the kernel's expressivity harms generalization and measuring the kernel becomes expensive. The authors prove bounds on the largest kernel eigenvalue, propose biased kernel constructions based on reduced density matrices, and demonstrate with experiments that the right bias enables learning from limited data while the wrong bias fails. They conclude that quantum advantages are plausible primarily when the data-generating process is naturally quantum or when a bias is encoded that is hard to replicate classically, implying limited prospects for quantum speedups on typical classical datasets.

Abstract

It has been hypothesized that quantum computers may lend themselves well to applications in machine learning. In the present work, we analyze function classes defined via quantum kernels. Quantum computers offer the possibility to efficiently compute inner products of exponentially large density operators that are classically hard to compute. However, having an exponentially large feature space renders the problem of generalization hard. Furthermore, being able to evaluate inner products in high-dimensional spaces efficiently by itself does not guarantee a quantum advantage, as already classically tractable kernels can correspond to high- or infinite-dimensional reproducing kernel Hilbert spaces (RKHS). We analyze the spectral properties of quantum kernels and find that we can expect an advantage if their RKHS is low-dimensional and contains functions that are hard to compute classically. If the target function is known to lie in this class, this implies a quantum advantage, as the quantum computer can encode this inductive bias, whereas there is no classically efficient way to constrain the function class in the same way. However, we show that finding suitable quantum kernels is not easy because the kernel evaluation might require exponentially many measurements. In conclusion, our message is a somewhat sobering one: we conjecture that quantum machine learning models can offer speed-ups only if we manage to encode knowledge about the problem at hand into quantum circuits, while encoding the same bias into a classical model would be hard. These situations may plausibly occur when learning on data generated by a quantum process; however, they appear to be harder to come by for classical datasets.

Paper Structure

This paper contains 30 sections, 6 theorems, 78 equations, 5 figures, 1 table.

Key Result

Lemma 1

The largest eigenvalue $\gamma_{max}$ of $K$ satisfies the bound $\gamma_{max} \leq \sqrt{\text{Tr}\left[\rho_\mu^2 \right]}$.
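The bound can be checked numerically. The sketch below is an illustration, not the paper's code, and it rests on assumptions about notation that this summary does not spell out: $K$ is taken to be the integral operator of the kernel $k(x,x') = \text{Tr}\left[\rho(x)\rho(x')\right]$ with respect to the data distribution $\mu$, and $\rho_\mu$ is taken to be the mean density matrix under $\mu$. With $\mu$ replaced by an empirical sample of $n$ points, the eigenvalues of $K$ become those of the Gram matrix divided by $n$. The helper names `random_pure_states` and `lemma1_quantities` are hypothetical, and random pure states stand in for an actual data embedding.

```python
import numpy as np

def random_pure_states(n, dim, rng):
    """Draw n random pure states (one per row) as normalized complex Gaussian vectors."""
    psi = rng.normal(size=(n, dim)) + 1j * rng.normal(size=(n, dim))
    return psi / np.linalg.norm(psi, axis=1, keepdims=True)

def lemma1_quantities(psi):
    """Return (gamma_max, sqrt(Tr[rho_mu^2])) for the empirical measure on the rows of psi."""
    n = psi.shape[0]
    # For pure states, k(x_i, x_j) = Tr[rho_i rho_j] = |<psi_i|psi_j>|^2.
    gram = np.abs(psi.conj() @ psi.T) ** 2
    # Assumption: eigenvalues of K under the empirical measure are those of gram / n.
    gamma_max = np.linalg.eigvalsh(gram / n)[-1]
    # Mean density matrix rho_mu = (1/n) * sum_i |psi_i><psi_i|.
    rho_mu = (psi.T @ psi.conj()) / n
    purity = np.real(np.trace(rho_mu @ rho_mu))
    return gamma_max, np.sqrt(purity)

rng = np.random.default_rng(0)
for dim in (2, 4, 16):
    gamma_max, bound = lemma1_quantities(random_pure_states(200, dim, rng))
    print(f"dim={dim:2d}  gamma_max={gamma_max:.4f}  bound={bound:.4f}")
```

For states spread widely over a large Hilbert space, $\rho_\mu$ approaches the maximally mixed state, so $\text{Tr}\left[\rho_\mu^2\right]$ and with it the largest eigenvalue shrink as the dimension grows.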

Figures (5)

  • Figure 1: Quantum advantage via inductive bias: (a) Data generating quantum circuit $f(x) = \text{Tr}\left[\rho^V(x)(M \otimes \mathrm{id}) \right] =\text{Tr}\left[\tilde{\rho}^V(x)M \right]$. (b) The full quantum kernel $k(x,x') = \text{Tr}\left[\rho^V(x)\rho^V(x') \right]$ is too general and cannot learn $f$ efficiently. (c) The biased quantum kernel $q(x,x') = \text{Tr}\left[\tilde{\rho}^V(x)\tilde{\rho}^V(x') \right]$ meaningfully constrains the function space and allows learning $f$ from little data.
  • Figure 2: Left: Spectral behavior of the biased kernel $q$; see Theorem \ref{thm:biased_kernels}b) and Equation \ref{eq:spectral_dec}. Right: The biased kernel $q$, equipped with prior knowledge, easily learns the function for an arbitrary number of qubits and achieves optimal mean squared error (MSE). Models that are ignorant of the structure of $f^*$ fail to learn the function. The classical kernel $k_\text{rbf}$ and the full quantum kernel overfit (they have low training error but large test error). The biased kernel on the wrong qubit, $q_w$, has little capacity with the wrong bias and thus underfits (training and test error essentially overlap).
  • Figure 3: Histogram of the kernel target alignment over 50 runs (left) and task model alignment (right) for $d=7$.
  • Figure 4: Similar to Fig. \ref{fig:spectrum_generalization}. However, for the full quantum kernel $k$ and the rbf kernel, we compute train and test loss over multiple choices of the regularization parameter. For each number of qubits, we only report the loss of the method that achieved the smallest test loss. Note that, although this is not a valid way to assess the power of the full and rbf kernels, it shows that the poor performance is not due to the choice of regularization. Since we cherry-pick on the test loss, it can happen that an underfitting regularization has the best test loss, which explains the outlier for $k$ at $d=6$.
  • Figure 5: Kernel Target Alignment for $d=1,3,5,7$.
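
The two kernels in Figure 1 can be sketched for pure states as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embedding `product_state` (one qubit per feature, each rotated by the feature value), the random unitary standing in for $V$, and all function names are hypothetical. Only the kernel formulas $k(x,x') = \text{Tr}\left[\rho^V(x)\rho^V(x')\right]$ and $q(x,x') = \text{Tr}\left[\tilde{\rho}^V(x)\tilde{\rho}^V(x')\right]$ come from the caption, with $\tilde{\rho}^V$ taken here as the reduced density matrix of the first qubit.

```python
import numpy as np

def product_state(x):
    """Hypothetical embedding: one qubit per feature, each rotated by angle x_i."""
    psi = np.array([1.0 + 0j])
    for angle in x:
        qubit = np.array([np.cos(angle / 2), -1j * np.sin(angle / 2)])
        psi = np.kron(psi, qubit)  # first feature ends up on the most significant qubit
    return psi

def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # rescale column phases

def reduced_first_qubit(psi):
    """2x2 reduced density matrix of the first qubit of a pure state."""
    m = psi.reshape(2, -1)   # rows: first qubit, columns: all remaining qubits
    return m @ m.conj().T    # partial trace over the remaining qubits

def full_kernel(psi_a, psi_b):
    """k = Tr[rho_a rho_b], which equals |<a|b>|^2 for pure states."""
    return np.abs(np.vdot(psi_a, psi_b)) ** 2

def biased_kernel(psi_a, psi_b):
    """q = Tr[rho~_a rho~_b], using only the first qubit's reduced state."""
    return np.real(np.trace(reduced_first_qubit(psi_a) @ reduced_first_qubit(psi_b)))

rng = np.random.default_rng(0)
d = 4                                   # number of qubits = number of features
V = random_unitary(2 ** d, rng)         # stand-in for the circuit V
x, x_prime = rng.uniform(0, 2 * np.pi, size=(2, d))
psi, psi_prime = V @ product_state(x), V @ product_state(x_prime)
print("full kernel   k(x, x') =", full_kernel(psi, psi_prime))
print("biased kernel q(x, x') =", biased_kernel(psi, psi_prime))
```

The biased kernel only sees a $2 \times 2$ matrix per input, so its feature space stays small regardless of $d$, whereas the full kernel compares states in a $2^d$-dimensional space.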

Theorems & Definitions (14)

  • Example 1: Trivial Quantum Advantage
  • Definition 1: Quantum Kernel [havlivcek2019, Schuld2019QKernel, schuld2021quantum]
  • Example 2
  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Lemma 2: Lemma \ref{le:largest_ev} in the main part
  • proof
  • Lemma 3
  • proof
  • ...and 4 more