Supervised quantum machine learning models are kernel methods

Maria Schuld

TL;DR

The paper reframes supervised quantum learning as kernel methods by treating data-encoding quantum states as density matrices, defining a quantum kernel κ(x,x') = tr[ρ(x) ρ(x')], and showing quantum models are linear in this feature space. It establishes the equivalence between quantum models and the RKHS of the quantum kernel, proves a representer theorem for optimal measurements, and demonstrates that training reduces to a finite-dimensional convex problem over the training data. It also surveys data-encoding strategies, their induced kernels (including a Fourier-series representation), and the regularisation effects embedded in the kernel. The results suggest kernel-based quantum training can outperform variational approaches in finding optimal measurements and highlight the central role of data encoding in shaping quantum learning performance, with implications for near-term and fault-tolerant quantum devices.
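To make the kernel definition concrete, the following minimal sketch evaluates κ(x,x') = tr[ρ(x) ρ(x')] numerically. The single-qubit rotation encoding and the helper names `encode` and `quantum_kernel` are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def encode(x):
    """Assumed encoding (illustrative): the state RX(x)|0> as a density matrix rho(x)."""
    state = np.array([np.cos(x / 2), -1j * np.sin(x / 2)])
    return np.outer(state, state.conj())

def quantum_kernel(x1, x2):
    """kappa(x, x') = tr[rho(x) rho(x')], real-valued for density matrices."""
    return np.real(np.trace(encode(x1) @ encode(x2)))

k_same = quantum_kernel(0.3, 0.3)  # overlap of a state with itself
k_diff = quantum_kernel(0.3, 1.1)  # overlap of two different encoded inputs
```

For pure states the trace reduces to the squared overlap of the two state vectors, so under this assumed encoding the kernel depends only on the difference x - x'.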

Abstract

With near-term quantum devices available and the race for fault-tolerant quantum computers in full swing, researchers became interested in the question of what happens if we replace a supervised machine learning model with a quantum circuit. While such "quantum models" are sometimes called "quantum neural networks", it has been repeatedly noted that their mathematical structure is actually much more closely related to kernel methods: they analyse data in high-dimensional Hilbert spaces to which we only have access through inner products revealed by measurements. This technical manuscript summarises and extends the idea of systematically rephrasing supervised quantum models as a kernel method. With this, a lot of near-term and fault-tolerant quantum models can be replaced by a general support vector machine whose kernel computes distances between data-encoding quantum states. Kernel-based training is then guaranteed to find better or equally good quantum models than variational circuit training. Overall, the kernel perspective of quantum machine learning tells us that the way that data is encoded into quantum states is the main ingredient that can potentially set quantum models apart from classical machine learning models.

Paper Structure

This paper contains 20 sections, 9 theorems, 80 equations, 9 figures, 1 table.

Key Result

Theorem 1

Let $\mathcal{X} = \mathbb{R}^N$ and $S(\mathbf{x})$ be a quantum circuit that encodes the data inputs $\mathbf{x} = (x_1, \dots, x_N) \in \mathcal{X}$ into an $n$-qubit quantum state $S(\mathbf{x}) \left| 0 \right \rangle = \left| \phi(\mathbf{x}) \right \rangle$ via gates of the form $e^{-i x_i G}$ [...] The quantum kernel $\kappa(\mathbf{x}, \mathbf{x}')$ can be written as [...] where $\Omega \subseteq$ [...]
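As a small worked illustration of what a Fourier-type representation of a quantum kernel can look like, assume a single input $x \in \mathbb{R}$ encoded on one qubit by the gate $e^{-i x \sigma_x / 2}$. This specific encoding is an assumption made here for illustration and is not part of the theorem statement above. Under that assumption the kernel is a finite sum of complex exponentials in $x$ and $x'$:

```latex
\begin{align}
\left| \phi(x) \right\rangle
  &= \cos\tfrac{x}{2} \left| 0 \right\rangle - i \sin\tfrac{x}{2} \left| 1 \right\rangle, \\
\kappa(x, x')
  &= \left| \left\langle \phi(x') \middle| \phi(x) \right\rangle \right|^2
   = \cos^2\!\left( \tfrac{x - x'}{2} \right) \\
  &= \tfrac{1}{2} + \tfrac{1}{4} e^{i (x - x')} + \tfrac{1}{4} e^{-i (x - x')}.
\end{align}
```

In this toy case only the frequencies $0$ and $\pm 1$ appear, which are the differences of the eigenvalues $\pm \tfrac{1}{2}$ of the assumed generator $\sigma_x / 2$.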

Figures (9)

  • Figure 1: Quantum computing and kernel methods are based on a similar principle. Both have mathematical frameworks in which information is mapped into and then processed in high-dimensional spaces to which we have only limited access. In kernel methods, the access to the feature space is facilitated through kernels or inner products of feature vectors. In quantum computing, access to the Hilbert space of quantum states is given by measurements, which can also be expressed by inner products of quantum states.
  • Figure 2: Interpreting a quantum circuit as a machine learning model. After encoding the data with the routine $S_x$, a quantum circuit "processes" the embedded input, followed by a measurement (left). The processing circuit may depend on classically trainable parameters, as investigated in near-term quantum machine learning with variational circuits, or it may consist of standard quantum routines such as amplitude amplification or quantum Fourier transforms. The expected outcome of the measurement $\mathcal{M}$ is interpreted as the model's prediction, which is deterministic (generative models, which would consider the measurement samples as outputs, are not considered here). Since the processing circuit only changes the basis in which the measurement is taken, it can conceptually be understood as part of the measurement procedure (right). In this sense, quantum models consist of two parts, the data encoding/embedding and the measurement. Training a quantum model is the problem of finding the measurement that minimises a data-dependent cost function. Note that while the measurement could depend on trainable parameters I will not consider trainable embedding circuits here.
  • Figure 3: Quantum models as linear models in a feature space. A quantum model can be understood as a model that maps data into a feature space in which the measurement defines a linear decision boundary. This feature space is not identical to the Hilbert space of the quantum system. Instead, we can define it as the space of complex matrices equipped with the Hilbert-Schmidt inner product, which is the space in which density matrices live.
  • Figure 4: Overview of the link between quantum models and kernel methods. The strategy with which data is encoded into quantum states is a feature map from the space of data to the feature space $\mathcal{F}$ "of density matrices" $\rho$. In this space, quantum models can be expressed as a linear model whose decision boundary is defined by the measurement. According to kernel theory, an alternative feature space with the same kernel is the RKHS $F$, whose vectors are functions arising from fixing one entry of the kernel (i.e., the inner product of data-encoding density matrices). The RKHS is equivalent to the space of quantum models, which are linear models in the data-encoding feature space. These connections can be used to study the properties of quantum models as learners, which turn out to be largely determined by the kernel, and therefore by the data-encoding strategy.
  • Figure 5: Kernel-based training vs. variational training. Training a quantum model as defined here tries to find the optimal measurement $\mathcal{M}_{\rm opt}$ over all possible quantum measurements. Kernel theory guarantees that in most cases this optimal measurement will have a representation that is a linear combination in the training data with coefficients $\alpha = (\alpha_1,\dots, \alpha_M)$. Kernel-based training therefore optimises over the parameters $\alpha$ directly, effectively searching for the best model in an $M$-dimensional subspace spanned by the training data (blue). We are guaranteed that $\mathcal{M}_{\alpha}^{\rm opt} = \mathcal{M}_{\rm opt}$, and if the loss is convex this is the only minimum, which means that kernel-based training will find the best measurement out of all measurements. Variational training parametrises the measurement instead by a general ansatz that depends on $K$ parameters $\theta = (\theta_1, \dots, \theta_K)$, and tries to find the optimal measurement $\mathcal{M}_{\theta}^{\rm opt}$ in the subspace explored by the ansatz. This $\theta$-subspace is not guaranteed to contain the globally optimal measurement $\mathcal{M}_{\rm opt}$, and optimisation is usually non-convex. We are therefore guaranteed that kernel-based training finds minima that are as good as or better than those found by variational training, but at the expense of having to compute pairwise distances of data points for training and classification.
  • ...and 4 more figures
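The kernel-based training described in the Figure 5 caption can be sketched classically once the kernel values are available. The example below is a minimal sketch under stated assumptions, not the paper's procedure: it uses a square loss with a ridge regulariser (so the coefficients $\alpha$ solve a linear system), and it replaces the quantum device by the closed-form kernel of an assumed single-qubit rotation encoding. The names `quantum_kernel`, `gram_matrix`, `fit_alpha`, `predict` and the parameter `reg` are hypothetical.

```python
import numpy as np

def quantum_kernel(x1, x2):
    # Closed form of tr[rho(x) rho(x')] for an assumed single-qubit RX(x) encoding.
    return np.cos((x1 - x2) / 2) ** 2

def gram_matrix(xs_a, xs_b):
    # Pairwise kernel values between two sets of scalar inputs.
    return np.array([[quantum_kernel(a, b) for b in xs_b] for a in xs_a])

def fit_alpha(x_train, y_train, reg=1e-3):
    # Assumed loss: square loss plus ridge penalty, so alpha = (K + reg * I)^{-1} y.
    K = gram_matrix(x_train, x_train)
    return np.linalg.solve(K + reg * np.eye(len(x_train)), y_train)

def predict(x_new, x_train, alpha):
    # f(x) = sum_m alpha_m * kappa(x, x_m): a linear combination over the training data.
    return gram_matrix(x_new, x_train) @ alpha

x_train = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)
y_train = np.cos(x_train)
alpha = fit_alpha(x_train, y_train)
y_fit = predict(x_train, x_train, alpha)
```

The optimisation is over the $M$ coefficients $\alpha$ only, one per training point, and the cost of building the Gram matrix grows with the number of pairs of data points, which is the trade-off the caption mentions.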

Theorems & Definitions (24)

  • Definition 1: Data-encoding feature map
  • Definition 2: Quantum kernel
  • Example 3.1
  • Example 3.2
  • Theorem 1: Fourier representation of the quantum kernel
  • Corollary 1.1: Fourier series representation of the quantum kernel
  • Definition 3: Quantum model
  • Example 5.1
  • Definition 4: Linear model
  • Theorem 2: Quantum models are linear models in data-encoding feature space
  • ...and 14 more
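Theorem 2 in the list above states that quantum models are linear models in the data-encoding feature space. A minimal numerical sketch of that statement follows, assuming the model output is the expectation value tr[ρ(x) M] of a measurement M, a single-qubit rotation encoding, and a Pauli-Z measurement. The encoding, the measurement, and the function names are illustrative assumptions rather than the paper's definitions.

```python
import numpy as np

def encode(x):
    """Assumed encoding (illustrative): the state RX(x)|0> as a density matrix rho(x)."""
    state = np.array([np.cos(x / 2), -1j * np.sin(x / 2)])
    return np.outer(state, state.conj())

# Assumed measurement observable for the example.
PAULI_Z = np.array([[1, 0], [0, -1]], dtype=complex)

def quantum_model(x, measurement):
    """Model output as an expectation value: f(x) = tr[rho(x) M]."""
    return np.real(np.trace(encode(x) @ measurement))

def linear_model(x, measurement):
    """Same output as a Hilbert-Schmidt inner product of flattened matrices."""
    # np.vdot conjugates its first argument, giving tr[rho(x)^dagger M].
    return np.real(np.vdot(encode(x).ravel(), measurement.ravel()))

f_trace = quantum_model(0.7, PAULI_Z)
f_inner = linear_model(0.7, PAULI_Z)
```

Both functions return the same number because ρ(x) is Hermitian, so the measurement acts like a weight vector applied to the feature vector ρ(x).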