Exponential concentration in quantum kernel methods

Supanut Thanasilp; Samson Wang; M. Cerezo; Zoë Holmes

TL;DR

This work investigates exponential concentration in quantum kernel methods. It shows that kernel values can concentrate around a fixed value as the number of qubits grows, which makes kernel estimation with a polynomial number of shots effectively data-independent and harms generalization. The authors develop a unified framework built on data-embedding unitaries and two kernels, the fidelity kernel $\kappa^{\rm FQ}$ and the projected kernel $\kappa^{\rm PQ}$. They derive analytic bounds for four concentration mechanisms: expressivity, entanglement, global measurements, and noise. They also analyze the training of parameterized embeddings via kernel target alignment and identify conditions under which the training landscape becomes exponentially flat. Combining theory with numerical simulations, they provide guidelines to avoid concentration, such as favoring problem-informed embeddings, restricting entanglement for projected kernels, and accounting for the detrimental role of hardware noise on near-term devices. The results suggest that achieving a quantum advantage with kernel methods requires carefully designed concentration-free embeddings and robust error mitigation, rather than naive use of unstructured, highly expressive quantum encodings. Overall, the paper clarifies when quantum kernels can fail in practice and points toward covariant, structure-aware embeddings as a path to practical quantum kernel methods.
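The two kernels can be contrasted with a small statevector sketch. It assumes the Gaussian form of the projected kernel commonly used in the literature, $\kappa^{\rm PQ}(\boldsymbol{x},\boldsymbol{x'}) = \exp\big(-\gamma \sum_{k} \lVert \rho_k(\boldsymbol{x}) - \rho_k(\boldsymbol{x'}) \rVert_2^2\big)$, where $\rho_k$ is the reduced state of qubit $k$. Haar-random states stand in for the output of a highly entangling embedding; the function names and $\gamma = 1$ are illustrative choices. When every single-qubit marginal is close to maximally mixed, the projected kernel is close to 1 for any pair of inputs, which illustrates the entanglement mechanism mentioned above.

```python
import numpy as np

def random_state(n_qubits: int, rng: np.random.Generator) -> np.ndarray:
    """Haar-random pure state (normalized complex Gaussian vector); typically highly entangled."""
    psi = rng.normal(size=2**n_qubits) + 1j * rng.normal(size=2**n_qubits)
    return psi / np.linalg.norm(psi)

def single_qubit_rdm(psi: np.ndarray, k: int, n_qubits: int) -> np.ndarray:
    """Reduced density matrix of qubit k, obtained by tracing out all other qubits."""
    t = np.moveaxis(psi.reshape((2,) * n_qubits), k, 0).reshape(2, -1)
    return t @ t.conj().T

def fidelity_kernel(psi: np.ndarray, phi: np.ndarray) -> float:
    """kappa^FQ = |<psi|phi>|^2 for pure states."""
    return float(np.abs(np.vdot(psi, phi)) ** 2)

def projected_kernel(psi: np.ndarray, phi: np.ndarray, n_qubits: int, gamma: float = 1.0) -> float:
    """Gaussian projected kernel from single-qubit reduced states (assumed form)."""
    dist = sum(
        np.linalg.norm(single_qubit_rdm(psi, k, n_qubits) - single_qubit_rdm(phi, k, n_qubits)) ** 2
        for k in range(n_qubits)  # squared Frobenius (Hilbert-Schmidt) distance per qubit
    )
    return float(np.exp(-gamma * dist))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 10
    psi, phi = random_state(n, rng), random_state(n, rng)
    print("fidelity kernel :", fidelity_kernel(psi, phi))      # typically of order 2**-n
    print("projected kernel:", projected_kernel(psi, phi, n))  # close to 1
```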

Abstract

Kernel methods in Quantum Machine Learning (QML) have recently gained significant attention as a potential candidate for achieving a quantum advantage in data analysis. Among other attractive properties, when training a kernel-based model one is guaranteed to find the optimal model parameters due to the convexity of the training landscape. However, this is based on the assumption that the quantum kernel can be efficiently obtained from quantum hardware. In this work we study the performance of quantum kernel models from the perspective of the resources needed to accurately estimate kernel values. We show that, under certain conditions, values of quantum kernels over different input data can be exponentially concentrated (in the number of qubits) towards some fixed value. Thus, when training with a polynomial number of measurements, one ends up with a trivial model where the predictions on unseen inputs are independent of the input data. We identify four sources that can lead to concentration: expressivity of the data embedding, global measurements, entanglement, and noise. For each source, an associated concentration bound of quantum kernels is analytically derived. Lastly, we show that when dealing with classical data, training a parametrized data embedding with a kernel alignment method is also susceptible to exponential concentration. Our results are verified through numerical simulations for several QML tasks. Altogether, we provide guidelines indicating that certain features should be avoided to ensure the efficient evaluation of quantum kernels and thus the performance of quantum kernel methods.
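To see the scale of the effect, here is a minimal numerical sketch. It assumes, as an idealization of a maximally expressive embedding, that each input is mapped to an independent Haar-random pure state. The fidelity $|\langle\psi|\phi\rangle|^2$ between two such states has mean $1/2^n$ and variance of order $1/4^n$, so distinct inputs give kernel values that are all exponentially close to zero. The helper names are hypothetical and the sketch uses only NumPy.

```python
import numpy as np

def haar_random_state(n_qubits: int, rng: np.random.Generator) -> np.ndarray:
    """Haar-random pure state: a normalized vector of i.i.d. complex Gaussian amplitudes."""
    dim = 2**n_qubits
    psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return psi / np.linalg.norm(psi)

def fidelity_stats(n_qubits: int, n_pairs: int = 2000, seed: int = 0) -> tuple[float, float]:
    """Empirical mean and variance of |<psi|phi>|^2 over independent random state pairs."""
    rng = np.random.default_rng(seed)
    vals = np.array([
        np.abs(np.vdot(haar_random_state(n_qubits, rng), haar_random_state(n_qubits, rng))) ** 2
        for _ in range(n_pairs)
    ])
    return float(vals.mean()), float(vals.var())

if __name__ == "__main__":
    for n in (2, 4, 6, 8, 10):
        mean, var = fidelity_stats(n)
        # Haar averages: mean = 1/2^n, variance of order 1/4^n
        print(f"n={n:2d}  mean={mean:.2e} (1/2^n={2.0**-n:.2e})  var={var:.2e}")
```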

Paper Structure (47 sections, 14 theorems, 190 equations, 15 figures, 1 table)

Introduction
Results
Framework
Why exponential concentration is problematic
Sources of exponential concentration
Expressivity-induced concentration
Entanglement-induced concentration
Global-measurement-induced concentration
Noise-induced concentration
Training parameterized quantum kernels
Discussion
Data Availability
Code Availability
Acknowledgements
Competing interests
...and 32 more sections

Key Result

Proposition 1

Consider the fidelity quantum kernel as defined in Eq. eq:fidelity-kernel-mt. Assume that the kernel values $\kappa^{\rm FQ}(\boldsymbol{x},\boldsymbol{x'})$ exponentially concentrate towards an exponentially small value as per Definition def:exp-concentration. Supposing an $N \in \mathcal{O}(\operatorname{poly}(n))$ … for some $c > 1$.
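The mechanism behind Proposition 1 can be checked with elementary probability. In a Loschmidt Echo test each shot returns the all-zero bitstring with probability $\kappa^{\rm FQ}(\boldsymbol{x},\boldsymbol{x'})$, so with $N$ shots the estimate is exactly zero with probability $(1-\kappa)^N \geq 1 - N\kappa$ (Bernoulli's inequality). If $\kappa$ is exponentially small and $N$ grows only polynomially, this probability tends to one. The sketch below illustrates this under those assumptions; the choices $\kappa = 2^{-n}$ and $N = n^3$ and the function names are hypothetical, not taken from the paper.

```python
import numpy as np

def loschmidt_echo_estimate(kappa: float, n_shots: int, rng: np.random.Generator) -> float:
    """Shot-based estimate: fraction of shots that return the all-zero bitstring (probability kappa)."""
    return rng.binomial(n_shots, kappa) / n_shots

def prob_estimate_is_zero(kappa: float, n_shots: int) -> float:
    """Exact probability that no shot hits the all-zero outcome: (1 - kappa)^N."""
    return (1.0 - kappa) ** n_shots

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    for n in (10, 20, 30, 40):
        kappa = 2.0**-n   # hypothetical exponentially small kernel value
        n_shots = n**3    # hypothetical polynomial shot budget
        p0 = prob_estimate_is_zero(kappa, n_shots)
        est = loschmidt_echo_estimate(kappa, n_shots, rng)
        print(f"n={n}: N={n_shots}, P[estimate=0]={p0:.6f} >= {1 - n_shots * kappa:.6f}, sample={est}")
```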

Figures (15)

Figure 1: Exponential concentration and its implications for kernel methods. The exponential concentration (in the number of qubits $n$) of quantum kernels $\kappa(\boldsymbol{x},\boldsymbol{x'})$ over all possible input data pairs $\boldsymbol{x},\boldsymbol{x'}$ can be seen to stem from the difficulty of extracting information from data-encoded quantum states, due to various sources (illustrated in panels (a) and (b)). Kernel concentration has a detrimental impact on the performance of quantum kernel-based methods. As shown in panel (c), for a polynomial (in $n$) number of measurement shots, the statistical estimates $\hat{\kappa}_{ij} = \hat{\kappa}(\boldsymbol{x}_i,\boldsymbol{x}_j)$ of the off-diagonal elements of the Gram matrix contain (with high probability) no information about the input data. The exact behaviour of the estimated kernel values depends on the measurement strategy: for the Loschmidt Echo test (i.e., the overlap test), $\hat{\kappa}_{ij}$ for $i \neq j$ concentrates to $0$ (the estimated Gram matrix is the identity $\mathbb{1}$), whereas for the SWAP test $\hat{\kappa}_{ij}$ for $i \neq j$ is indistinguishable from a data-independent random variable (the estimated Gram matrix is a random matrix). Ultimately, this leads to a trivial model whose predictions on unseen inputs are independent of the training data.
Figure 2: Schematic of the effect of exponential concentration and shot noise on training and generalization performance. For unseen (test) data, the behavior depends on how kernel values are statistically estimated: with the Loschmidt Echo test, the model predictions are zero with high probability, while with the SWAP test they fluctuate around zero (due to shot noise). For the training data, on the other hand, the training labels are effectively hard-coded by the optimization process. (For simplicity, we consider here the limit of no regularization.)
Figure 3: Effect of exponential concentration on training and generalization performance. We consider a tensor product encoding for an engineered dataset where each component is drawn uniformly from $[0, 2\pi]$ and the true label is $y_{\rm true}(\boldsymbol{x}) = \sum_{i=1}^{N_s} w_{i} \kappa^{\rm FQ}(\boldsymbol{x}_i, \boldsymbol{x})$, where each $w_i$ is randomly chosen from $[0,1]$. We train on $N_s = 150$ data points. In the main plot, the loss on a test dataset $\mathcal{S}_{\rm test}$ relative to its initial value (without training) is plotted as a function of increasing training data; the inset shows the absolute training error as a function of the same quantity. Each kernel value is estimated with $N=1000$ measurement shots, and the number of test data points is $20$. Training is done with no regularization ($\lambda = 0$). We repeat this experiment $10$ times; the solid curves represent averages of the respective losses and the shaded areas represent standard deviations.
Figure 4: Hardware Efficient Embedding (HEE). A layer is composed of single-qubit $x$-rotations, where the rotation angle on qubit $k$ is given by the $k$-th component of the input data point $\boldsymbol{x}$. After each layer of rotations, entangling gates are applied to adjacent pairs of qubits.
Figure 5: Datasets. (a) An input data point $\boldsymbol{x}$ is obtained by dimensionally reducing an original MNIST image to $n$ features using principal component analysis. We assign label $-1$ if the original image is the digit '0' and $1$ if it is the digit '1'. (b) A hypercube of width $2\pi/2^{1/n}$ is centred at the origin. An input data point $\boldsymbol{x}$ with each of its components bounded between $-\pi$ and $\pi$ is assigned label $y=1$ if the point lies inside the hypercube (represented by a circle) and $y=-1$ otherwise (represented by a cross).
...and 10 more figures
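Figures 1 and 2 can be mimicked end to end with a short sketch. It assumes a concentrated true kernel whose off-diagonal values are all $2^{-40}$ (a hypothetical stand-in), estimates kernel values with $N = 1000$ shots using either a Loschmidt Echo estimator (fraction of all-zero outcomes) or a SWAP-test estimator ($\hat{\kappa} = 2\hat{p}_0 - 1$, since the ancilla reads 0 with probability $(1+\kappa)/2$ for pure states), and then fits an unregularized kernel ridge regression. Kernel ridge regression is an illustrative choice of kernel model, and all function names are hypothetical.

```python
import numpy as np

def estimate_kernel(k: float, n_shots: int, method: str, rng: np.random.Generator) -> float:
    """Shot-noise estimate of a single kernel value."""
    if method == "loschmidt":
        # each shot returns the all-zero bitstring with probability k
        return rng.binomial(n_shots, k) / n_shots
    if method == "swap":
        # the ancilla reads 0 with probability (1 + k) / 2, so k_hat = 2 * p0_hat - 1
        return 2 * rng.binomial(n_shots, (1 + k) / 2) / n_shots - 1
    raise ValueError(f"unknown method: {method}")

def estimate_gram(kernel: np.ndarray, n_shots: int, method: str, rng: np.random.Generator) -> np.ndarray:
    """Symmetric Gram-matrix estimate; the diagonal kappa(x, x) = 1 is taken as known."""
    n_s = kernel.shape[0]
    gram = np.eye(n_s)
    for i in range(n_s):
        for j in range(i + 1, n_s):
            gram[i, j] = gram[j, i] = estimate_kernel(kernel[i, j], n_shots, method, rng)
    return gram

def estimate_cross(kernel: np.ndarray, n_shots: int, method: str, rng: np.random.Generator) -> np.ndarray:
    """Entry-by-entry estimate of the test-by-train kernel matrix."""
    return np.array([[estimate_kernel(k, n_shots, method, rng) for k in row] for row in kernel])

def fit_predict(gram: np.ndarray, y_train: np.ndarray, k_test: np.ndarray, lam: float = 0.0):
    """Kernel ridge regression: alpha = (K + lam I)^{-1} y; returns (train, test) predictions."""
    alpha = np.linalg.solve(gram + lam * np.eye(len(y_train)), y_train)
    return gram @ alpha, k_test @ alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_train, n_test, eps = 8, 4, 2.0**-40  # eps: hypothetical concentrated kernel value
    true_gram = np.full((n_train, n_train), eps)
    np.fill_diagonal(true_gram, 1.0)
    true_cross = np.full((n_test, n_train), eps)
    y = rng.choice([-1.0, 1.0], size=n_train)
    for method in ("loschmidt", "swap"):
        gram = estimate_gram(true_gram, 1000, method, rng)
        cross = estimate_cross(true_cross, 1000, method, rng)
        train_pred, test_pred = fit_predict(gram, y, cross)
        # with lam = 0 the training labels are always reproduced (hard-coded)
        print(method, np.allclose(train_pred, y), np.round(test_pred, 3))
```

With the Loschmidt estimator the test predictions come out as zeros, while with the SWAP estimator they are small random numbers around zero, matching the two cases described in Figure 2.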
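For the tensor product encoding in Figure 3 the fidelity kernel factorizes over qubits. Assuming, purely for illustration, that qubit $k$ is prepared as $R_X(x_k)|0\rangle$ (the paper's exact single-qubit encoding may differ), the kernel is $\prod_{k=1}^{n} \cos^2\big((x_k - x'_k)/2\big)$. With components drawn uniformly from $[0, 2\pi]$, each factor has mean $1/2$ and the factors are independent, so the average kernel value is $2^{-n}$. The sketch also builds the engineered labels $y_{\rm true}$ from the caption, drawing the weights uniformly from $[0,1]$ as an assumption; the function names are hypothetical.

```python
import numpy as np

def tensor_product_kernel(x: np.ndarray, x_prime: np.ndarray) -> float:
    """Fidelity kernel of the product state RX(x_1)|0> ... RX(x_n)|0> (assumed encoding)."""
    return float(np.prod(np.cos((x - x_prime) / 2) ** 2))

def engineered_labels(x_eval: np.ndarray, anchors: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """y_true(x) = sum_i w_i * kappa(x_i, x) for every row x of x_eval."""
    return np.array([
        sum(w * tensor_product_kernel(a, x) for w, a in zip(weights, anchors)) for x in x_eval
    ])

def mean_kernel(n_qubits: int, n_pairs: int = 20000, seed: int = 0) -> float:
    """Monte Carlo average of the kernel over pairs with components uniform on [0, 2*pi]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 2 * np.pi, size=(n_pairs, n_qubits))
    x_prime = rng.uniform(0, 2 * np.pi, size=(n_pairs, n_qubits))
    return float(np.mean(np.prod(np.cos((x - x_prime) / 2) ** 2, axis=1)))

if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(f"n={n:2d}  mean kernel={mean_kernel(n):.2e}  2^-n={2.0**-n:.2e}")
    rng = np.random.default_rng(1)
    anchors = rng.uniform(0, 2 * np.pi, size=(150, 8))  # N_s = 150 as in the caption
    weights = rng.uniform(0, 1, size=150)               # w_i from [0, 1] (uniform is assumed)
    print(engineered_labels(anchors[:3], anchors, weights))
```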
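The Hardware Efficient Embedding of Figure 4 can be simulated directly for small $n$. The sketch below is a plain NumPy statevector simulation that assumes CZ as the entangling gate, an open chain of nearest-neighbour pairs, and the same data point re-encoded in every layer; these choices and the function names are illustrative assumptions, not the paper's specification. It evaluates the fidelity kernel $|\langle \psi(\boldsymbol{x'}) | \psi(\boldsymbol{x}) \rangle|^2$, and for random inputs its average shrinks as $n$ grows.

```python
import numpy as np

def rx(theta: float) -> np.ndarray:
    """Single-qubit rotation RX(theta) = exp(-i theta X / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def apply_single_qubit(state: np.ndarray, gate: np.ndarray, qubit: int) -> np.ndarray:
    """Apply a 2x2 gate to one axis of a state stored as a (2, ..., 2) tensor."""
    state = np.tensordot(gate, state, axes=([1], [qubit]))  # acted-on axis is now first
    return np.moveaxis(state, 0, qubit)

def apply_cz(state: np.ndarray, q1: int, q2: int) -> np.ndarray:
    """Controlled-Z: negate amplitudes where both qubits are in |1>."""
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[q1], idx[q2] = 1, 1
    state[tuple(idx)] *= -1
    return state

def hee_state(x: np.ndarray, n_layers: int) -> np.ndarray:
    """|psi(x)> after n_layers of [RX(x_k) on every qubit k, then CZ on pairs (k, k+1)]."""
    n = len(x)
    state = np.zeros((2,) * n, dtype=complex)
    state[(0,) * n] = 1.0
    for _ in range(n_layers):
        for k in range(n):
            state = apply_single_qubit(state, rx(x[k]), k)
        for k in range(n - 1):
            state = apply_cz(state, k, k + 1)
    return state.reshape(-1)

def fidelity_kernel(x: np.ndarray, x_prime: np.ndarray, n_layers: int) -> float:
    """kappa^FQ(x, x') = |<psi(x')|psi(x)>|^2."""
    return float(np.abs(np.vdot(hee_state(x_prime, n_layers), hee_state(x, n_layers))) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (2, 4, 6, 8):
        vals = [fidelity_kernel(rng.uniform(0, 2 * np.pi, n), rng.uniform(0, 2 * np.pi, n), n_layers=n)
                for _ in range(200)]
        print(f"n={n}  mean kernel over 200 random pairs = {np.mean(vals):.2e}")
```

With a single layer the CZ gates cancel inside the overlap, so the kernel reduces to the product $\prod_k \cos^2\big((x_k - x'_k)/2\big)$; this is a convenient sanity check for the simulator.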
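The width in Figure 5(b) makes the two classes balanced. If inputs are uniform on $[-\pi, \pi]^n$ (an assumption; the caption only states the bounds), the probability of landing inside the centred hypercube is $\big((2\pi/2^{1/n})/(2\pi)\big)^n = 1/2$ for every $n$. A minimal data generator under that assumption, with a hypothetical function name:

```python
import numpy as np

def hypercube_dataset(n_features: int, n_points: int, seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """x uniform on [-pi, pi]^n; label +1 inside the centred hypercube of width 2*pi / 2**(1/n), else -1."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-np.pi, np.pi, size=(n_points, n_features))
    half_width = np.pi / 2 ** (1 / n_features)
    inside = np.all(np.abs(x) <= half_width, axis=1)
    return x, np.where(inside, 1, -1)

if __name__ == "__main__":
    for n in (2, 8, 32):
        _, y = hypercube_dataset(n, 100_000)
        print(f"n={n:2d}  fraction with label +1 = {np.mean(y == 1):.3f}")  # close to 0.5
```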

Theorems & Definitions (50)

Definition 1: Exponential concentration
Proposition 1
Proposition 2
Corollary 1
Theorem 1: Expressivity-induced concentration
Theorem 2: Entanglement-induced concentration
Corollary 2
Proposition 3: Global-measurement-induced concentration
Theorem 3: Noise-induced concentration
Proposition 4: Concentration of kernel target alignment
...and 40 more
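Proposition 4 concerns kernel target alignment. A common normalized form is $A(K, \boldsymbol{y}) = \langle K, \boldsymbol{y}\boldsymbol{y}^\top\rangle_F / (\lVert K\rVert_F \, \lVert \boldsymbol{y}\boldsymbol{y}^\top\rVert_F)$; the paper's exact normalization may differ, so treat this as an illustrative sketch with a hypothetical function name. It shows why concentration hurts training: if the estimated Gram matrix collapses to the identity, then for labels $y_i = \pm 1$ the alignment equals $N_s / (\sqrt{N_s}\, N_s) = 1/\sqrt{N_s}$ whatever the labels or embedding parameters are, so it provides no signal for optimizing the embedding.

```python
import numpy as np

def kernel_target_alignment(gram: np.ndarray, y: np.ndarray) -> float:
    """Normalized alignment <K, y y^T>_F / (||K||_F ||y y^T||_F)."""
    target = np.outer(y, y)
    return float(np.sum(gram * target) / (np.linalg.norm(gram) * np.linalg.norm(target)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_s = 16
    for _ in range(3):
        y = rng.choice([-1.0, 1.0], size=n_s)
        # fully concentrated kernel (identity Gram matrix) vs the ideal kernel y y^T
        print(kernel_target_alignment(np.eye(n_s), y), kernel_target_alignment(np.outer(y, y), y))
        # prints 0.25 and 1.0 on every draw
```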
