Quantum Kernel Methods under Scrutiny: A Benchmarking Study

Jan Schnabel; Marco Roth

TL;DR

This work presents a comprehensive large-scale study of quantum kernel methods (QKMs) based on fidelity quantum kernels (FQKs) and projected quantum kernels (PQKs) across a wide range of design choices, and explores the underlying principles responsible for learning.

Abstract

Since the introduction of kernel theory into quantum machine learning, quantum kernel methods (QKMs) have gained increasing attention, both for probing promising applications and for delivering intriguing research insights. Benchmarking these methods is crucial for gaining robust insights and understanding their practical utility. In this work, we present a comprehensive large-scale study of QKMs based on fidelity quantum kernels (FQKs) and projected quantum kernels (PQKs) across a wide range of design choices. Our investigation covers both classification and regression tasks for five dataset families and 64 datasets, systematically comparing FQKs and PQKs within quantum support vector machines and kernel ridge regression. In total, over 20,000 models were trained and optimized using a state-of-the-art hyperparameter search to ensure robust and comprehensive insights. We examine the importance of individual hyperparameters for model performance and support our findings with rigorous correlation analyses. Additionally, we provide an in-depth analysis of the design freedom of PQKs and explore the underlying principles responsible for learning. Our goal is not to identify the best-performing model for a specific task but to uncover the mechanisms that lead to effective QKMs and to reveal universal patterns.
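
Because a QKM only replaces the kernel, the downstream learner is an ordinary kernel method such as kernel ridge regression, and its hyperparameters can be tuned by cross-validation. The sketch below illustrates that pipeline with five-fold cross-validation. It is a minimal, hypothetical example and not the QKMTuner implementation: a classical RBF Gram matrix stands in for a quantum one, a plain log-uniform random search stands in for Optuna's samplers, and all function names, search ranges, and toy data are illustrative assumptions.

```python
import numpy as np

def rbf_gram(A, B, gamma):
    """Classical RBF Gram matrix; a stand-in for a quantum kernel Gram matrix."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

def krr_fit(K_train, y_train, lam):
    """Dual coefficients alpha = (K + lam * I)^(-1) y of kernel ridge regression."""
    return np.linalg.solve(K_train + lam * np.eye(len(y_train)), y_train)

def cv_mse(X, y, gamma, lam, n_folds=5, seed=0):
    """Mean validation MSE over n_folds folds (five-fold CV by default)."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, n_folds)
    errors = []
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        alpha = krr_fit(rbf_gram(X[train], X[train], gamma), y[train], lam)
        # f(x) = sum_i alpha_i k(x, x_i) only needs the validation-by-train Gram block
        pred = rbf_gram(X[val], X[train], gamma) @ alpha
        errors.append(np.mean((pred - y[val]) ** 2))
    return float(np.mean(errors))

def random_search(X, y, n_trials=30, seed=1):
    """Log-uniform random search over (gamma, lam); keeps the lowest CV MSE."""
    rng = np.random.default_rng(seed)
    best_score, best_params = np.inf, None
    for _ in range(n_trials):
        params = {"gamma": 10 ** rng.uniform(-2, 1), "lam": 10 ** rng.uniform(-6, 0)}
        score = cv_mse(X, y, **params)
        if score < best_score:
            best_score, best_params = score, params
    return best_score, best_params

# Toy regression data (hypothetical, not one of the study's datasets).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(60, 3))
y = X[:, 0] * X[:, 1] + np.cos(2 * X[:, 2])
best_score, best_params = random_search(X, y)
print(f"best five-fold CV MSE {best_score:.4f} with {best_params}")
```

Replacing `rbf_gram` with a function that returns an FQK or PQK Gram matrix leaves the fitting and search loop unchanged; only the kernel evaluation differs.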

Paper Structure

This paper contains 42 sections, 27 equations, 31 figures, and 3 tables.

Introduction
Theoretical Background
Conventional Kernel Theory
Quantum Kernel Methods
Study Design
Models
Datasets
Experimental Setup and Implementation
Results
Model Performance
Influence of Hyperparameters
Influence of Encoding Circuits
Analysis of PQK Design Options
Discussion
Conclusion
(27 further sections not listed)

Figures (31)

Figure 1: Schematic illustration of the basic working principle of QKMs and the two most common approaches to computing the corresponding quantum kernel Gram matrices. Data points are mapped from the input space $\mathcal{X}$ to the quantum Hilbert space $\mathcal{H}^Q$ by encoding them into quantum states $\ket{\psi(\mathbf{x},\boldsymbol{\theta})}$. Access to $\mathcal{H}^Q$ is provided by measurements, which can be expressed as inner products of quantum states in full analogy to classical kernel theory. Left: Using the Hilbert-Schmidt inner product as a fidelity-type metric to define quantum kernels leads to FQKs, cf. Eq. \ref{eq:Q-Kernel}. Right: Instead of processing quantum states directly within the quantum Hilbert space, it has been shown that it can be beneficial to first project them onto an approximate classical representation, e.g., using reduced physical observables. This concept gives rise to the family of PQKs. One of the simplest ways of defining PQKs is given in Eq. \ref{eq:PQK-general}; it corresponds to measuring $k$-particle reduced density matrices and processing the result with a classical kernel function $\kappa$. In both cases (FQK and PQK), the resulting kernel Gram matrices are subsequently passed to a classical kernel algorithm.
Figure 2: Schematic illustration of the scope of this work and the basic functional principle of our software tool QKMTuner [schnabel24gitlab], used for the hyperparameter search of QKMs. We thoroughly investigate classification and regression tasks on five different dataset families and $64$ datasets, using QSVC for classification and QSVR and QKRR for regression, each with both the FQK and the PQK approach for evaluating the corresponding quantum kernel matrix. The data are embedded using nine data encoding circuits from the literature with up to 15 qubits. The code is based on the QML library sQUlearn [kreplin2025squlearn], the (classical) hyperparameter optimization framework Optuna [optuna2019], and the (classical) machine learning library scikit-learn [scikit-learn].
Figure 3: Average Spearman correlations $\overline{C}$ of all $[0,1]$-normalized features with the outputs, used to assess the complexity of the datasets considered in this study. The classification datasets depend on a control parameter taking values between $2$ and $20$ that can be seen as governing the difficulty, while for the regression datasets this parameter corresponds to the number of features. Note that higher values of this measure indicate simpler problems.
Figure 4: Overview of test performance scores of the respective QKMs as a function of increasing dataset complexity. Results within each dataset are aggregated across all data encoding circuits, each with the optimal number of layers $n_{\mathrm{layers}}^*$ yielding the minimum (regression) or maximum (classification) test performance score. For comparison, we provide classical KRR/SVR and SVC results, each based on an RBF kernel. The upper panel displays the two regression tasks, where the MSE measures prediction accuracy and the number of features controls the dataset complexity: the Friedman dataset family is shown in (a) and the QFMNIST dataset family in (b). The lower panel illustrates the two classification tasks of this study, where we use the ROC-AUC score to assess classification performance: (c) shows the two curves diff dataset family with the degree $D$ controlling the complexity, and (d) shows the hidden manifold diff family with the manifold dimension $m$ as the control parameter.
Figure 5: Comparison of hyperparameter importances for optimizing the five-fold cross-validation score in the hyperparameter searches for the regression tasks of this study. The whisker length is defined as half the interquartile range. (a) shows the Friedman dataset with $d=5$ features, (b) the QFMNIST results with $d=5$ components, and (c) the NH3-PES data. In each case, the results for each model and dataset are aggregated over the different encoding circuits. Here, $n_{\text{qubits}}$ is always restricted to integer multiples of the number of features in the respective dataset, with a maximum of $n_{\text{qubits}}^{\text{max}}=15$.
(26 further figures not listed)
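
The two kernel families of Figure 1 can be illustrated with a small statevector simulation in NumPy. This is a sketch under explicit assumptions. The encoding circuit (repeated layers of RY rotations, a CZ chain, and RZ rotations, with the features tiled across the qubits) is hypothetical and is not one of the nine circuits used in the study. The FQK is computed as $k(\mathbf{x},\mathbf{x}') = |\langle\psi(\mathbf{x})|\psi(\mathbf{x}')\rangle|^2$, which equals $\mathrm{Tr}[\rho(\mathbf{x})\rho(\mathbf{x}')]$ for pure states. The PQK assumes one-particle ($k=1$) reduced density matrices combined with a Gaussian $\kappa$, $k(\mathbf{x},\mathbf{x}') = \exp(-\gamma \sum_q \lVert\rho_q(\mathbf{x}) - \rho_q(\mathbf{x}')\rVert_F^2)$, which is one common choice and may differ from the paper's general definition. The qubit count is restricted to integer multiples of the number of features, mirroring the constraint stated for Figure 5.

```python
import numpy as np

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(theta):
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])

def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a state stored as a (2, ..., 2) tensor."""
    state = np.tensordot(gate, state, axes=([1], [q]))
    return np.moveaxis(state, 0, q)

def apply_cz(state, a, b):
    """Controlled-Z: flip the sign of amplitudes where qubits a and b are both 1."""
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[a], idx[b] = 1, 1
    state[tuple(idx)] *= -1
    return state

def encode(x, n_qubits, n_layers=2):
    """Hypothetical encoding circuit: tile features over qubits, then RY, CZ chain, RZ."""
    if n_qubits % len(x) != 0:
        raise ValueError("n_qubits must be an integer multiple of the number of features")
    angles = np.tile(x, n_qubits // len(x))
    state = np.zeros((2,) * n_qubits, dtype=complex)
    state[(0,) * n_qubits] = 1.0  # start in |0...0>
    for _ in range(n_layers):
        for q in range(n_qubits):
            state = apply_1q(state, ry(angles[q]), q)
        for q in range(n_qubits - 1):
            state = apply_cz(state, q, q + 1)
        for q in range(n_qubits):
            state = apply_1q(state, rz(angles[q]), q)
    return state

def fqk(x, xp, n_qubits):
    """Fidelity quantum kernel |<psi(x)|psi(x')>|^2."""
    return float(abs(np.vdot(encode(x, n_qubits), encode(xp, n_qubits))) ** 2)

def one_rdms(state):
    """Single-qubit reduced density matrices rho_q (all other qubits traced out)."""
    rdms = []
    for q in range(state.ndim):
        psi = np.moveaxis(state, q, 0).reshape(2, -1)
        rdms.append(psi @ psi.conj().T)
    return rdms

def pqk(x, xp, n_qubits, gamma=1.0):
    """Projected quantum kernel: Gaussian kernel on the 1-RDMs (assumed form)."""
    r, rp = one_rdms(encode(x, n_qubits)), one_rdms(encode(xp, n_qubits))
    dist = sum(np.linalg.norm(a - b, "fro") ** 2 for a, b in zip(r, rp))
    return float(np.exp(-gamma * dist))

def gram(kernel, X, **kwargs):
    """Full Gram matrix K_ij = kernel(x_i, x_j)."""
    return np.array([[kernel(a, b, **kwargs) for b in X] for a in X])

X = np.array([[0.1, 0.5, -0.3], [0.4, -0.2, 0.7], [-0.6, 0.3, 0.2]])
print(list(range(X.shape[1], 16, X.shape[1])))  # admissible n_qubits for d = 3: [3, 6, 9, 12, 15]
K_fqk = gram(fqk, X, n_qubits=6)
K_pqk = gram(pqk, X, n_qubits=6, gamma=1.0)
print(np.round(K_fqk, 3))
print(np.round(K_pqk, 3))
```

Either Gram matrix can then be handed to a classical kernel algorithm such as KRR or an SVM; the FQK compares full quantum states, whereas the PQK only compares the local, projected information contained in the one-qubit reduced density matrices.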

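Both the dataset-complexity measure $\overline{C}$ of Figure 3 and the ROC-AUC score of Figure 4 are rank statistics, so a single ranking helper covers both. The following is a minimal sketch, assuming $\overline{C}$ is the mean of the absolute Spearman correlations between each feature and the output (the caption does not state whether signs are kept); the helper names and toy targets are illustrative. Because the Spearman correlation depends only on ranks, the $[0,1]$ min-max normalization does not change its value for non-constant features; constant features would yield NaN and should be dropped beforehand.

```python
import numpy as np

def average_ranks(v):
    """1-based ranks of v; tied values receive the average of their ranks."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    sorted_v = v[order]
    ranks = np.empty(len(v))
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and sorted_v[j + 1] == sorted_v[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

def complexity_measure(X, y):
    """Mean |Spearman| of [0,1]-normalized features with the outputs (assumed form of C-bar)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    X01 = (X - lo) / np.where(hi > lo, hi - lo, 1.0)
    return float(np.mean([abs(spearman(X01[:, j], y)) for j in range(X.shape[1])]))

def roc_auc(y_true, scores):
    """ROC-AUC via the Mann-Whitney rank-sum statistic (ties count one half)."""
    y_true = np.asarray(y_true, dtype=bool)
    ranks = average_ranks(scores)
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    return float((ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))
y_easy = X.sum(axis=1)  # every feature enters monotonically
y_hard = np.sin(6 * np.pi * X[:, 0]) * np.cos(6 * np.pi * X[:, 1])  # oscillatory target
print(complexity_measure(X, y_easy), complexity_measure(X, y_hard))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

For the first toy target, every feature contributes monotonically and the measure is large; the oscillatory second target gives a value close to zero, in line with the caption's reading that higher values indicate simpler problems. The `roc_auc` call uses a four-sample example in which three of the four positive-negative pairs are ordered correctly, so it returns 0.75.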