Benchmarking Quantum Kernels Across Diverse and Complex Data

Yuhan Jiang, Matthew Otten

TL;DR

This work examines the practical viability of quantum kernels on high-dimensional, real-world data. It introduces a resource-efficient variational quantum kernel framework that combines two encoding schemes (amplitude and truncated-RBF) with a trainable ansatz and a parameter-scaling technique. The framework is benchmarked on eight datasets spanning tabular, image, time series, and graph domains, and the results indicate that properly designed quantum kernels can outperform standard classical kernels in many cases under constrained quantum resources. The study also reports that scaling the ansatz accelerates convergence and improves final accuracy, with further gains as qubit resources increase. Although all experiments were run in classical simulation, the results provide a foundation for quantum-kernel methods in real-world ML pipelines and point to future hardware-based evaluations and more quantum-native feature-map designs.

Abstract

Quantum kernel methods are a promising branch of quantum machine learning, yet their practical advantage on diverse, high-dimensional, real-world data remains unverified. Current research has largely been limited to low-dimensional or synthetic datasets, preventing a thorough evaluation of their potential. To address this gap, we developed a variational quantum kernel framework utilizing resource-efficient ansätze for complex classification tasks and introduced a parameter scaling technique to accelerate convergence. We conducted a comprehensive benchmark of this framework on eight challenging, real-world, high-dimensional datasets covering tabular, image, time series, and graph data. Our classically simulated results show that the proposed quantum kernel has a clear performance advantage over standard classical kernels, such as the radial basis function (RBF) kernel. This work demonstrates that properly designed quantum kernels can function as versatile, high-performance tools, laying a foundation for quantum-enhanced applications in real-world machine learning. Further research is needed to fully assess the practical quantum advantage.

Paper Structure

This paper contains 18 sections, 4 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Overview of the variational quantum kernel framework. Classical data is encoded into quantum states via a parameterized quantum circuit (Quantum Feature Map), which incorporates a scaling parameter $s$ on the variational gates. The overlaps of these states form a Kernel Matrix. The circuit's parameters $p$ are trained using a classical optimizer (e.g., Adam; Kingma and Ba, 2017) to maximize the Kernel-Target Alignment (KTA). The final, optimized kernel is used by a Classical Support Vector Classifier for classification.
  • Figure 2: (a) Circuit diagram for estimating the kernel entry $K(x_1, x_2) = |\langle \psi(x_2) | \psi(x_1) \rangle|^2$. The feature map $|\psi(x) \rangle$ is generated by the data-encoding unitary $U(x)$ followed by the trainable variational block $V(s, p, x)$. (b) Structure of a single ansatz layer built on five qubits. Each wire undergoes a parameterized $R_y$ rotation and an $R_z$ rotation scaled by the data point and a hyperparameter $s$, followed by a circular chain of CNOT gates for entanglement.
  • Figure 3: Data Reduction Analysis. (Top) A PCA cumulative variance plot of the TCGA-LGG data, showing that 121 dimensions are required to capture a 95% variance threshold. (Bottom) A learning curve for the PhysioNet2017-NA data, generated with a classical proxy model (RBF-SVM), used to identify an efficient sample size where test accuracy begins to plateau (around 500 samples).
  • Figure 4: Comparison of classification accuracies of classical (Linear, RBF) and quantum (QRBF, QAmp) kernels, represented by light gray, dark gray, purple, and orange bars, respectively. Results are presented for eight benchmark datasets, categorized into graph, image, tabular, and time series data types, from left to right.
  • Figure 5: Comparison of initial (dashed) and trained (solid) accuracies for the ansätze without the scaling parameter s (blue) and with it (orange). Results are shown for the QRBF and QAmp kernels on the QSAR Biodegradation and SEED-P12S1 datasets.
  • ...and 1 more figure
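
The pipeline described in the captions for Figures 1 and 2 can be illustrated with a small statevector simulation. The following is a minimal NumPy sketch, not the authors' implementation. It assumes amplitude encoding for $U(x)$, an $R_z$ angle of $s \cdot x_i$ (the exact dependence on the data point is not given in the captions), qubit 0 as the most significant bit, and the standard normalized Frobenius-inner-product definition of KTA with labels in $\{-1, +1\}$. All function names are hypothetical.

```python
import numpy as np

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(phi):
    return np.diag([np.exp(-0.5j * phi), np.exp(0.5j * phi)])

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state vector."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)

def apply_cnot(state, control, target, n):
    """CNOT is a basis permutation and its own inverse, so index by it."""
    idx = np.arange(2 ** n)
    ctrl_bit = (idx >> (n - 1 - control)) & 1
    return state[idx ^ (ctrl_bit << (n - 1 - target))]

def amplitude_encode(x, n):
    """Assumed U(x): zero-pad x to length 2**n and normalize."""
    if len(x) > 2 ** n:
        raise ValueError("x has more entries than 2**n amplitudes")
    v = np.zeros(2 ** n, dtype=complex)
    v[: len(x)] = x
    return v / np.linalg.norm(v)

def feature_map(x, p, s, n):
    """|psi(x)> = V(s, p, x) U(x)|0>, with p of shape (layers, n)."""
    if n < 2:
        raise ValueError("the circular CNOT chain needs at least 2 qubits")
    x = np.asarray(x, dtype=float)
    state = amplitude_encode(x, n)
    for layer_params in p:
        for q in range(n):
            state = apply_1q(state, ry(layer_params[q]), q, n)
            # Assumption: the R_z angle is s times one feature of x.
            state = apply_1q(state, rz(s * x[q % len(x)]), q, n)
        for q in range(n):  # circular chain of CNOTs
            state = apply_cnot(state, q, (q + 1) % n, n)
    return state

def kernel_matrix(X, p, s, n):
    """K[i, j] = |<psi(x_i)|psi(x_j)>|^2."""
    states = np.array([feature_map(x, p, s, n) for x in X])
    return np.abs(states.conj() @ states.T) ** 2

def kernel_target_alignment(K, y):
    """KTA = <K, y y^T>_F / (||K||_F ||y y^T||_F), labels in {-1, +1}."""
    y = np.asarray(y, dtype=float)
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 3
    X = rng.normal(size=(6, 8))           # toy data, not from the paper
    y = np.array([1, 1, 1, -1, -1, -1])
    p = rng.uniform(0, 2 * np.pi, size=(2, n))
    K = kernel_matrix(X, p, s=0.5, n=n)
    print("KTA:", kernel_target_alignment(K, y))
```

In a training loop, `p` would be updated by a gradient-based optimizer to increase the KTA, and the resulting matrix could then be passed to a support vector classifier that accepts a precomputed kernel.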