Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK
Florian J. Kiwit, Marwa Marso, Philipp Ross, Carlos A. Riofrío, Johannes Klepsch, Andre Luckow
TL;DR
This work extends the QUARK benchmarking framework to application-oriented quantum generative learning, introducing QUARK 2.0 to support end-to-end QML benchmarks of quantum generative adversarial networks (QGANs) and quantum circuit Born machines (QCBMs). It presents a generalized Core-based architecture and a six-module QML workflow (Application, Dataset, Transformation, Circuit, Library, Training), driven by a configuration file, that enables reproducible comparisons across datasets, circuit types, training methods, and hardware. The paper demonstrates GPU-accelerated simulations and deployment on a real device (IonQ Harmony), analyzes generalization with a suite of metrics, and characterizes performance in detail, including scaling, noise effects, and optimization strategies based on the KL divergence and CMA-ES. Overall, QUARK 2.0 offers a scalable, extensible platform for rigorous end-to-end benchmarking of quantum generative learning, with practical implications for hardware-aware model evaluation and protocol design. Distribution alignment is quantified with metrics such as the KL divergence $C_{\mathrm{KL}}(p_{\text{target}}, p_{\text{model}})$ together with the discretization $N_d = 2^n$, evaluated across varying $n$ and $d$.
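As a minimal illustration of the KL-based alignment metric mentioned above, the sketch below computes $C_{\mathrm{KL}}(p_{\text{target}}, p_{\text{model}})$ between a target histogram and the empirical histogram of sampled bitstrings over all $2^n$ outcomes. The function names, the `eps` clipping of zero model probabilities, and the even split of qubits across the $d$ dimensions in `bins_per_dimension` are assumptions made for illustration; this is not QUARK's API.

```python
import math
from collections import Counter


def histogram(bitstrings, n_qubits):
    """Empirical distribution over all N_d = 2**n_qubits bitstring outcomes."""
    if not bitstrings:
        raise ValueError("need at least one sample")
    n_bins = 2 ** n_qubits
    counts = Counter(int(b, 2) for b in bitstrings)  # bitstring -> bin index
    total = sum(counts.values())
    return [counts.get(i, 0) / total for i in range(n_bins)]


def kl_divergence(p_target, p_model, eps=1e-8):
    """C_KL(p_target, p_model) = sum_x p_target(x) * log(p_target(x) / p_model(x)).

    Bins with p_target(x) = 0 contribute nothing; zero model probabilities are
    clipped to eps (an assumption) so the logarithm stays finite.
    """
    return sum(
        pt * math.log(pt / max(pm, eps))
        for pt, pm in zip(p_target, p_model)
        if pt > 0
    )


def bins_per_dimension(n_qubits, n_dims):
    """Assumption: qubits are split evenly across the d dimensions, so the
    2**n outcomes form a grid with 2**(n / d) bins per dimension."""
    if n_qubits % n_dims:
        raise ValueError("n_qubits must be divisible by n_dims")
    return 2 ** (n_qubits // n_dims)


# Usage: the target puts mass 0.5 on "00" and "11"; the model drew four samples.
p_target = [0.5, 0.0, 0.0, 0.5]
p_model = histogram(["00", "11", "11", "01"], n_qubits=2)  # [0.25, 0.25, 0.0, 0.5]
print(round(kl_divergence(p_target, p_model), 4))  # prints 0.3466 (= 0.5 * ln 2)
```

The KL divergence is zero only when the model histogram matches the target on every bin where the target has mass, which is why it serves as a natural training loss and alignment score for discretized distributions.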
Abstract
Benchmarking of quantum machine learning (QML) algorithms is challenging due to the complexity and variability of QML systems, e.g., regarding model ansatzes, data sets, training techniques, and hyperparameter selection. The QUantum computing Application benchmaRK (QUARK) framework simplifies and standardizes benchmarking studies for quantum computing applications. Here, we propose several extensions of QUARK that enable evaluating the training and deployment of quantum generative models. We describe the updated software architecture and illustrate its flexibility through several example applications: (1) We trained different quantum generative models using several circuit ansatzes, data sets, and data transformations. (2) We evaluated our models on GPU and real quantum hardware. (3) We assessed the generalization capabilities of our generative models using a broad set of metrics that capture, e.g., the novelty and validity of the generated data.
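To make the novelty and validity notions mentioned in the abstract concrete, here is a simplified sketch that scores a batch of generated bitstrings against a training set and a set of valid samples. The helper name `generalization_summary` and the exact definitions below are illustrative assumptions, not necessarily the metrics implemented in QUARK.

```python
def generalization_summary(generated, training, valid):
    """Simplified, illustrative scores for a batch of generated samples.

    validity:    fraction of generated samples that lie in the valid set
    novelty:     fraction of generated samples not present in the training set
    novel_valid: fraction that are both novel and valid
    """
    if not generated:
        raise ValueError("need at least one generated sample")
    train_set, valid_set = set(training), set(valid)
    n = len(generated)
    is_valid = [s in valid_set for s in generated]
    is_novel = [s not in train_set for s in generated]
    return {
        "validity": sum(is_valid) / n,
        "novelty": sum(is_novel) / n,
        "novel_valid": sum(v and u for v, u in zip(is_valid, is_novel)) / n,
    }


# Usage: "valid" = 3-bit strings with even parity; the model saw only two of them.
valid = ["000", "011", "101", "110"]
training = ["000", "011"]
generated = ["000", "101", "110", "111"]
print(generalization_summary(generated, training, valid))
# {'validity': 0.75, 'novelty': 0.75, 'novel_valid': 0.5}
```

Separating validity from novelty distinguishes a model that merely reproduces its training samples (high validity, low novelty) from one that generates unseen yet valid samples, which is the behavior a generalization benchmark aims to reward.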
