Software for Creating Scalable Benchmarks from Quantum Algorithms

Noah Siekierski, Stefan Seritan, Neer Patel, Siyuan Niu, Thomas Lubinski, Timothy Proctor

TL;DR

scarab addresses scalable, reliable quantum benchmarking with a user-friendly tool that converts arbitrary circuits into scalable benchmarks, using process fidelity estimated via mirror circuit fidelity estimation (MCFE) as the performance metric. Built as a module in pyGSTi, scarab supports low-level, full-stack, and subcircuit benchmarks, and remains efficient and robust even for circuits containing thousands or millions of qubits. Through simulations and experiments on Hamiltonian-simulation tasks, compiler testing, and subcircuit extrapolations, scarab demonstrates accurate fidelity estimation and actionable insights into hardware–algorithm trade-offs. The work delivers open-source tooling that standardizes scalable benchmark design and analysis for contemporary and future quantum architectures.

Abstract

Creating scalable, reliable, and well-motivated benchmarks for quantum computers is challenging: straightforward approaches to benchmarking suffer from exponential scaling, are insensitive to important errors, or use poorly-motivated performance metrics. Furthermore, curated benchmarking suites cannot include every interesting quantum circuit or algorithm, which necessitates a tool that enables the easy creation of new benchmarks. In this work, we introduce a software tool for creating scalable and reliable benchmarks that measure a well-motivated performance metric (process fidelity) from user-chosen quantum circuits and algorithms. Our software, called $\texttt{scarab}$, enables the creation of efficient and robust benchmarks even from circuits containing thousands or millions of qubits, by employing efficient fidelity estimation techniques, including mirror circuit fidelity estimation and subcircuit volumetric benchmarking. $\texttt{scarab}$ provides a simple interface that enables the creation of reliable benchmarks by users who are not experts in the theory of quantum computer benchmarking or noise. We demonstrate the flexibility and power of $\texttt{scarab}$ by using it to turn existing inefficient benchmarks into efficient benchmarks, to create benchmarks that interrogate hardware and algorithmic trade-offs in Hamiltonian simulation, to quantify the in-situ efficacy of approximate circuit compilation, and to create benchmarks that use subcircuits to measure progress towards executing a circuit of interest.
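
To illustrate why mirroring makes the error-free outcome cheap to know, here is a minimal sketch, assuming the simplest form of a mirror circuit (a circuit followed by its exact inverse). It is not scarab's construction or API: the helper names, the Haar-random layers, and the toy coherent error are all hypothetical, and the dense state vector is used only because the example is small.

```python
import numpy as np

def random_unitary(dim, rng):
    """Draw a Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescale each column by a unit phase

def mirror(layers):
    """Append the inverse of every layer in reverse order (assumed, simplified mirroring)."""
    return list(layers) + [u.conj().T for u in reversed(layers)]

def survival_probability(layers, error=None):
    """Probability of returning to |0...0> after applying the layers in order.

    If `error` is given, it is applied after every layer as a toy noise model.
    """
    dim = layers[0].shape[0]
    state = np.zeros(dim, dtype=complex)
    state[0] = 1.0
    for u in layers:
        state = u @ state
        if error is not None:
            state = error @ state
    return float(np.abs(state[0]) ** 2)

rng = np.random.default_rng(0)
n_qubits = 3
dim = 2 ** n_qubits
layers = [random_unitary(dim, rng) for _ in range(4)]
mirrored = mirror(layers)

# Error-free case: the mirrored circuit is the identity, so the outcome is known
# in advance without simulating the original circuit.
p_ideal = survival_probability(mirrored)

# Toy coherent error (hypothetical): a small diagonal phase rotation after each layer.
coherent_error = np.diag(np.exp(1j * 0.05 * np.arange(dim)))
p_noisy = survival_probability(mirrored, error=coherent_error)
```

In this sketch the drop from `p_ideal` to `p_noisy` is the observable signal of noise; how such quantities are turned into a process fidelity estimate is the job of MCFE and is not reproduced here.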

Paper Structure

This paper contains 16 sections, 17 equations, 9 figures.

Figures (9)

  • Figure 1: Scalable benchmarking using scarab. A schematic of scarab, which is software for creating efficient benchmarks from interesting quantum circuits on any number of qubits. scarab takes user-specified circuits, along with other options which we describe further in the main text, and creates an efficient and robust benchmark from those circuits. The benchmark consists of a set of benchmarking circuits $B$ whose performance is to be estimated, coupled with a set of proxy circuits $P$ that enable the efficient performance estimation of each circuit in $B$. Each proxy circuit is a mirror circuit, which enables the efficient classical computation of its error-free outcome distribution. The proxy circuits are then executed on the target quantum processor to obtain an empirical outcome distribution for each circuit in $P$. The empirical and error-free outcome distributions for the proxy circuits are then passed into the scarab data analysis function, which first calculates the performance of each proxy circuit and then uses the performance of the proxy circuits to calculate the performance of the benchmarking circuits. scarab also enables the creation of performance summaries including volumetric benchmarking and capability region plots [Proctor2021-wt].
  • Figure 2: Classical processing time for scarab. The time ($t_c$) taken by the classical processing in scarab versus the circuit's width (number of qubits, $n$). This time consists of the time to turn a circuit input into scarab into the "proxy circuits" to be run, to compute all the information about those circuits needed to analyze their data (e.g., error-free outcome distributions), and to calculate the performance of the proxy and benchmarking circuits using the scarab data analysis function. We show the mean $t_c$ (markers), and the best and worst $t_c$ (shaded region) over different circuits, for the three kinds of benchmarks created by scarab: low-level benchmarks (blue triangles), full-stack benchmarks (yellow pluses), and subcircuit benchmarks (green diamonds). We compare the scaling of $t_c$ for scarab to the time taken to classically simulate, using qiskit_aer (orange crosses), the same quantum circuits from which we created efficient benchmarks with scarab. This classical simulation is a key step in many other benchmarks and, unlike $t_c$ for scarab, it scales exponentially (fit line) and is therefore impractical for many-qubit circuits.
  • Figure 3: Estimating process fidelity with scarab. Simulations demonstrating that benchmarks created with scarab reliably estimate a circuit's process fidelity $F$ in the presence of complex device errors. We show the $F$ estimated using scarab benchmarks (blue markers) versus the true $F$ for two classes of circuit (QFT and QPE circuits) under different noise models. Error bars are one standard deviation and calculated using a non-parametric bootstrap. To demonstrate the difference between process fidelity $F$ and the normalized classical fidelity $\bar{F_c}$ estimated by other benchmarks, we also plot $\bar{F_c}$ versus $F$ (orange markers) for these circuits. We observed that scarab benchmarks accurately estimate the process fidelity in the presence of (a) depolarizing errors on gates, (b) coherent errors on gates, (c) readout errors, and (d) all three kinds of errors. The normalized classical fidelity can be significantly larger or smaller than the process fidelity, depending on the details of the noise model.
  • Figure 4: Estimating the impact of noise and algorithm approximation with scarab's efficient low-level benchmarks. Using scarab, we created low-level benchmarks from first-order Trotter circuits with four different $n$-qubit Hamiltonians (TFIM, Heisenberg, Fermi-Hubbard, and Bose-Hubbard) from HamLib. (a) The algorithmic process fidelity, i.e., the fidelity between the Trotter circuit's (noise-free) unitary and the ideal unitary evolution for that Hamiltonian, versus $n$. (b) The noise process fidelity estimated by the scarab benchmarks (solid markers), which is the process fidelity between the noisy Trotter circuit and the noise-free unitary that circuit implements, and its exact value (open diamonds) up to $n=6$. (c) The estimated full process fidelity (solid markers), i.e., the process fidelity between the ideal unitary evolution and the noisy implementation of the Trotter evolution, approximated as the product of the measured noise process fidelity and the computed algorithmic process fidelity, and its exact value (open diamonds) up to $n=6$. For both the noise and full process fidelities, we observe close agreement between the scarab estimate and the true values. Shaded regions around the scarab estimates for the process fidelities are 1 standard deviation calculated from a non-parametric bootstrap.
  • Figure 5: Quantifying noise and algorithmic tradeoffs using scarab benchmarks. Using scarab, we created low-level benchmarks from first-order and second-order Trotter circuits for two 5-qubit Hamiltonians (Heisenberg and Max3SAT) with a varying number of time steps. These benchmarks were created for IBM Kingston, and were run on both IBM Fez and IBM's simulator of IBM Kingston. (a) The algorithmic process fidelity versus the number of time steps for each Hamiltonian and both first- and second-order Trotter circuits. We used scarab to estimate the noise process fidelity with (b) the simulator of IBM Kingston and (d) IBM Kingston. In all cases, we observe that the noise fidelity decreases as the number of time steps increases, due to the increasing depth of the circuit. To quantify the optimal trade-off between noise and algorithmic fidelity, we estimated the full process fidelity for (c) the simulation of IBM Kingston and (d) IBM Kingston. For the Heisenberg Hamiltonian, we find that the optimal algorithm parameters are a second-order Trotter circuit with 3 time steps, in both the simulation and the experiment. Shaded regions around the scarab estimates for the process fidelities are 1 standard deviation calculated from a non-parametric bootstrap.
  • ...and 4 more figures
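
The algorithmic process fidelity and the product approximation described in the captions for Figures 4 and 5 can be sketched for a very small system as follows. This is an illustration under stated assumptions, not scarab or HamLib code: the function names are hypothetical, the open-chain TFIM with couplings $J = h = 1$ is an assumed instance, and the process fidelity between two unitaries is taken as $|\mathrm{Tr}(U^\dagger V)|^2 / d^2$.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def embed(ops, n):
    """Tensor product on n qubits with the operators in {site: op} and identity elsewhere."""
    out = np.array([[1.0 + 0j]])
    for q in range(n):
        out = np.kron(out, ops.get(q, I2))
    return out

def tfim_terms(n, j=1.0, h=1.0):
    """Open-chain TFIM split into its ZZ part and its X part (assumed instance, n >= 2)."""
    zz = sum(-j * embed({q: Z, q + 1: Z}, n) for q in range(n - 1))
    x = sum(-h * embed({q: X}, n) for q in range(n))
    return zz, x

def evolve(hermitian, t):
    """exp(-i * hermitian * t) computed from the eigendecomposition."""
    vals, vecs = np.linalg.eigh(hermitian)
    return (vecs * np.exp(-1j * vals * t)) @ vecs.conj().T

def trotter1(zz, x, t, steps):
    """First-order Trotter approximation to exp(-i (zz + x) t)."""
    dt = t / steps
    step = evolve(x, dt) @ evolve(zz, dt)
    return np.linalg.matrix_power(step, steps)

def process_fidelity(u, v):
    """Process fidelity between two unitaries of dimension d: |Tr(u^dagger v)|^2 / d^2."""
    d = u.shape[0]
    return float(np.abs(np.trace(u.conj().T @ v)) ** 2 / d**2)

def algorithmic_fidelity(n, t, steps):
    """Fidelity between the ideal evolution and the noise-free Trotter unitary."""
    zz, x = tfim_terms(n)
    return process_fidelity(evolve(zz + x, t), trotter1(zz, x, t, steps))

def full_fidelity_estimate(f_noise, f_alg):
    """Product approximation to the full process fidelity, as stated in the Figure 4 caption."""
    return f_noise * f_alg

f_alg_coarse = algorithmic_fidelity(n=3, t=1.0, steps=1)
f_alg_fine = algorithmic_fidelity(n=3, t=1.0, steps=64)
```

More Trotter steps raise the algorithmic fidelity but deepen the circuit, which lowers the noise fidelity; multiplying a measured noise fidelity by `algorithmic_fidelity(...)` via `full_fidelity_estimate` gives the quantity whose maximum over the number of steps marks the trade-off point. The dense matrices here scale exponentially in $n$ and are only practical for a handful of qubits.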