Scalable Full-Stack Benchmarks for Quantum Computers

Jordan Hines, Timothy Proctor

TL;DR

The paper tackles the challenge of scalable, full-stack benchmarking for quantum computers by introducing mirror full-stack benchmarking, which uses the output of the classical preprocessor (the compiled circuits) to construct efficiently verifiable benchmarking circuits. It defines the process fidelity $F(\Lambda, \mathcal{U}) = \frac{1}{4^n}\mathrm{Tr}(\mathcal{U}^{\dagger}\Lambda)$ and the polarization $\gamma(\mathcal{U}^{\dagger}\Lambda) = \frac{4^n}{4^n-1}F(\Lambda, \mathcal{U}) - \frac{1}{4^n-1}$, and it uses mirror circuit fidelity estimation (MCFE) to estimate $F(\Lambda, \mathcal{U}')$ without classically simulating the high-level circuits. The framework yields scalable benchmarks for three circuit families: mirror quantum volume (MQV) circuits, randomized circuits on grid and linear geometries, and Hamiltonian simulation circuits, demonstrated in simulations and on IBM Q hardware. Key results show that the MCFE-based estimates are robust, scalable, sensitive to coherent errors, and informative about process fidelity and subroutine performance, while avoiding exponentially costly classical simulations. By contrast, in simulations the QV heavy-output probability and the normalized classical fidelity both tend to overestimate the polarization of the compiled circuits, with a larger gap when coherent $Z$ errors are present. Overall, the approach provides precise, hardware-aware benchmarks that can track progress toward useful quantum computation on large processors.
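To make the two definitions concrete, here is a minimal NumPy sketch. It assumes superoperators written as $4^n \times 4^n$ matrices in the column-stacking convention and uses a global depolarizing channel with parameter $p$ as an illustrative error model; neither choice comes from the paper, and all function names are hypothetical. For $\Lambda = \mathcal{D}_p \circ \mathcal{U}$, the polarization reduces to $p$ for any target unitary, which gives an easy sanity check.

```python
import numpy as np


def unitary_superop(u: np.ndarray) -> np.ndarray:
    """Superoperator of rho -> u rho u^dagger (column-stacking vectorization)."""
    return np.kron(u.conj(), u)


def depolarizing_superop(n_qubits: int, p: float) -> np.ndarray:
    """Illustrative global depolarizing channel rho -> p*rho + (1 - p)*Tr(rho)*I/d."""
    d = 2 ** n_qubits
    vec_identity = np.eye(d).reshape(-1, order="F")  # vec(I), column-stacked
    return p * np.eye(d * d) + (1 - p) * np.outer(vec_identity, vec_identity) / d


def process_fidelity(channel: np.ndarray, target: np.ndarray) -> float:
    """F(Lambda, U) = Tr(U^dagger Lambda) / 4^n for superoperator matrices."""
    return float(np.real(np.trace(target.conj().T @ channel))) / channel.shape[0]


def polarization(fidelity: float, n_qubits: int) -> float:
    """gamma = 4^n/(4^n - 1) * F - 1/(4^n - 1)."""
    dim = 4 ** n_qubits
    return dim / (dim - 1) * fidelity - 1 / (dim - 1)


# Usage: a random 2-qubit target followed by depolarizing noise with p = 0.9.
n = 2
rng = np.random.default_rng(seed=7)
q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
target = unitary_superop(q)  # q is unitary, so this is a unitary superoperator
noisy = depolarizing_superop(n, 0.9) @ target

fid = process_fidelity(noisy, target)
print(round(fid, 6), round(polarization(fid, n), 6))  # 0.90625 0.9
```

The polarization rescales $F$ so that a perfect implementation gives 1 and a completely depolarized one gives 0, which is why it, rather than $F$, is reported in the figures.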

Abstract

Quantum processors are now able to run quantum circuits that are infeasible to simulate classically, creating a need for benchmarks that assess a quantum processor's rate of errors when running these circuits. Here, we introduce a general technique for creating efficient benchmarks from any set of quantum computations, specified by unitary circuits. Our benchmarks assess the integrated performance of a quantum processor's classical compilation algorithms and its low-level quantum operations. Unlike existing "full-stack benchmarks", our benchmarks do not require classical simulations of quantum circuits, and they use only efficient classical computations. We use our method to create randomized circuit benchmarks, including a computationally efficient version of the quantum volume benchmark, and an algorithm-based benchmark that uses Hamiltonian simulation circuits. We perform these benchmarks on IBM Q devices and in simulations, and we compare their results to the results of existing benchmarking methods.
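Mirror circuits avoid classical simulation because a circuit followed by its inverse ideally returns its input state, so the correct outcome is known in advance. The sketch below illustrates only that core idea, not the paper's MCFE protocol, which uses three types of mirror circuits built from the compiled circuit and an exact compilation of the target. It assumes each half of the mirror suffers the same global depolarizing error with parameter $p$, uses a small density-matrix simulation purely to stand in for hardware, and rescales the return probability with the state-level factor $d = 2^n$ (not $4^n$). All helper names are hypothetical.

```python
import numpy as np


def random_unitary(d: int, rng: np.random.Generator) -> np.ndarray:
    """Random d x d unitary from the QR decomposition of a complex Gaussian matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q


def noisy_apply(rho: np.ndarray, u: np.ndarray, p: float) -> np.ndarray:
    """Apply u, then a global depolarizing error with parameter p (assumed error model)."""
    d = rho.shape[0]
    rho = u @ rho @ u.conj().T
    return p * rho + (1 - p) * np.trace(rho) * np.eye(d) / d


def mirror_success_probability(u: np.ndarray, p: float) -> float:
    """Run u and then u^dagger from |0...0>; return the probability of measuring 0...0."""
    d = u.shape[0]
    rho = np.zeros((d, d), dtype=complex)
    rho[0, 0] = 1.0
    rho = noisy_apply(noisy_apply(rho, u, p), u.conj().T, p)
    return float(np.real(rho[0, 0]))


def success_to_polarization(success: float, d: int) -> float:
    """Rescale a return probability so that 1 -> 1 and the uniform value 1/d -> 0."""
    return (d * success - 1) / (d - 1)


# Usage: 2 qubits, each half of the mirror circuit has depolarizing parameter 0.9.
d, p = 4, 0.9
u = random_unitary(d, np.random.default_rng(seed=1))
success = mirror_success_probability(u, p)
mirror_pol = success_to_polarization(success, d)  # equals p**2 in this model
print(round(success, 6), round(mirror_pol, 6), round(float(np.sqrt(mirror_pol)), 6))
# 0.8575 0.81 0.9
```

In this idealized model the mirror circuit's polarization is $p^2$, so its square root recovers the per-half value; real hardware errors are not uniform, which is what the full MCFE construction is designed to handle.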

Paper Structure

This paper contains 16 sections, 21 equations, 7 figures.

Figures (7)

  • Figure 1: A scalable full-stack benchmark generator. (a) A full-stack quantum computer's performance on a target application is determined by the integrated performance of classical preprocessing and the quantum processor. Existing full-stack benchmarks (purple) consist of tasking a full-stack quantum computer with running high-level quantum circuits. The quantum computer first performs classical preprocessing, then runs the compiled ("low-level") circuits on a quantum processor. To compute performance on the benchmark, the results are compared to a classical simulation of the high-level circuits, which is typically exponentially expensive to perform. Our full-stack benchmarking method uses a different approach: it uses the compiled circuits to generate efficiently verifiable benchmarking circuits, which are then run on the quantum processor. (b) These benchmarking circuits are mirror circuits of three types that use a compiled circuit output by the full-stack quantum computer's classical preprocessing and a pre-compiled exact compilation of the target circuit. (c) The results of running these circuits are used to efficiently compute the process fidelity of the processor's imperfect implementation of the target unitary.
  • Figure 2: Simulations of MQV. We simulated MQV and the standard quantum volume protocol on $n=3,4,5,6$ qubits with single- and two-qubit depolarizing errors. (a) The mean polarization of random SU(4) circuits, measured by MQV. (b) The observed heavy output probability of the QV circuits, rescaled to an estimated polarization. (c) We compare the polarization estimates from MQV to the exact polarizations of the compiled circuits. (d) The heavy output probability from the QV benchmark versus the estimated polarization from our method. The average heavy output probability typically overestimates the polarization of the compiled circuits.
  • Figure 3: Demonstrating MQV on IBM Q. We ran the MQV and QV benchmarks on ibm_hanoi. We plot the estimated average polarization of the QV circuits from our benchmark (left plot), and we compare these results to those of the QV benchmark with the same high-level circuits (center plot) and to predictions of our benchmark's results using a noise model based on the processor's calibration data (right plot). Because we did not run our benchmark on an exhaustive set of circuit shapes, we use an exponential interpolation of the data (dark-outlined boxes) to estimate the average polarization of QV circuits of other shapes (light-outlined boxes). We fit the average polarization data for a fixed number of qubits $n$ to $\bar{\gamma}_{n,d} = Ap^d$ and use these exponential fits to estimate $\bar{\gamma}_{n,d}$ for depths not included in our experiment. For circuit shapes with $n > 6$ or $d=1$, we instead fit an exponential $\bar{\gamma}_{n,d} = Ap^n$ to all data of a fixed benchmark depth $d$. The black line indicates where the average polarization falls below the QV pass threshold, $\bar{\gamma}_{n,d} \approx 0.48$.
  • Figure 4: Randomized full-stack benchmarks on IBM Q. We ran random circuit full-stack benchmarks on ibm_hanoi using high-level circuits defined on grid and linear geometries. (a) To generate a benchmark depth-$d$ high-level circuit for our grid geometry benchmark, we generate random circuits in which each layer consists of $\lfloor n/2 \rfloor$ SU(4) gates between connected pairs of qubits arranged in a grid. An analogous process is used to generate the linear connectivity circuits. (b) The average polarizations of each type of random circuit, compiled for and run on ibm_hanoi, estimated using our benchmark. The largest differences in polarization between the two geometries are seen for circuits of shape $(n,d)=(14,2)$ and $(4,8)$. We use an exponential decay heuristic ($\bar{\gamma}_{n,d} = Ap^d$ for shapes with $d>1$ and $n \leq 10$, and $\bar{\gamma}_{n,d} = Ap^n$ otherwise) to estimate the average polarization of circuits of shapes not included in our benchmark (light-outlined boxes).
  • Figure 5: A scalable Hamiltonian simulation full-stack benchmark. We use our benchmark generator to construct a scalable full-stack benchmark from Hamiltonian simulation circuits. (a) The structure of the $n$-qubit Hamiltonian simulation circuits we used, shown for $n=4$. Each circuit consists of a subroutine repeated $d$ times, and we call $d$ the benchmark depth of the circuit. (b) We simulated our benchmark for circuits of varied shapes with a local depolarizing error model (left plot) and with a local depolarizing model plus coherent $Z$ error on idling qubits (right plot). We compare the polarizations estimated by our method to the exact circuit polarizations and to the normalized classical fidelity between the ideal and observed base circuit output distributions. For both error models, the normalized classical fidelity overestimates the polarization, and the effect is larger in the error model with coherent $Z$ errors.
  • ...and 2 more figures
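Two numerical steps from the captions above can be reproduced in a few lines. This sketch assumes (i) a global depolarizing model, under which the heavy-output probability is $h = \frac{1}{2} + \gamma\,(h_{\text{ideal}} - \frac{1}{2})$ with the large-$n$ ideal value $h_{\text{ideal}} = (1 + \ln 2)/2$, so that the QV pass criterion $h > 2/3$ becomes $\gamma > 1/(3\ln 2) \approx 0.48$, consistent with the threshold quoted for Figure 3 (the paper's exact rescaling for Figure 2(b) may differ); and (ii) an unweighted least-squares fit of $\log\bar{\gamma}$ versus depth for the heuristic $\bar{\gamma}_{n,d} = Ap^d$ of Figures 3 and 4. Function names and data are hypothetical and synthetic.

```python
import math

import numpy as np

# Asymptotic (large-n) ideal heavy-output probability of random QV circuits.
H_IDEAL = (1 + math.log(2)) / 2


def heavy_output_to_polarization(h: float, h_ideal: float = H_IDEAL) -> float:
    """Invert h = 1/2 + gamma * (h_ideal - 1/2), assuming global depolarizing noise."""
    return (h - 0.5) / (h_ideal - 0.5)


def fit_exponential_decay(depths, gammas):
    """Fit gamma = A * p**d by least squares on log(gamma); returns (A, p)."""
    slope, intercept = np.polyfit(np.asarray(depths, dtype=float), np.log(gammas), deg=1)
    return math.exp(intercept), math.exp(slope)


# QV pass criterion h > 2/3 expressed as a polarization threshold.
threshold = heavy_output_to_polarization(2 / 3)
print(round(threshold, 4))  # 0.4809

# Synthetic (not measured) average polarizations at a fixed n for a few depths.
depths = [2, 4, 8]
gammas = [0.95 * 0.8 ** d for d in depths]
A, p = fit_exponential_decay(depths, gammas)
print(round(A, 3), round(p, 3))  # 0.95 0.8

# Extrapolate to an unmeasured depth and compare with the pass threshold.
gamma_6 = A * p ** 6
print(round(gamma_6, 4), gamma_6 > threshold)  # 0.249 False
```

Fitting in log space keeps the heuristic linear and cheap; with real, noisy data a weighted fit or direct nonlinear fit may behave better at small polarizations, where the logarithm amplifies statistical error.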