Benchmarking Quantum Processor Performance at Scale
David C. McKay, Ian Hincks, Emily J. Pritchett, Malcolm Carroll, Luke C. G. Govia, Seth T. Merkel
TL;DR
Quantum processors require benchmarks that scale beyond discrete pass/fail tests like quantum volume. The paper introduces Layer Fidelity (LF), a scalable benchmark that uses disjoint layers of two-qubit gates and simultaneous direct randomized benchmarking to measure the fidelity of a connected layer across N qubits. LF captures crosstalk and, expressed as error per layered gate (EPLG), yields a size-independent error metric; it also connects to the γ factor used to budget probabilistic error mitigation. Experimental data on IBM Eagle and Heron devices show LF values that reflect crosstalk and agree with mirror RB and Pauli-learning estimates, illustrating LF's practical utility for large-scale quantum hardware. The work positions LF as a fast, informative complement to existing benchmarks for hardware-aware algorithm design and error-mitigation budgeting.
Abstract
As quantum processors grow, new performance benchmarks are required to capture the full quality of the devices at scale. While quantum volume is an excellent benchmark, it focuses on the highest-quality subset of the device and so cannot indicate the average performance over a large number of connected qubits. Furthermore, it is a discrete pass/fail test, so it neither reflects continuous improvements in hardware nor provides quantitative direction for large-scale algorithms. For example, there may be value in error-mitigated Hamiltonian simulation at scale on devices unable to pass strict quantum volume tests. Here we discuss a scalable benchmark that measures the fidelity of a connecting set of two-qubit gates over $N$ qubits by measuring gate errors using simultaneous direct randomized benchmarking in disjoint layers. Our layer fidelity can be easily related to algorithmic run time via the quantity $\gamma$ defined in Ref.~\cite{berg2022probabilistic}, which can be used to estimate the number of circuits required for error mitigation. The protocol is efficient and obtains all the pair rates in the layered structure. Compared to regular (isolated) RB, this approach is sensitive to crosstalk. As an example, we measure an $N=80~(100)$ qubit layer fidelity of 0.26 (0.19) on a 127-qubit fixed-coupling "Eagle" processor (ibm\_sherbrooke) and of 0.61 (0.26) on the 133-qubit tunable-coupling "Heron" processor (ibm\_montecarlo). This can easily be expressed as a layer-size-independent quantity, error per layered gate (EPLG), which is here $1.7\times10^{-2}~(1.7\times10^{-2})$ for ibm\_sherbrooke and $6.2\times10^{-3}~(1.2\times10^{-2})$ for ibm\_montecarlo.
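
The abstract quotes both layer fidelities and EPLG values without stating how one maps to the other. The following is a minimal sketch of that conversion, assuming the relation EPLG = 1 - LF^(1/n_2q), where n_2q is the number of two-qubit gates in the layer, and assuming n_2q = N - 1 for a linear chain of N qubits. The formula, the chain assumption, and the function names `eplg` and `layer_fidelity_from_eplg` are illustrative assumptions, not definitions taken from the text above.

```python
def eplg(layer_fidelity: float, n_two_qubit_gates: int) -> float:
    """Error per layered gate, ASSUMING EPLG = 1 - LF**(1 / n_2q).

    layer_fidelity    : measured layer fidelity LF, in (0, 1].
    n_two_qubit_gates : number of two-qubit gates in the layer
                        (assumed to be N - 1 for a chain of N qubits).
    """
    if not 0.0 < layer_fidelity <= 1.0:
        raise ValueError("layer_fidelity must lie in (0, 1]")
    if n_two_qubit_gates < 1:
        raise ValueError("n_two_qubit_gates must be at least 1")
    return 1.0 - layer_fidelity ** (1.0 / n_two_qubit_gates)


def layer_fidelity_from_eplg(eplg_value: float, n_two_qubit_gates: int) -> float:
    """Inverse of eplg() under the same assumed relation."""
    if not 0.0 <= eplg_value < 1.0:
        raise ValueError("eplg_value must lie in [0, 1)")
    if n_two_qubit_gates < 1:
        raise ValueError("n_two_qubit_gates must be at least 1")
    return (1.0 - eplg_value) ** n_two_qubit_gates


if __name__ == "__main__":
    # Layer fidelities quoted in the abstract for N = 80 qubits;
    # n_2q = N - 1 = 79 is the linear-chain assumption stated above.
    for name, lf in [("ibm_sherbrooke", 0.26), ("ibm_montecarlo", 0.61)]:
        print(f"{name}: EPLG = {eplg(lf, 79):.1e}")
    # ibm_sherbrooke: EPLG = 1.7e-02
    # ibm_montecarlo: EPLG = 6.2e-03
```

Under these assumptions the N = 80 layer fidelities give the EPLG values quoted for both devices. Taking the n_2q-th root is what makes EPLG independent of layer size, so chains of different lengths can be compared directly.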
