Benchmarking Quantum Processor Performance at Scale

David C. McKay, Ian Hincks, Emily J. Pritchett, Malcolm Carroll, Luke C. G. Govia, Seth T. Merkel

TL;DR

Quantum processors require benchmarks that scale beyond discrete pass/fail tests such as quantum volume. The paper introduces Layer Fidelity (LF), a scalable benchmark that applies simultaneous direct randomized benchmarking to disjoint layers of two-qubit gates in order to measure the fidelity of a connected layer of gates across N qubits. LF captures crosstalk, can be expressed as a layer-size-independent error per layered gate (EPLG), and connects to γ, the quantity used to estimate the number of circuits required for probabilistic error mitigation. Experimental data on IBM Eagle and Heron devices show LF values that reflect crosstalk and agree with mirror RB and Pauli-learning estimates, illustrating LF's practical utility for large-scale quantum hardware. The work positions LF as a fast, informative complement to existing benchmarks for hardware-aware algorithm design and error-mitigation budgeting.

Abstract

As quantum processors grow, new performance benchmarks are required to capture the full quality of the devices at scale. While quantum volume is an excellent benchmark, it focuses on the highest-quality subset of the device and so is unable to indicate the average performance over a large number of connected qubits. Furthermore, it is a discrete pass/fail test and so neither reflects continuous improvements in hardware nor provides quantitative direction to large-scale algorithms. For example, there may be value in error-mitigated Hamiltonian simulation at scale on devices unable to pass strict quantum volume tests. Here we discuss a scalable benchmark which measures the fidelity of a connecting set of two-qubit gates over $N$ qubits by measuring gate errors using simultaneous direct randomized benchmarking in disjoint layers. Our layer fidelity can be easily related to algorithmic run time via $\gamma$, defined in Ref.\cite{berg2022probabilistic}, which can be used to estimate the number of circuits required for error mitigation. The protocol is efficient and obtains all the pair rates in the layered structure. Compared to regular (isolated) RB, this approach is sensitive to crosstalk. As an example, we measure an $N=80~(100)$ qubit layer fidelity of 0.26 (0.19) on the 127-qubit fixed-coupling "Eagle" processor (ibm\_sherbrooke) and of 0.61 (0.26) on the 133-qubit tunable-coupling "Heron" processor (ibm\_montecarlo). This can easily be expressed as a layer-size-independent quantity, the error per layered gate (EPLG), which is here $1.7\times10^{-2}~(1.7\times10^{-2})$ for ibm\_sherbrooke and $6.2\times10^{-3}~(1.2\times10^{-2})$ for ibm\_montecarlo.
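The abstract quotes EPLG values without stating the conversion from layer fidelity. A minimal sketch, assuming EPLG $= 1 - \mathrm{LF}^{1/n_{2q}}$ with $n_{2q} = N - 1$ two-qubit gates in an $N$-qubit linear chain (an assumed definition; the function name `eplg` is hypothetical), reproduces the ibm_sherbrooke values and the $N=80$ ibm_montecarlo value above to two significant figures:

```python
def eplg(layer_fidelity: float, n_qubits: int) -> float:
    """Error per layered gate for a linear chain of n_qubits (assumed definition).

    Assumes the chain holds n_qubits - 1 two-qubit gates and that
    EPLG = 1 - LF ** (1 / n_2q): the per-gate error that would yield the
    measured layer fidelity if every gate contributed equally.
    """
    n_2q = n_qubits - 1
    if n_2q < 1 or not 0.0 < layer_fidelity <= 1.0:
        raise ValueError("need at least 2 qubits and 0 < LF <= 1")
    return 1.0 - layer_fidelity ** (1.0 / n_2q)


if __name__ == "__main__":
    # Layer fidelities quoted in the abstract: (device, N, LF).
    for device, n, lf in [("ibm_sherbrooke", 80, 0.26),
                          ("ibm_sherbrooke", 100, 0.19),
                          ("ibm_montecarlo", 80, 0.61)]:
        print(f"{device} N={n}: LF={lf} -> EPLG={eplg(lf, n):.1e}")
```

Taking the $n_{2q}$-th root is what makes the metric comparable across chain lengths: a longer chain multiplies in more gate fidelities, so LF alone always falls with $N$.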

Paper Structure (11 sections, 2 theorems, 36 equations, 8 figures)

Key Result

Theorem 1

Suppose $\Lambda$ is a CPTP Pauli channel with a process fidelity $F_p=\operatorname{Tr}\Lambda/d^2$, and $\gamma = \det(\Lambda)^{-2/d^2}$. Then it holds that where
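Both quantities in Theorem 1 can be computed directly from the Pauli fidelities $f_a$, since the Pauli transfer matrix of a Pauli channel is diagonal: $F_p$ is their arithmetic mean and $\det\Lambda = \prod_a f_a$. The sketch below assumes all $f_a > 0$ and, as a further assumption, builds the channel in the sparse Pauli-Lindblad form used for probabilistic error cancellation in Ref.\cite{berg2022probabilistic}, where $f_a = \exp(-2\sum_{k:\,\{a,P_k\}=0}\lambda_k)$ and $\gamma = \exp(2\sum_k\lambda_k)$. The generator set and rates are hypothetical. It checks only the elementary consequence $\gamma \ge F_p^{-2}$ (arithmetic mean at least geometric mean), not the full statement of the theorem.

```python
import itertools

import numpy as np

# Single-qubit Paulis as (x, z) bits, phases ignored: I=(0,0), X=(1,0), Z=(0,1), Y=(1,1).
SINGLE = [(0, 0), (1, 0), (0, 1), (1, 1)]


def paulis(n):
    """All n-qubit Pauli labels as tuples of (x, z) bit pairs."""
    return list(itertools.product(SINGLE, repeat=n))


def anticommute(a, b):
    """True if Paulis a and b anticommute (odd symplectic inner product)."""
    return sum(xa * zb + za * xb for (xa, za), (xb, zb) in zip(a, b)) % 2 == 1


def pauli_lindblad_fidelities(rates, n):
    """f_a = exp(-2 * sum of rates of the generators that anticommute with a)."""
    return np.array([
        np.exp(-2.0 * sum(lam for gen, lam in rates.items() if anticommute(a, gen)))
        for a in paulis(n)
    ])


def fp_and_gamma(fidelities):
    """F_p = Tr(Lambda)/d^2 and gamma = det(Lambda)^(-2/d^2) for a diagonal PTM."""
    d2 = len(fidelities)
    return fidelities.sum() / d2, np.prod(fidelities) ** (-2.0 / d2)


if __name__ == "__main__":
    n = 2
    identity = tuple((0, 0) for _ in range(n))
    rng = np.random.default_rng(7)
    # Hypothetical rates: every non-identity two-qubit Pauli is a generator.
    rates = {p: rng.uniform(0.0, 5e-3) for p in paulis(n) if p != identity}
    f = pauli_lindblad_fidelities(rates, n)
    F_p, gamma = fp_and_gamma(f)
    print(f"F_p = {F_p:.5f}, gamma = {gamma:.5f}, F_p**-2 = {F_p ** -2:.5f}")
    print(f"exp(2*sum(rates)) = {np.exp(2 * sum(rates.values())):.5f}")
```

Each non-identity generator anticommutes with exactly half of the $d^2$ Paulis, which is why $\det(\Lambda)^{-2/d^2}$ collapses to $\exp(2\sum_k\lambda_k)$. For weak noise the arithmetic and geometric means of the $f_a$ agree to first order in the errors, so $\gamma \approx F_p^{-2}$.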

Figures (8)

  • Figure 1: (a) A linear chain of qubits with nearest-neighbor coupling, for which a connecting set of gates comprises a disjoint layer of gates starting on qubit 0 followed by a disjoint layer of gates starting on qubit 1. The disjoint layers can be the maximally simultaneous sets (b) or, alternatively, sparser sets (c) split into more disjoint layers. (d) For the disjoint-layer set of (b), two simultaneous direct RB experiments are required, shown here for depth $l=4$, with the final layer consisting of the inverses in each disjoint subspace. We measure decay curves as a function of $l$ and fit them to extract the fidelities, which are then multiplied together as given by Eqn. \ref{eqn:lf} to obtain the layer fidelity.
  • Figure 2: (Top Left) Layer fidelity for the 127-qubit ibm_sherbrooke "Eagle" processor (blue triangles) and the 133-qubit ibm_montecarlo "Heron" processor (red circles), taken using the procedure outlined in the main text for various chain lengths up to 100 qubits. (Top Right) The same data converted to error per layered gate (EPLG). (Bottom Left) Quantile plot of the individual gate errors measured on the best 100-qubit chain using simultaneous direct RB ("layered") versus the backend-reported gate errors ("isolated") on the same chain. Errors are reported as process error ($\epsilon_p$) rather than average gate error ($\epsilon_g$), where $\epsilon_p=\frac{d+1}{d} \epsilon_g$. Both devices have among the lowest gate errors measured on a superconducting device: the minimum isolated gate error (process error) is $3.2~(4.0) \times 10^{-3}$ on ibm_sherbrooke (Eagle) and $1.2~(1.6) \times 10^{-3}$ on ibm_montecarlo (Heron). (Bottom Right) The 100-qubit chain (red) overlaid on the ibm_sherbrooke (left) and ibm_montecarlo (right) device layout schematics.
  • Figure 3: (Top) Mirror RB data versus layer depth $l$ compared with the decay predicted from the layer fidelity measured on the same set of qubits; for these data $LF=0.702$, so the dashed line is $0.702^l$. The mirror RB curves either include (blue) or omit (red) a random Pauli layer. (Middle) The layer fidelity versus the number of disjoint layers used in the protocol (black). The fidelity decreases as the number of layers increases because the total duration is longer; we estimate this effect by considering only the fidelity decrease due to decoherence (blue). More layers do decrease the mean gate error (dashed, red) because of lower crosstalk, but this reduction is not enough to offset the longer duration, so the layer fidelity does not improve. (Bottom) Qubits used on ibm_peekskill are [23, 24, 25, 22, 19, 16, 14, 11, 8, 5, 3, 2, 1, 4, 7, 10, 12, 15, 18, 17].
  • Figure 4: Comparison of $\gamma$ measured from layer fidelity (blue, circles) via Eqn. \ref{eqn:lf_to_gamma} with $\gamma$ measured using Pauli learning \cite{berg2022probabilistic} (red, squares). Measured on ibm_peekskill for the connected set of qubits [19, 22, 25, 24, 23, 21, 18, 15, 12, 13, 14, 11, 8, 5, 3, 2] (even and odd disjoint layers).
  • Figure 5: (Top) Simulation of the even layer with incoherent errors ($T_1=T_2$) and a gate unit length of 50 ns. As described in the main text, when the gate lengths differ, simultaneous RB trivially gives the wrong answer. (Bottom) A similar simulation for the two layers with incoherent errors ($T_1=T_2$), comparing mirror RB to layer fidelity; the agreement is exact. There are two theory curves in the bottom plot. In the red curve, both the single-qubit gate layers and the two-qubit gate layers are included in calculating the total incoherent error (there are on average 1.5 single-qubit gate layers per two-qubit gate layer). The green theory curve is the error when only the two-qubit layer is considered. For agreement with theory, the single-qubit gates in the layer must be included. Each simulation uses 10 random sequences.
  • ...and 3 more figures
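The procedure in Figure 1 for a linear chain can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the helper names are hypothetical, each decay is modeled as $A\alpha^l + B$ with the asymptote fixed at $B = 1/d$ so that a straight-line fit of $\log(P - B)$ suffices, each fitted $\alpha$ is converted to a process fidelity with the depolarizing relation $F_p = (1 + (d^2-1)\alpha)/d^2$, and synthetic decays stand in for measured data. Only the two-qubit pairs are included; qubits left unpaired in a layer are omitted from this sketch.

```python
import numpy as np


def disjoint_layers(chain):
    """Split a linear chain into the two maximally simultaneous disjoint layers of
    Fig. 1(b): pairs starting on the first qubit, then pairs starting on the second."""
    first = [(chain[i], chain[i + 1]) for i in range(0, len(chain) - 1, 2)]
    second = [(chain[i], chain[i + 1]) for i in range(1, len(chain) - 1, 2)]
    return first, second


def fit_alpha(depths, survival, n_qubits):
    """Fit P(l) = A * alpha**l + B with B fixed to 1/d (an assumption) by a
    straight-line fit of log(P - B) against the depth l."""
    d = 2 ** n_qubits
    y = np.log(np.asarray(survival, dtype=float) - 1.0 / d)
    slope, _intercept = np.polyfit(depths, y, 1)
    return float(np.exp(slope))


def process_fidelity(alpha, n_qubits):
    """Process fidelity of a depolarizing channel with parameter alpha."""
    d2 = 4 ** n_qubits
    return (1.0 + (d2 - 1.0) * alpha) / d2


def layer_fidelity(fidelities):
    """Layer fidelity: product of the fitted fidelities from every disjoint layer."""
    return float(np.prod(list(fidelities)))


if __name__ == "__main__":
    chain = list(range(6))
    layers = disjoint_layers(chain)
    print(layers)  # ([(0, 1), (2, 3), (4, 5)], [(1, 2), (3, 4)])

    depths = np.array([1, 10, 20, 50, 100, 150])
    rng = np.random.default_rng(1)
    fids = []
    for layer in layers:  # one simultaneous direct RB experiment per disjoint layer
        for _pair in layer:
            # Synthetic decay with alpha = 0.99, standing in for measured data.
            survival = 0.7 * 0.99 ** depths + 0.25 + rng.normal(0.0, 1e-4, depths.size)
            fids.append(process_fidelity(fit_alpha(depths, survival, 2), 2))
    print(f"layer fidelity ~ {layer_fidelity(fids):.3f}")
```

A real analysis would more likely fit $A$, $B$ and $\alpha$ jointly with a nonlinear least-squares routine; fixing $B$ keeps this sketch dependent on NumPy alone.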
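Figures 2 and 3 rely on a few small calculations, sketched below with hypothetical function names. The process-error conversion is the relation quoted in the Figure 2 caption, and the predicted mirror RB decay $\mathrm{LF}^l$ follows the Figure 3 caption. The decoherence-only estimate is an assumption in the spirit of Figure 3 (middle): each qubit undergoes independent amplitude damping and dephasing over the layer duration, giving a single-qubit process fidelity $(1 + e^{-t/T_1} + 2e^{-t/T_2})/4$ that is multiplied over qubits. The $T_1$, $T_2$ and duration values are placeholders, not device data.

```python
import math


def process_error(eps_g, n_qubits=2):
    """Convert average gate error to process error: eps_p = (d + 1) / d * eps_g."""
    d = 2 ** n_qubits
    return (d + 1) / d * eps_g


def predicted_mirror_decay(layer_fidelity, depth):
    """Mirror RB decay predicted from the layer fidelity: LF ** depth."""
    return layer_fidelity ** depth


def coherence_limited_fidelity(duration, t1, t2, n_qubits):
    """Process fidelity of n independent qubits decaying with T1/T2 for `duration`.

    Assumed single-qubit Pauli transfer matrix diagonal: (1, e^-t/T2, e^-t/T2, e^-t/T1),
    so F_p = (1 + e^-t/T1 + 2 e^-t/T2) / 4; process fidelities multiply across qubits.
    """
    f1 = (1 + math.exp(-duration / t1) + 2 * math.exp(-duration / t2)) / 4
    return f1 ** n_qubits


if __name__ == "__main__":
    # Minimum isolated gate error on ibm_sherbrooke quoted in Fig. 2.
    print(f"{process_error(3.2e-3):.1e}")  # 4.0e-03
    # Dashed line of Fig. 3 (top) evaluated at depth 10.
    print(predicted_mirror_decay(0.702, 10))
    # Hypothetical: 20 qubits, T1 = T2 = 200 us, 1 us versus 2 us of total duration.
    for t in (1e-6, 2e-6):
        print(t, coherence_limited_fidelity(t, 200e-6, 200e-6, 20))
```

Doubling the duration lowers the coherence-limited fidelity, which is the trade-off Figure 3 (middle) describes when a layer is split into more disjoint layers.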

Theorems & Definitions (4)

  • Theorem 1
  • proof
  • Lemma 1
  • proof