Benchmarking quantum computers

Timothy Proctor; Kevin Young; Andrew D. Baczewski; Robin Blume-Kohout

TL;DR

This Perspective discusses the role of benchmarks and benchmarking, and how good benchmarks can drive and measure progress towards the long-term goal of useful quantum computations, known as quantum utility.

Abstract

The rapid pace of development in quantum computing technology has sparked a proliferation of benchmarks for assessing the performance of quantum computing hardware and software. Good benchmarks empower scientists, engineers, programmers, and users to understand a computing system's power, but bad benchmarks can misdirect research and inhibit progress. In this Perspective, we survey the science of quantum computer benchmarking. We discuss the role of benchmarks and benchmarking, and how good benchmarks can drive and measure progress towards the long-term goal of useful quantum computations, i.e., "quantum utility". We explain how different kinds of benchmark quantify the performance of different parts of a quantum computer, we survey existing benchmarks, critically discuss recent trends in benchmarking, and highlight important open research questions in this field.

Paper Structure (17 sections, 4 figures)

Quantum computer benchmarking
Kinds of quantum computer benchmark
Measuring progress to quantum utility
Acknowledgements
Author contributions
Competing interests

Figures (4)

Figure 1: Quantum computer benchmarks. a. Quantum computer benchmarks are methods that are run on quantum computing systems, or on some of their subsystems (qubits, compilers, etc.), to measure performance. Each benchmark measures one or more metrics of performance, such as the error rates of a system's quantum gates. b. Benchmarks enable comparing a quantum computer's performance to other contemporary, historical, or hypothetical systems, e.g., hypothetical systems on a particular roadmap to quantum utility.
Figure 2: Kinds of benchmark. Benchmarks vary widely in the abstraction level of their tasks, ranging from executing specific low-level quantum circuits to solving a computational problem, and by the complexity of the object whose performance they measure, ranging from individual logic gates to entire computing systems. The abstraction and complexity of a selection of important benchmarks and benchmark families are shown here.
Figure 3: How benchmarks interact with integrated quantum computers. Benchmarks test the joint performance of one or more parts of an integrated quantum computer's "stack" (its qubits, compilers, routers, etc.). They do so by inserting tasks into one level of the stack, and then analyzing output from the same (or a lower) level of the stack. Benchmarks can limit or adjust what each layer of the stack does (e.g., limiting the types of compilation), which can enable robust and efficient benchmarking. Benchmarks that enter and exit the stack at different levels measure fundamentally different aspects of performance, and form different categories of benchmark. Four important categories are shown here.
Figure 4: Assessing quantum computer performance via capability. This figure illustrates one way to compare experimentally benchmarked performance against resource estimates for challenge problems, using a multidimensional capability metric. Challenge problems and benchmark tasks are represented by the width (some measure of the number of qubits) and depth (some measure of the number of clock cycles) of a quantum circuit that performs the task. Regions indicate the circuits performable by two real-world quantum computers and one hypothetical quantum computer (blue), using a success threshold of $1/e$. The two real-world systems are Google’s Sycamore (green), as extrapolated from results in Arute et al. [Arute2019-mk], and an ensemble of IBM Q devices (pink) benchmarked by our group [Proctor2021-wt]. Points indicate constant-factor resource estimates for three candidate challenge problems analyzed in the literature [gidney2021factor, lee2021even, Rubin2024-gc]. For these problems, width is the number of logical qubits, not accounting for logical qubits used in distillation or routing, and depth is the total number of non-Clifford operations (i.e., Toffoli and/or T gates). These metrics are somewhat crude, but indicate the rough scale of resources required for these challenge problems. We emphasize the wide gulf between that "utility" scale and current state-of-the-art capabilities: logarithmic axes were required to compress both scales into one figure. Plots like this one could enable stakeholders to track and extrapolate the growth of quantum computer capabilities over time, toward eventual achievement of quantum utility.
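
As a rough illustration of the width/depth capability region described in the Figure 4 caption, the sketch below computes the boundary of such a region under a deliberately simple toy model. The error model (every qubit in every circuit layer fails independently with the same probability) and all names and numbers in the code are assumptions made for illustration; they are not the method used to produce the figure, and the only quantity taken from the caption is the $1/e$ success threshold.

```python
import math

# Success threshold quoted in the Figure 4 caption.
THRESHOLD = 1 / math.e


def success_probability(width, depth, error_rate):
    """Toy model (an assumption, not the paper's method): each of the
    width * depth qubit-layer locations fails independently with
    probability error_rate, and any failure ruins the circuit."""
    return (1.0 - error_rate) ** (width * depth)


def is_capable(width, depth, error_rate, threshold=THRESHOLD):
    """True if a width x depth circuit meets the success threshold."""
    return success_probability(width, depth, error_rate) >= threshold


def max_depth(width, error_rate, threshold=THRESHOLD):
    """Largest depth still inside the capability region at this width.

    Solves (1 - error_rate)^(width * depth) >= threshold for depth.
    """
    bound = math.log(threshold) / (width * math.log(1.0 - error_rate))
    return math.floor(bound)


if __name__ == "__main__":
    # Hypothetical error rate, chosen only to make the example concrete.
    error_rate = 1e-3
    for width in (1, 10, 100):
        print(width, max_depth(width, error_rate))
```

In this toy model the boundary is roughly the hyperbola width × depth ≈ 1/error_rate, which appears as a straight line on logarithmic axes. Real capability regions are measured with benchmarks and need not follow this shape.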
