
Featuremetric benchmarking: Quantum computer benchmarks based on circuit features

Timothy Proctor, Anh Tran, Xingxin Liu, Aditya Dhumuntarao, Stefan Seritan, Alaina Green, Norbert M Linke

TL;DR

This work tackles the limitation of volumetric benchmarking, which summarizes quantum computer performance only as a function of circuit width and depth, by introducing featuremetric benchmarking that maps performance as a function of multiple circuit features. The authors formalize a capability-learning framework and use Gaussian process regression, including monotonic variants, to interpolate and predict performance across feature space. They demonstrate the approach with IBM Q and IonQ data up to 27 qubits, using mirror circuits and Clifford-based process fidelity as capability metrics, and show that incorporating features like two-qubit gate density improves predictive power while enabling data-efficient volumetric summaries. The results highlight the potential for data-efficient benchmarking and guide future work on richer feature sets and online, adaptive sampling strategies.
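To make the Gaussian process regression step concrete, here is a minimal NumPy sketch of GP interpolation over a three-feature space (width, log2 depth, two-qubit gate density). It is not the authors' implementation. Several choices are assumptions: a squared-exponential kernel with fixed, hand-picked length scales (no marginal-likelihood fitting), a small fixed noise level, and a toy exponential-decay performance model standing in for real benchmark data. The monotonic GP variants mentioned above are not implemented here.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scales, amplitude=1.0):
    """Squared-exponential kernel with a separate length scale per circuit feature."""
    diff = (X1[:, None, :] - X2[None, :, :]) / length_scales
    return amplitude**2 * np.exp(-0.5 * np.sum(diff**2, axis=-1))

def gp_predict(X_train, y_train, X_test, length_scales, amplitude=1.0, noise=1e-2):
    """Posterior mean and standard deviation of a GP fit to (X_train, y_train).

    Targets are centered on their mean, so far from the data the prediction
    reverts to the average observed performance."""
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, length_scales, amplitude)
    K += noise**2 * np.eye(len(X_train))          # observation noise on the diagonal
    K_s = rbf_kernel(X_test, X_train, length_scales, amplitude)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = y_mean + K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = amplitude**2 - np.sum(v**2, axis=0)      # prior variance minus explained part
    return mean, np.sqrt(np.maximum(var, 0.0))     # clip tiny negative round-off

# Hypothetical training set: features are (width w, log2 depth, two-qubit density).
rng = np.random.default_rng(0)
n = 40
X_train = np.column_stack([
    rng.integers(2, 28, n),
    rng.integers(2, 11, n),
    rng.choice([0.0, 0.125, 0.25], n),
]).astype(float)
# Toy ground truth (NOT the paper's data): decay in w * d, faster at higher density.
y_train = np.exp(-X_train[:, 0] * 2.0 ** X_train[:, 1] * (1 + 4 * X_train[:, 2]) / 2000)

length_scales = np.array([5.0, 2.0, 0.1])
mean_tr, std_tr = gp_predict(X_train, y_train, X_train, length_scales)
mean_far, std_far = gp_predict(X_train, y_train, np.array([[200.0, 50.0, 5.0]]), length_scales)
```

The posterior standard deviation is small near measured feature vectors and grows toward the prior amplitude far from them, which is what lets a GP indicate where further circuits would be most informative.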

Abstract

Benchmarks that concisely summarize the performance of many-qubit quantum computers are essential for measuring progress towards the goal of useful quantum computation. In this work, we present a benchmarking framework that is based on quantifying how a quantum computer's performance on quantum circuits varies as a function of features of those circuits, such as circuit depth, width, two-qubit gate density, problem input size, or algorithmic depth. Our featuremetric benchmarking framework generalizes volumetric benchmarking -- a widely-used methodology that quantifies performance versus circuit width and depth -- and we show that it enables richer and more faithful models of quantum computer performance. We demonstrate featuremetric benchmarking with example benchmarks run on IBM Q and IonQ systems of up to 27 qubits, and we show how to produce performance summaries from the data using Gaussian process regression. Our data analysis methods are also of interest in the special case of volumetric benchmarking, as they enable the creation of intuitive two-dimensional capability regions using data from few circuits.
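A two-dimensional capability region can be read off a width-by-depth grid of performance values, whether those are raw means or a regression model's predictions, by thresholding. The sketch below is illustrative only. The function name `capability_region`, the threshold of 0.5, and the toy performance model are all hypothetical choices, not taken from the paper.

```python
import math

def capability_region(success, widths, depths, threshold=0.5):
    """Return (mask, frontier).

    success[i][j] is the mean (or model-predicted) success probability at
    widths[i], depths[j]. mask[i][j] is True where performance meets the
    threshold; frontier maps each width to the largest qualifying depth, or None."""
    mask = [[p >= threshold for p in row] for row in success]
    frontier = {}
    for w, row in zip(widths, mask):
        ok = [d for d, good in zip(depths, row) if good]
        frontier[w] = max(ok) if ok else None
    return mask, frontier

# Toy performance model (hypothetical, not the paper's data): p = exp(-w * d / 200).
widths = [2, 4, 8, 64]
depths = [4, 16, 64]
toy = [[math.exp(-w * d / 200) for d in depths] for w in widths]
mask, frontier = capability_region(toy, widths, depths)
print(frontier)  # {2: 64, 4: 16, 8: 16, 64: None}
```

Reporting only the largest qualifying depth per width assumes performance decays monotonically with depth; the full `mask` is the safer output when it might not.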

Paper Structure

This paper contains 24 sections, 43 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Volumetric benchmarking of ibmq_montreal. (a) The results of a volumetric benchmark run on a 27-qubit IBM quantum computer (ibmq_montreal). This plot shows the mean success probability of randomized mirror circuits versus circuit shape (circuit width $w$ and benchmark depth $d$ [Proctor2022-yl]). (b) Histograms of the success probabilities of the 20 circuits of each shape that were run, for a selection of circuit shapes. Each circuit's success probability was estimated from 1024 executions of that circuit, so shot noise (a standard error of at most about 1.6 percentage points) is small compared with the differences seen here, which are therefore statistically significant. This demonstrates that, although the volumetric benchmarking plot of (a) shows mean performance as a function of circuit width and depth, two circuits with the same width and depth can have significantly different success probabilities.
  • Figure 2: Randomized mirror circuits. A diagram of the randomized mirror circuits used in two of our example featuremetric benchmarks, both of which we ran on IBM Q systems. These particular randomized mirror circuits contain only Clifford gates ($C_i$, with $i=0,1,\dots,23$, denote the 24 single-qubit Clifford gates). They have a variable width, depth, and mean density of two-qubit gates.
  • Figure 3: A simple featuremetric benchmark summarized in volumetric benchmarking plots (ibmq_algiers). An example of a simple three-feature featuremetric benchmark, run on ibmq_algiers, whose data can be summarized in three volumetric benchmarking plots. This featuremetric benchmark consists of varying three circuit features: circuit width ($w$), circuit depth ($d$), and two-qubit gate density ($\xi_{\textrm{2Q}}$). We systematically varied the features $(w, d, \xi_{\textrm{2Q}})$ over a three-dimensional grid of values $\{2,3,\dots,27\} \times \{2^k\}_{k=2,3,\dots,10} \times \{0, 1/8, 1/4\}$ (except that feature vectors with large depths and widths were discarded; which values were discarded is implied by the missing data in the plot). Because $\xi_{\textrm{2Q}}$ took only three discrete values (0, $1/8$, and $1/4$), and we systematically varied both circuit width and depth, we can represent the results in three volumetric benchmarking plots, one for each value of $\xi_{\textrm{2Q}}$, as shown here. The circuits used were randomized mirror circuits and, as in the volumetric benchmarking results for ibmq_montreal shown in Figure 1, we plot the mean success probability of all $K=10$ randomized mirror circuits run at each feature value. We observe substantial changes in the circuits' success probabilities as we vary $\xi_{\textrm{2Q}}$. For example, at $(w,d)=(14,64)$ the mean success probabilities for $\xi_{\textrm{2Q}}=0$, $1/8$, and $1/4$ are $(51\pm 2)\%$, $(19 \pm 4)\%$, and $(7\pm 2)\%$, respectively, where, here and throughout, error bars are standard errors calculated using a bootstrap. This implies that varying two-qubit gate density, in addition to circuit width and depth, will increase the predictive accuracy of a model of circuit performance built by interpolating these results.
  • Figure 4: A three-dimensional featuremetric benchmark (Forte1). The results of a three-dimensional featuremetric benchmark run on 20 qubits of IonQ's Forte1 cloud-access system. In this benchmark, we measured the process fidelities $F$ of random circuits versus three circuit features: width, depth, and two-qubit gate density. In the central panel, we plot the mean estimated $F$ at each feature vector $\vec{v}$ (averaged over the 30 circuits selected and measured at each $\vec{v}$) against the three feature values. We also show three two-dimensional projections of the data, each obtained by discarding one of the three features. In this benchmark, feature vectors were selected quasirandomly using a Sobol sequence, which "fills up" the three-dimensional feature space more uniformly than (pseudo)randomly sampled feature vectors typically do.
  • Figure 5: Quantifying the monotonicity of the circuit fidelity decay (Forte1). We quantify the monotonicity of the decay in circuit fidelities with increasing feature values using a simple metric $\delta_{\vec{v}}$, given by the minimum of $F_{\vec{v}'} - F_{\vec{v}}$ over all $\vec{v}'$ that are strictly smaller than $\vec{v}$ (i.e., all $\vec{v}'$ that are equal or smaller in every feature and smaller in at least one feature). Negative values of $\delta_{\vec{v}}$ indicate feature vectors at which the observed process fidelity was larger than that observed at some strictly smaller feature vector, which is inconsistent with monotonic decay. The data are almost monotonic when all three features are used (red histogram), but substantially non-monotonic when any one of the three features is discarded (blue, orange, and green histograms).
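The Figure 1 caption argues that, with 1024 executions per circuit, the spread in success probabilities between same-shape circuits is real rather than shot noise. A minimal sketch of that reasoning compares the across-circuit variance with the binomial variance expected from finite shots alone. The function names are hypothetical, and plugging estimated probabilities into $p(1-p)/N$ is an approximation.

```python
import math
import statistics

def shot_noise_se(p, shots):
    """Binomial standard error of a success probability estimated from `shots` runs."""
    return math.sqrt(p * (1 - p) / shots)

def excess_variance_ratio(estimates, shots):
    """Ratio of the across-circuit sample variance to the variance expected from
    shot noise alone. Values near 1 are consistent with identical circuits;
    values far above 1 indicate genuine circuit-to-circuit differences.
    (Undefined if every estimate is exactly 0 or 1.)"""
    observed = statistics.variance(estimates)
    expected = statistics.fmean([p * (1 - p) / shots for p in estimates])
    return observed / expected

print(shot_noise_se(0.5, 1024))  # 0.015625, the worst case over p
```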
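The randomized mirror circuits of Figure 2 have an adjustable mean two-qubit gate density. The sketch below shows one way a single circuit layer with a target density could be sampled. It assumes (this is not stated above) that density means the expected number of two-qubit gates in a layer divided by the width, so $0 \le \xi \le 1/2$, and it assumes all-to-all connectivity; on an IBM Q device, pairs would instead be drawn from the coupling graph's edges. The function `sample_layer` is hypothetical and builds only one layer, not the full mirror (layers followed by their inverses) structure.

```python
import random

def sample_layer(width, xi, rng):
    """Sample one layer on `width` qubits with expected two-qubit gate density `xi`.

    Returns (pairs, singles): qubit pairs that get a two-qubit gate, and a dict
    mapping every other qubit to a random single-qubit Clifford index in 0..23."""
    if not 0 <= xi <= 0.5:
        raise ValueError("xi must lie in [0, 1/2]")
    qubits = list(range(width))
    rng.shuffle(qubits)
    pairs, used = [], set()
    for k in range(width // 2):
        a, b = qubits[2 * k], qubits[2 * k + 1]
        # Keep each candidate pair with probability 2*xi, so for even width the
        # expected number of two-qubit gates is (width / 2) * 2 * xi = xi * width.
        if rng.random() < 2 * xi:
            pairs.append((a, b))
            used.update((a, b))
    singles = {q: rng.randrange(24) for q in range(width) if q not in used}
    return pairs, singles

rng = random.Random(1)
pairs, singles = sample_layer(10, 0.25, rng)
mean_pairs = sum(len(sample_layer(16, 0.125, rng)[0]) for _ in range(4000)) / 4000
```

For odd widths one qubit is always left unpaired, so the realized density falls slightly below `xi`.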
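The Figure 3 benchmark uses a full grid of feature vectors and reports bootstrap standard errors on the mean over $K=10$ circuits. The sketch below enumerates that grid (before the large width-depth combinations were discarded) and computes a plain nonparametric bootstrap over circuits. The function `bootstrap_se`, the number of resamples, and the fixed seed are hypothetical choices; the paper's exact bootstrap procedure may differ (for example, by also resampling shots).

```python
import itertools
import random
import statistics

# Feature grid from the Figure 3 caption, before large (w, d) values were discarded.
widths = range(2, 28)                          # w = 2, 3, ..., 27
depths = [2**k for k in range(2, 11)]          # d = 4, 8, ..., 1024
densities = [0.0, 1 / 8, 1 / 4]                # two-qubit gate density xi_2Q
grid = list(itertools.product(widths, depths, densities))  # 26 * 9 * 3 = 702 points

def bootstrap_se(values, n_resamples=1000, seed=0):
    """Bootstrap standard error of the mean of `values`, e.g. the K success
    probabilities measured at one feature vector."""
    rng = random.Random(seed)
    n = len(values)
    means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(means)
```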
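Figure 4's feature vectors were chosen with a Sobol sequence. A minimal sketch using SciPy's `scipy.stats.qmc.Sobol` (SciPy 1.7 or later) draws 32 points in the unit cube and maps them onto feature ranges. The ranges used here (widths 1 to 20, depths 1 to 256 on a log scale, densities in $[0, 1/2)$) and the rounding rules are hypothetical, not the values used in the paper.

```python
import numpy as np
from scipy.stats import qmc

sampler = qmc.Sobol(d=3, scramble=False)
unit = sampler.random_base2(m=5)              # 2**5 = 32 points in [0, 1)^3

widths = 1 + np.floor(unit[:, 0] * 20)        # integer widths 1..20
depths = np.round(2.0 ** (unit[:, 1] * 8))    # log-spaced depths 1..256
densities = 0.5 * unit[:, 2]                  # two-qubit gate density in [0, 0.5)
features = np.column_stack([widths, depths, densities])
```

Drawing a power-of-two number of points with `random_base2` preserves the Sobol sequence's balance properties; scrambling (the SciPy default) can be turned back on to randomize the point set while keeping low discrepancy.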
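The monotonicity metric $\delta_{\vec{v}}$ from the Figure 5 caption can be computed directly from a table of mean fidelities. In this sketch the function names and the three-point example data are hypothetical. Feature vectors with no strictly smaller vector in the data have no defined $\delta_{\vec{v}}$ and are skipped.

```python
def strictly_smaller(a, b):
    """True if a <= b in every feature and a < b in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def monotonicity_deltas(fidelities):
    """fidelities maps feature-vector tuples v to mean fidelity F_v.
    Returns {v: delta_v}, where delta_v = min over strictly smaller v' of F_v' - F_v."""
    deltas = {}
    for v, f_v in fidelities.items():
        diffs = [f_w - f_v for w, f_w in fidelities.items() if strictly_smaller(w, v)]
        if diffs:
            deltas[v] = min(diffs)
    return deltas

# Hypothetical (width, depth) data, not taken from the paper.
example = {(1, 1): 0.90, (1, 2): 0.80, (2, 2): 0.85}
deltas = monotonicity_deltas(example)
# (2, 2) gets a negative delta: its fidelity exceeds that of the smaller (1, 2).
```

The all-pairs comparison is quadratic in the number of feature vectors, which is adequate for benchmark datasets of a few hundred points.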