Metriq: A Collaborative Platform for Benchmarking Quantum Computers

Alessandro Cosentino, Changhao Li, Vincent Russo, Bradley A. Chase, Tom Lubinski, Siyuan Niu, Neer Patel, Nathan Shammah, William J. Zeng

TL;DR

The Metriq benchmark suite spans both system-level metrics that characterize fundamental device properties, such as entanglement quality, gate performance, and circuit speed, and application-inspired protocols that assess performance on quantum machine learning, optimization, and quantum simulation tasks.

Abstract

The fragmented landscape of quantum computer benchmarks, characterized by system-specific tools and inconsistent evaluation methodologies, hinders reliable cross-platform performance assessment. We introduce Metriq, an open-source collaborative platform for reproducible cross-platform quantum benchmarking that integrates benchmark definition and execution, data collection, and public presentation into a unified workflow. The Metriq benchmark suite spans both system-level metrics that characterize fundamental device properties, such as entanglement quality, gate performance, and circuit speed, and application-inspired protocols that assess performance on quantum machine learning, optimization, and quantum simulation tasks. Benchmarks are chosen to scale with processor size, and the framework incorporates cost and resource estimation to support practical evaluation. Using Metriq, we collect and publicly release results from more than ten quantum computers across multiple hardware vendors, enabling systematic cross-platform comparison. The resulting curated dataset also reveals the practical strengths and limitations of individual benchmarks, creating a feedback loop that informs the ongoing refinement of the suite. To summarize performance across the benchmark suite, we introduce the Metriq Score, a composite index aggregating benchmark outcomes. We further present cross-benchmark analyses enabled by the shared dataset, along with their correlations with hardware calibration metrics. Through open development and data sharing, Metriq provides a practical foundation for reproducible benchmarking of quantum computers as hardware and benchmarking methods continue to evolve.

Paper Structure (33 sections, 37 equations, 7 figures, 14 tables)

Figures (7)

  • Figure 1: Developer and user workflow in Metriq. The workflow runs from feature implementation in metriq-gym and benchmark definition, through benchmark execution on devices or simulators, to data processing and publication.
  • Figure 2: Edge coloring for the BSEQ benchmark on a representative section of the heavy-hex lattice topology found on IBM devices. Edges with the same color can be measured simultaneously, as they form an independent set where no two edges share a common qubit. The coloring partitions the couplings into 3 color classes, reducing the required circuits from 144 (36 edges $\times$ 4 bases) to 12 (3 colors $\times$ 4 bases).
  • Figure 3: (a) Layer fidelity and (b) EPLG versus chain length on several selected IBM devices. The list of RB layer lengths is $[2,4,8,16,30,50,70,100,150,200,300,500]$, and we use 10 random RB instances per depth and 1000 shots per circuit.
  • Figure 4: Mirror circuit benchmarking on ibm_fez. Panel (a) shows a noisy simulation that uses realistic device-calibrated gate and readout error parameters, and panel (b) shows experimental data from the same backend. Each square represents a circuit shape with width (horizontal axis) and number of layers (vertical axis); color encodes the polarization $P=(S-2^{-w})/(1-2^{-w})$. The color bar is identical for both panels, enabling direct comparison between simulated and measured performance. Data were taken in December 2025.
  • Figure 5: QML kernel accuracy for a range of qubit counts on different hardware platforms. The number of shots is 1000. The figure distinguishes between circuits compiled normally and those executed with verbatim compilation on AWS Braket due to missing barrier support at the time of data collection (see Section \ref{sec:benchmarking-compilation} for details).
  • ...and 2 more figures
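
Two quantities in the captions above lend themselves to a short illustration: the circuit-count reduction from edge coloring (Figure 2) and the polarization $P=(S-2^{-w})/(1-2^{-w})$ (Figure 4). The following is a minimal sketch, not the metriq-gym implementation. The function names and the toy coupling map (a 12-qubit ring with one bridge qubit) are hypothetical, and $S$ is assumed to be a circuit's success probability with $w$ its width. The coloring is a plain greedy heuristic, which yields a valid coloring but is not guaranteed to be minimal on an arbitrary graph.

```python
from collections import defaultdict


def greedy_edge_coloring(edges):
    """Give each edge the smallest color unused by edges sharing a qubit.

    Greedy heuristic: always valid, not guaranteed to use the fewest colors.
    """
    used = defaultdict(set)  # qubit -> colors already taken at that qubit
    coloring = {}
    for u, v in edges:
        color = 0
        while color in used[u] or color in used[v]:
            color += 1
        coloring[(u, v)] = color
        used[u].add(color)
        used[v].add(color)
    return coloring


def circuit_counts(coloring, num_bases=4):
    """Return (circuits without batching, circuits with one batch per color)."""
    num_colors = len(set(coloring.values()))
    return len(coloring) * num_bases, num_colors * num_bases


def polarization(success_prob, width):
    """P = (S - 2^-w) / (1 - 2^-w); S is assumed to be a success probability."""
    floor = 2.0 ** (-width)  # success probability of uniformly random output
    return (success_prob - floor) / (1.0 - floor)


# Hypothetical toy coupling map: a 12-qubit ring plus one bridge qubit (12).
ring = [(i, (i + 1) % 12) for i in range(12)]
edges = ring + [(0, 12)]

coloring = greedy_edge_coloring(edges)
naive, batched = circuit_counts(coloring)
print(f"{len(edges)} edges, {len(set(coloring.values()))} colors")
print(f"circuits: {naive} unbatched vs {batched} batched by color")
print(f"polarization at S=0.9, w=3: {polarization(0.9, 3):.4f}")
```

With the numbers quoted in the Figure 2 caption, the same arithmetic gives 36 edges × 4 bases = 144 circuits without batching and 3 colors × 4 bases = 12 with it. The polarization rescales the success probability so that a uniformly random output maps to 0 and a perfect result maps to 1.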