Recent quantum runtime (dis)advantages

J. Tuziemski, J. Pawłowski, P. Tarasiuk, Ł. Pawela, B. Gardas

TL;DR

The paper interrogates reported quantum runtime advantages on NISQ devices by enforcing end-to-end timing, overheads included, through the $\mathrm{TT}_{\varepsilon}$ metric, arguing that previous analyses often misrepresent true runtimes. It re-evaluates three milestones (quantum annealing for approximate QUBO, the gate-based restricted Simon's problem, and the BF-DCQO hybrid algorithm) and finds no durable runtime advantage when proper timing and strong classical baselines are used. The authors advocate careful reference selection, avoidance of cherry-picking, and consideration of hardware-specific overheads to avoid false claims of supremacy. They conclude that credible, runtime-based quantum supremacy on current hardware remains elusive and that establishing it requires rigorous benchmarking and problem-class-appropriate comparisons.

Abstract

We (re)evaluate recent claims of quantum advantage in annealing- and gate-based algorithms, testing whether reported speedups survive rigorous end-to-end runtime definitions and comparison against strong classical baselines. Conventional analyses often omit substantial overhead (readout, transpilation, thermalization, etc.), yielding biased assessments. While excluding seemingly unimportant parts of the simulation may seem reasonable, on most current quantum hardware a clean separation between "pure compute" and "overhead" cannot be experimentally justified. This may distort "supremacy" results. In contrast, for most classical hardware, total time $\approx$ compute $+$ a weakly varying constant, which leads to robust claims. We scrutinize two important milestones: (1) quantum annealing for approximate QUBO, PRL 134, 160601 (2025) [https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.134.160601], which uses a sensible time-to-$\varepsilon$ metric but proxies runtime by the annealing time (non-measurable); (2) a restricted Simon's problem, PRX 15, 021082 (2025) [https://journals.aps.org/prx/abstract/10.1103/PhysRevX.15.021082], whose advantageous scaling in oracle calls is undisputed; yet, as we demonstrate, the estimated runtime of the quantum experiment is $\sim 100\times$ slower than a tuned classical baseline. Finally, we show that the recently claimed "runtime advantage" of the BF-DCQO hybrid algorithm (arXiv:2505.08663) does not withstand rigorous benchmarking. Therefore, we conclude that runtime-based supremacy remains elusive on NISQ hardware, and that credible claims require careful time accounting, proper reference selection, and an adequate metric.
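
To illustrate why the choice of per-run time matters, here is a minimal sketch. It assumes the commonly used time-to-target form $\mathrm{TT}_{\varepsilon} = t_f \,\ln(1-0.99)/\ln(1-p_{\varepsilon})$, where $p_{\varepsilon}$ is the per-run probability of landing within $\varepsilon$ of the reference energy. This form, the 0.99 confidence level, the function name `time_to_epsilon`, and all numbers are illustrative assumptions; the abstract itself does not state the definition.

```python
import math


def time_to_epsilon(t_f, p_success, confidence=0.99):
    """Time needed to hit the target at least once with the given confidence.

    Assumed form (not taken from the paper):
        TT = t_f * ln(1 - confidence) / ln(1 - p_success)
    By convention here, a single run is charged when p_success >= confidence.
    """
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must lie in (0, 1]")
    if p_success >= confidence:
        return t_f
    return t_f * math.log(1.0 - confidence) / math.log(1.0 - p_success)


# Hypothetical numbers, chosen only to show the effect of the runtime definition.
p_eps = 0.2          # per-run success probability
t_anneal = 2e-6      # annealing time per run, in seconds
t_overhead = 1e-3    # per-run overhead (readout, programming, ...), in seconds

tt_anneal_only = time_to_epsilon(t_anneal, p_eps)
tt_end_to_end = time_to_epsilon(t_anneal + t_overhead, p_eps)

print(f"annealing time only: {tt_anneal_only:.3e} s")
print(f"end-to-end:          {tt_end_to_end:.3e} s")
```

Because the per-run time enters as a multiplicative factor, any overhead that dominates the annealing time also dominates the metric, which is the effect the abstract warns about.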

Paper Structure

This paper contains 18 sections, 13 equations, 8 figures.

Figures (8)

  • Figure 1: Scaling of the time-to-epsilon $\left[\mathrm{TT}_{\varepsilon}\right]_{\mathrm{Med}}$ with the size $N$ of the Sidon-28 instances, for $\varepsilon = 0.75\%$ in panel (a) and $\varepsilon = 1.25\%$ in panel (b). Results for the QAC solver (blue) are reproduced from Ref. Lidar2025, courtesy of the authors. The remaining QAC data come from experiments we performed on the instances from Ref. Lidar2025. With the same definition of runtime, our results (orange) are consistent with Ref. Lidar2025, which confirms that the methodology was implemented correctly. In addition, we show $\left[\mathrm{TT}_{\varepsilon}\right]_{\mathrm{Med}}$ defined using the complete QPU access time $t_f^{\rm QPU\,access}$ as reported by D-Wave's cloud interface (green), and using the runtime measured with cloud access, $t_f^{\rm runtime}$ (red). Solid lines are power-law fits $\left[\mathrm{TT}_{\varepsilon}\right]_{\mathrm{Med}} \propto N^{\alpha}$, with the corresponding exponents $\alpha$ shown on the plot. The instance size range spanned by each line denotes which data points were used for the respective fit. Both plots clearly demonstrate that, with the proper runtime definition, $\left[\mathrm{TT}_{\varepsilon}\right]_{\mathrm{Med}}$ is constant over the problem sizes considered. For comparison, data for the Simulated Bifurcation Machine (SBM) solver (Refs. Goto2021, Goto2016, Goto2019, PawlowskiClosingGap) are presented, with $\left[\mathrm{TT}_{\varepsilon}\right]_{\mathrm{Med}}$ computed using 1 GPU and either the total runtime $t_f^{\rm tot}$ (cyan) or the pure GPU runtime $t_f^{\rm GPU}$ (magenta). In this case the scaling of $\left[\mathrm{TT}_{\varepsilon}\right]_{\mathrm{Med}}$ is more robust with respect to different runtime definitions, which is generally the case for classical algorithms. The bottom panel (c) shows detailed data on the fitting exponent and its uncertainty for $\varepsilon \in \{0.5, 0.75, 1.00, 1.10, 1.25, 1.5\}\%$.
  • Figure 2: For the restricted Simon's problem, the quantum advantage manifests itself in a favorable polylogarithmic scaling of the protocol score function with the total number of periods $N_w = \sum_j \binom{N}{j}$, as compared to the exponential scaling for the classical algorithm. The score function depends on the number of oracle queries and the probability of protocol success. (a) Total number of oracle queries for the protocol and (b) runtime as a function of the total number of bits $N$, as obtained by executing the classical and quantum algorithms solving the restricted Simon's problem with $w=7$. The shaded area indicates the problem sizes for which the advantage was verified experimentally in Ref. LidarSimon. For the considered problem sizes and oracle periods, the classical algorithm runtime is shorter than the quantum one; we predict that the runtime crossover occurs at a problem size of $N=60$. For example, for $N=29$ bits, which corresponds to the implementation presented in Ref. LidarSimon, the runtime of the classical algorithm is two orders of magnitude shorter than that of the quantum algorithm. The quantum implementation is based on the oracle constructed in Ref. LidarSimon, and the number of shots required for the algorithm was established using Eq. (16) of Ref. LidarSimon. In this case the quantum circuit was transpiled to take into account the connectivity and native gate set of IBM Brisbane, and the runtime was estimated using Qiskit functionalities. The classical algorithm was implemented on a GPU. The fitting parameters of the score function for the quantum and classical algorithms are presented in panels (c) and (d). The parameter value $\gamma=0$ indicates that the score scales polylogarithmically, which implies a quantum scaling advantage. The data for the quantum algorithm correspond to the results obtained for IBM Brisbane with dynamical decoupling (Table XX in Ref. LidarSimon).
  • Figure 3: Partial reproduction of Fig. 5 from Ref. ChandaranaKipuAdvantage, with additional results from the SBM solver. Panel (a) shows the approximation ratio $\mathcal{R}$ achieved in the time annotated on top of the bars, for an instance of type $S_{2q} = 1$, $S_{3q} = 4$ with couplings drawn from a Cauchy distribution. The SA and BF-DCQO results from Ref. ChandaranaKipuAdvantage correspond to a single, best-performing instance. Since the exact instances used by the authors of Ref. ChandaranaKipuAdvantage had not been disclosed, we generated our own instances and show SBM results for the best-performing one. This highlights the danger of cherry-picking results and the ease of "manufacturing" supremacy claims. Finally, panel (b) shows the time-to-ratio $\mathrm{TT}_{\mathcal{R}}$ for instances of type $S_{2q} = 1$, $S_{3q} = 6$ with couplings drawn from a symmetrized Pareto distribution; the annotations indicate the target ratio $\mathcal{R}$. The results for CPLEX and BF-DCQO are again taken from Ref. ChandaranaKipuAdvantage, where they were obtained by averaging over $5$ random instances. Similarly, we constructed our own instances and show SBM results averaged over $5$ of them. In both cases SBM outperforms the other solvers, casting doubt on the supremacy claim of Ref. ChandaranaKipuAdvantage. Moreover, in App. \ref{app:hubo} we present a more thorough analysis of these instances and show how to optimize HUBO SA to outperform BF-DCQO.
  • Figure 4: (a)-(b) Approximation ratio $\mathcal{R}$ achieved by D-Wave's Advantage2 1.6 quantum annealer on the same HUBO instances as in Fig. \ref{fig:kipu_comp}, computed for each instance type and size by selecting the best result out of $5$ shots with $2^{10}$ samples each. A forward annealing scheme was used, with annealing times of $0.5\,\mu\mathrm{s}$ (red), $10\,\mu\mathrm{s}$ (green) and $100\,\mu\mathrm{s}$ (blue). The instances were first reduced to QUBO form in the same way as for the SBM results (see App. \ref{app:hubo} for details) and then embedded into the Advantage2 1.6 working graph. The results are, unsurprisingly, much worse than those of all other solvers considered. This can be explained by the distribution of chain lengths in the necessary embedding, shown in panel (c). Since very long chains are needed to fit the problem instances onto the QPU, the performance of the quantum annealer is severely degraded by possible chain breaks. This highlights an issue that is often overlooked, yet crucial when benchmarking solvers with hardware-imposed constraints on problem topology.
  • Figure 5: Illustration of the impact of solver-related overhead on the scaling behavior of $\left[\mathrm{TT}_{\varepsilon}\right]_{\mathrm{Med}}$, as defined in Eq. \ref{eq:TTe}, using a certain type of Ising instances relevant for near-term quantum devices (see Ref. PawlowskiClosingGap for details). Note that the finite-size effects diminish significantly beyond the gray-colored region ($N\gtrsim 2000$), as demonstrated with the SBM results. However, such system sizes are currently beyond the capabilities of current quantum annealing devices.
  • ...and 3 more figures
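
The captions above rely on power-law fits of the form $\left[\mathrm{TT}_{\varepsilon}\right]_{\mathrm{Med}} \propto N^{\alpha}$. As a minimal sketch, such an exponent can be estimated by a straight-line fit in log-log space. The helper name `fit_power_law` and the synthetic data are assumptions made for illustration only; this is not the authors' fitting procedure and it uses none of their data.

```python
import numpy as np


def fit_power_law(sizes, values):
    """Fit values ~ c * sizes**alpha by least squares on log-log data.

    Returns the pair (alpha, c).
    """
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_v = np.log(np.asarray(values, dtype=float))
    alpha, log_c = np.polyfit(log_n, log_v, 1)  # slope, intercept
    return alpha, np.exp(log_c)


# Synthetic example: a "pure compute" time growing as N**1.5,
# plus a constant per-call overhead that dwarfs it at these sizes.
sizes = np.array([200, 400, 800, 1600])
pure_compute = 1e-6 * sizes**1.5   # seconds, synthetic
overhead = 1.0                     # seconds, synthetic constant

alpha_pure, _ = fit_power_law(sizes, pure_compute)
alpha_total, _ = fit_power_law(sizes, pure_compute + overhead)

print(f"exponent from pure compute time: {alpha_pure:.3f}")
print(f"exponent from total time:        {alpha_total:.3f}")
```

In this synthetic setting the fitted exponent drops from 1.5 to nearly zero once the constant overhead is included, mirroring how the apparent scaling in Figures 1 and 5 depends on which runtime definition enters the metric.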