Impacts of Decoder Latency on Utility-Scale Quantum Computer Architectures

Abdullah Khalid, Allyson Silva, Gebremedhin A. Dagnew, Tom Dvir, Oded Wertheim, Motty Gruda, Xiangzhou Kong, Mia Kramer, Zak Webb, Artur Scherer, Masoud Mohseni, Yonatan Cohen, Pooya Ronagh

TL;DR

This work addresses the bottleneck of reaction time in fault-tolerant quantum computing by linking decoder and communication latencies to the performance of a surface-code-based architecture. It introduces a dual-reaction-time model, γ_LS and γ_mem, showing that correction-qubit decoding imposes a memory-latency bottleneck that governs the achievable circuit throughput, even when lattice-surgery decoding can be parallelized. By developing logical-error-rate models for the post-corrected π/8 gadget and fitting lattice-surgery error parameters, the authors translate reaction-time effects into full-system resource estimates, revealing substantial physical-qubit overheads and runtime penalties for utility-scale circuits unless decoders and communications scale dramatically. The results underscore the need for faster decoders, higher-bandwidth interconnects, and potentially alternative codes or more efficient magic-state distillation to realize practical FTQC, with concrete implications for the required decoder counts (on the order of 15k for a 10M-qubit QPU) and the space-time Pareto frontier of the core processor, MSF, and correction-storage regions.

Abstract

The speed of a fault-tolerant quantum computer is dictated by the reaction time of its classical electronics, that is, the total time required by decoders and controllers to determine the outcome of a logical measurement and execute subsequent conditional logical operations. Despite its importance, the reaction time and its impact on the design of the logical microarchitecture of a quantum computer are not well understood. In this work, we build, for a surface-code-based architecture, a model for the reaction time in which the decoder latency is based on parallel space- and time-window decoding methods, and communication latencies are drawn from our envisioned quantum execution environment comprising a high-speed network of quantum processing units, controllers, decoders, and high-performance computing nodes. We use this model to estimate the increase in the logical error rate of magic state injections as a function of the reaction time. Next, we show how the logical microarchitecture can be optimized with respect to the reaction time, and then present detailed full-system quantum and classical resource estimates for executing utility-scale quantum circuits based on realistic hardware noise parameters and state-of-the-art decoding times. For circuits with $10^{6}$--$10^{11}$ $T$ gates involving 200--2000 logical qubits, under a $\Lambda=9.3$ hardware model representative of a realistic target for superconducting quantum processors operating at a 2.86 MHz stabilization frequency, we show that even decoding at a speed below one microsecond per stabilization round introduces substantial resource overheads: approximately 100k--250k additional physical qubits for correction qubit storage in the magic state factory; 300k--1.75M extra physical qubits in the core processor due to the code distance increase from $d$ to $d+4$ for extra memory protection; and a longer runtime by roughly a factor of 100.

Paper Structure

This paper contains 17 sections, 22 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Target logical microarchitecture with a core processor sized for an algorithm requiring 36 logical qubits, coupled to an MSF with two distillation levels, each containing parallel (three lower-level and two higher-level) distillation units designed to implement the 15-to-1 magic state distillation protocol Litinski2019Game. Dedicated regions (the correction areas) are allocated for preparing and storing the correction qubits required for the post-corrected $\pi/8$ rotation gadget. The number of correction qubit storage patches required per distillation unit is determined by the reaction time expressed in logical cycles, i.e., $\lceil \gamma_\text{mem} / \tau_\text{logical} \rceil$, where the logical cycle time $\tau_\text{logical}$ is set by the code distance used in the corresponding distillation unit. Our assembly onto this microarchitecture determines the code distances for the logical patches, which can be different for each distillation level and for the core processor; the number of parallel distillation units required to achieve just-in-time resource delivery while accounting for classical decoding latency; and the storage capacity required for correction buffers. Further details are provided in Ref. Silva2025Optimizing.
  • Figure 2: Schematic of the functional decomposition of the quantum execution environment. Arrows indicate the data transfer directions, and are labelled by the type of data and/or the corresponding communication times.
  • Figure 3: Implementation of a sequence of post-corrected $\pi/8$ Pauli rotations on a QPU. Each $\pi/8$ rotation gadget (indicated by a grey dashed line box) requires performing measurements involving logical data qubits in the joint state $|\psi\rangle$, a high-fidelity magic state $|m\rangle$ distilled in an MSF (not shown), and a correction qubit prepared in the state $|0\rangle$. The correction qubit in each rotation gadget is measured in either the $X$- or $Z$-basis, depending on the outcome of the logical measurements and on the outcome of the measurement of the previous correction qubit. In the classical processor (not shown), the received syndrome data is used to decode the outcomes of the measurements. The lattice surgery measurements involved in implementing each $\pi/8$ rotation are spatially large and have a reaction time of $\gamma_\text{LS}$. The correction qubits cannot be measured until their associated lattice surgery measurements have been decoded, so they are stored in memory. Hence, the decoding of each correction qubit measurement has a reaction time of $\gamma_\text{mem}$. The decoding of the logical measurements can be parallelized, but the decoding of the correction qubit measurements cannot.
  • Figure 4: Logical error rates of lattice surgery as a function of the distance $d$ (left figure), and as a function of the number of rounds $r$ (right figure). The simulations are performed using the error model from Ref. Mohseni2025How in conjunction with the Target hardware parameter set specified in Table \ref{tab:target-params}. The error bars on the data points are too small to be discernible. The data are fit to Eqs. \ref{eq:ler_ls_distance} and \ref{eq:ler_ls_rounds}, respectively, and extrapolated to large distances. The uncertainties in the fits, depicted by the shaded regions, are small at the distances used in the assembled microarchitectures.
  • Figure 5: Physical qubit footprint versus reaction time, measured in logical cycles of the core processor, broken down by architectural area. Estimates are provided for two contrasting utility-scale quantum algorithms: (a) ground-state energy estimation targeting a precision of 1.6 mHa for a 2D Fermi--Hubbard model on a $32 \times 32$ lattice, using a circuit generated using Campbell's Plaquette Trotterization approach campbell_early_2022, which requires 2562 logical qubits and $4 \times 10^6$ $T$ gates; and (b) a single-shot dynamic-circuit implementation of the quantum eigenvalue transform for NMR spectral prediction of the $\alpha$-conotoxin macromolecule elenewski2024prospectsnmrspectralprediction, requiring 241 logical qubits and $5.11 \times 10^{11}$ $T$ gates. For each algorithm, the figure shows the architecture chosen on the time-optimal branch of the space--time Pareto frontier generated using the TopQAD software suite 1qbit2024topqad.
  • ...and 2 more figures
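
The storage-patch count in the Figure 1 caption, $\lceil \gamma_\text{mem} / \tau_\text{logical} \rceil$, can be illustrated with a minimal Python sketch. The function names are hypothetical and not taken from the paper or from TopQAD. The sketch assumes that one logical cycle lasts $d$ stabilization rounds, which the caption does not state (it only says that $\tau_\text{logical}$ is set by the code distance). The values of $d$ and $\gamma_\text{mem}$ in the usage lines are illustrative inputs, not results from the paper; only the 2.86 MHz stabilization frequency comes from the abstract.

```python
import math


def round_time_us(stabilization_freq_mhz: float) -> float:
    """Duration of one stabilization round in microseconds (1 / frequency in MHz)."""
    return 1.0 / stabilization_freq_mhz


def logical_cycle_us(distance: int, stabilization_freq_mhz: float) -> float:
    """Logical cycle time in microseconds.

    ASSUMPTION (not stated in the caption): one logical cycle
    consists of `distance` stabilization rounds.
    """
    return distance * round_time_us(stabilization_freq_mhz)


def correction_storage_patches(
    gamma_mem_us: float, distance: int, stabilization_freq_mhz: float
) -> int:
    """Number of correction qubit storage patches per distillation unit,
    computed as ceil(gamma_mem / tau_logical)."""
    tau_logical = logical_cycle_us(distance, stabilization_freq_mhz)
    return math.ceil(gamma_mem_us / tau_logical)


# Illustrative inputs: d = 15 and gamma_mem = 20 us are hypothetical values.
tau = logical_cycle_us(distance=15, stabilization_freq_mhz=2.86)
patches = correction_storage_patches(
    gamma_mem_us=20.0, distance=15, stabilization_freq_mhz=2.86
)
print(f"tau_logical = {tau:.2f} us, storage patches = {patches}")
```

With these inputs the logical cycle is about 5.24 µs, so a 20 µs memory reaction time spans just under four logical cycles and four storage patches are needed. Because the count is a ceiling, a small change in the reaction time near a multiple of the logical cycle adds a whole extra patch per distillation unit.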