The Fast for the Curious: How to accelerate fault-tolerant quantum applications

Sam McArdle, Alexander M. Dalzell, Aleksander Kubica, Fernando G. S. L. Brandão

TL;DR

The paper addresses the challenge of making fault-tolerant quantum computations practically useful by reducing wall-clock run times through co-design of hardware, fault tolerance, and algorithmic subroutines. It surveys paradigms ranging from standard 2D surface codes to 3D codes and neutral atoms, and introduces a simplified cost model for comparing end-to-end resource tradeoffs, illustrated with Fermi-Hubbard (FH) simulations. It highlights time-optimal and algorithmic-level parallelism strategies, compares Pauli-based and Pauli-free compilations, and uses FH resource estimates to weigh parallelism against qubit overhead. The findings emphasize that achieving practical run times requires targeting logical clock speed through hardware-aware compilation, management of magic state factories (MSFs), and potentially 3D transversal gates, offering guidance for future fault-tolerant quantum computing (FTQC) roadmaps with industry relevance.

Abstract

We evaluate strategies for reducing the run time of fault-tolerant quantum computations, targeting practical utility in scientific or industrial workflows. Delivering a technology with broad impact requires scaling devices, while also maintaining acceptable run times for computations. Optimizing logical clock speed may require moving beyond current strategies, and adopting methods that trade faster run time for increased qubit counts or engineering complexity. We discuss how the co-design of hardware, fault tolerance, and algorithmic subroutines can reduce run times. We illustrate a selection of these topics with resource estimates for simulating the Fermi-Hubbard model.

Paper Structure

This paper contains 42 sections, 26 equations, 16 figures, 4 tables.

Figures (16)

  • Figure 1: An abstract illustration of the compilation stack from a quantum algorithm (e.g., Shor's algorithm) to its physical implementation. The stack is an artificial division of choices that has developed over time to help explore the space of all possible designs. Compilation can take place within layers (e.g., trading coherent circuit depth for algorithm repetitions) or between layers (e.g., trading qubits to reduce $T$-count in QROAM, see Sec. \ref{Subsec:LookupTablesQROM}). The connections between layers signify that information must be passed between layers in order to optimize the compilation. For example, optimizing the $T$-depth in the logical layer depends on the spacetime cost of preparing magic states in the physical layer. The dependencies between layers mean that compilation is an iterative, rather than one-way, process.
  • Figure 2: A comparison between timescales in classical processors (yellow) and the anticipated timescales of quantum processors (purple), based on currently achievable benchmarks in superconducting platforms \cite{google2025Below} designed for compatibility with the 2D surface code. Classical gate time corresponds to the delay for a fan-out of 4 (FO4) CMOS inverter gate, which acts as a standard benchmark; the reported value is taken from simulations of the 7 nm process \cite{stillmaker2017scalingEquationsCMOS}. The logical quantum gate time estimate assumes $d=25$ rounds of syndrome extraction (for lattice surgery), with each round taking 1.1 µs, consistent with the cycle time reported in Ref. \cite{google2025Below}.
  • Figure 3: An illustration of the reaction time delay for a surface code computation. Time flows from left to right. Blue/red squares denote syndrome extraction. The physical qubits encoding the lower logical qubit are measured (green stars). The physical measurement outcomes are communicated to a classical processor for decoding, and decoded (using the syndrome history) to infer the logical measurement outcome. The logical measurement outcome determines the operation to be applied to the upper logical qubit. The instructions are then communicated back to the quantum processor, and implemented (purple circles). While the upper logical qubit waits for its next logical operation to be determined, syndrome extraction is carried out as usual.
  • Figure 4: Circuit diagrams and logical timesteps for a circuit compiled to Clifford + $T$ (left), and to Pauli-based computation (PBC, right). In Clifford + $T$ compilation, logical Clifford gates are explicitly implemented via lattice surgery or transversal gates, and logical operations may be implemented in parallel if sufficient magic states and routing space are available. In PBC, Clifford gates are commuted through the circuit, leaving a sequence of non-Clifford Pauli-product rotations, which can be implemented by performing (via lattice surgery) a logical Pauli measurement of the corresponding qubits and a magic state. The Clifford frame $C$ is tracked, and can be commuted through subsequent gates. This example highlights some of the benefits and drawbacks of PBC: Clifford gates do not have to be explicitly implemented, lowering the gate count of the circuit. However, the $T$ gates that were parallel in the original circuit are implemented sequentially after compiling to PBC. In the diagrams showing logical timesteps, time flows upwards. Logical data qubits are depicted by blue squares, magic state factories are light green regions (production of high-fidelity magic states may require additional unshown patches), magic states $\ket{T} = T\ket{+}$ are dark green squares, idle routing space is denoted by white squares, and routing space that is used in each time step by lattice surgery is depicted as gray squares. Green arrows indicate qubits that are involved in a logical Pauli measurement performed via lattice surgery. 'H' and 'S' are used to denote lattice surgery implementations of the Hadamard and $S$ gates, respectively.
  • Figure 5: Selection of currently achievable time scales relevant for superconducting quantum processors in the 'standard 2D' FTQC paradigm. Physical quantum gate and measurement times are taken from the 105-qubit processor reported in \cite{google2025Below}, which was used to demonstrate elements of QEC in the standard 2D paradigm for surface codes up to $d=7$. Logical gate times assume a lattice surgery approach requiring $d$ syndrome extraction (SE) rounds (1.1 µs per SE round), with an extra factor of 2 for non-Clifford gates to perform the conditional correction. Magic state production time varies widely with the chosen magic state factory (MSF) strategy, the target output error rate, and the physical error rate. At $p=0.1\%$ physical error rate, one MSF can produce one $\ket{T}$ magic state with output error probability $2.7 \times 10^{-12}$ in 83 SE rounds \cite{litinski2019magicstate}, suitable for some applications listed in Table \ref{tab:ResourceEstimates}.
  • ...and 11 more figures
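
The timescales quoted in the captions of Figures 2 and 5 follow from simple arithmetic on the syndrome extraction (SE) cycle time. The sketch below reproduces that arithmetic under the assumptions stated in the captions (1.1 µs per SE round, $d$ rounds per lattice-surgery gate, a factor of 2 for non-Clifford gates, 83 SE rounds per magic state). The function and constant names are hypothetical, chosen here for illustration, and are not taken from the paper.

```python
# Illustrative arithmetic only: names below are hypothetical, not from the paper.
SE_ROUND_US = 1.1  # SE cycle time quoted in the captions, in microseconds


def logical_gate_time_us(d, non_clifford=False, se_round_us=SE_ROUND_US):
    """Lattice-surgery logical gate time: d SE rounds, doubled for
    non-Clifford gates to account for the conditional correction."""
    rounds = d * (2 if non_clifford else 1)
    return rounds * se_round_us


def magic_state_time_us(se_rounds=83, se_round_us=SE_ROUND_US):
    """Time for one magic state factory to output one |T> state,
    assuming the 83-round figure quoted in the Figure 5 caption."""
    return se_rounds * se_round_us


if __name__ == "__main__":
    d = 25  # code distance assumed in the Figure 2 caption
    print(f"Clifford gate:     {logical_gate_time_us(d):.1f} us")
    print(f"Non-Clifford gate: {logical_gate_time_us(d, non_clifford=True):.1f} us")
    print(f"Magic state:       {magic_state_time_us():.1f} us")
```

With $d=25$ this gives roughly 27.5 µs per logical Clifford gate, 55 µs per non-Clifford gate, and about 91 µs per magic state from a single factory, which suggests why the number and throughput of magic state factories matter for the logical clock speed.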