Managing Classical Processing Requirements for Quantum Error Correction

Satvik Maurya, Abtin Molavi, Aws Albarghouthi, Swamit Tannu

TL;DR

This work proposes a two-level framework that treats decoders as shared accelerators managed by the quantum operating system, and reduces decoder requirements by 10-40% across fault-tolerant benchmarks, demonstrating that efficient decoder scheduling is essential to making FTQC practical.

Abstract

Large-scale quantum computers promise transformative speedups, but their viability hinges on fast and reliable quantum error correction (QEC). At the center of QEC are decoders: classical algorithms, running on hardware such as FPGAs, GPUs, or CPUs, that process error syndromes every microsecond to detect errors and preserve fault tolerance. Quantum processors, therefore, operate not in isolation, but as accelerators tightly coupled with powerful classical digital hardware. A key challenge is that decoder demand fluctuates unpredictably: bursts of activity can require orders of magnitude more decodes than idle periods. Provisioning hardware for the worst case wastes resources, while provisioning for the average case risks catastrophic slowdowns. We show that this mismatch is a systems problem of capacity planning and scheduling, and propose a two-level framework that treats decoders as shared accelerators managed by the quantum operating system. Our approach reduces decoder requirements by 10-40% across fault-tolerant benchmarks, demonstrating that efficient decoder scheduling is essential to making FTQC practical.
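The provisioning tension described above can be illustrated with a minimal queueing sketch (not the paper's simulator): decode requests arrive in bursts, and a fixed pool of decoders drains the queue each QEC round. All rates below (`burst_prob`, `idle_rate`, `burst_rate`) are hypothetical numbers chosen for illustration only.

```python
import random

random.seed(0)

def simulate(capacity, rounds=10_000, burst_prob=0.05,
             idle_rate=2, burst_rate=40):
    """Queue decode tasks arriving each QEC round against a fixed
    number of decoders; return the worst backlog observed."""
    backlog = 0
    worst = 0
    for _ in range(rounds):
        # Bursty arrivals: rare rounds demand far more decodes than idle ones.
        arrivals = burst_rate if random.random() < burst_prob else idle_rate
        # Each round, `capacity` decoders each retire one task.
        backlog = max(0, backlog + arrivals - capacity)
        worst = max(worst, backlog)
    return worst

# Worst-case provisioning: no backlog ever forms, but ~90% of
# decoders sit idle in a typical round.
print(simulate(capacity=40))  # → 0
# Average-case provisioning (mean demand is ~3.9 decodes/round):
# every burst leaves a large backlog that stalls the logical clock.
print(simulate(capacity=4))
```

The gap between the two runs is exactly the opening for elastic scheduling: share a smaller decoder pool across workloads so bursts can borrow capacity instead of queueing.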

Paper Structure

This paper contains 67 sections, 3 equations, 18 figures, 1 table.

Figures (18)

  • Figure 1: Decoder provisioning highlights the tension between worst-case (wasteful) and average-case provisioning (risky), motivating elastic decoder scheduling.
  • Figure 2: (a) A surface code logical qubit (patch); (b) QEC works by repeatedly performing operations followed by measurements to generate syndromes; (c) Decoding with measurement errors -- multiple rounds of measurements are decoded collectively; (d) The fundamental split and merge operations of lattice surgery used for logical computation: each square is a logical qubit shown in (a).
  • Figure 3: (a) Streams of syndromes are processed in windows represented by blocks which are fixed decoding volumes: each block consists of a region in which corrections are committed, while buffer regions are used for merging corrections with overlapping windows; (b) Sliding windowed decoding (SWD) is inherently sequential: window $W_{i+1}$ can only be processed after $W_i$ has been processed; Parallel windowed decoding (PWD) can be (c) temporal and/or (d) spatial: in either case, $W_{i}, W_{i+2}$ are treated as independent tasks and processed independently in parallel at time step $t_j$. $W_{i+1}$, which includes overlaps with $W_{i}, W_{i+2}$, is processed in the next time step.
  • Figure 4: (a) A full system using hardware (FPGA/GPU/ASIC) decoders: qubits are measured using the readout system, which produces syndromes that are buffered until a window has been collected. Then, the decoder uses these collected syndromes to determine a correction, which is communicated to software; (b) Effect of the normalized decoder latency $t_D$ on the slowdown in processing $5d$ rounds of syndromes using SWD. The buffer region size is the normalized size of the buffer regions required ($n_W=3d, n_{com}=d$ would yield an overlap of 0.66).
  • Figure 5: (a) Simulated effect of the normalized decoder latency $t_D$ on the slowdown in processing $5d$ rounds of syndromes using temporally parallel windowed decoding (PWD); (b) Logical qubits arranged in the EDPC layout with ancilla bridges of varying sizes to facilitate LS operations; (c) PWD is applicable in both space and time -- this leads to a variable and sudden increase in the number of decoders required for implementing PWD (assuming $t_W=3d$).
  • ...and 13 more figures
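The scheduling contrast in Figure 3 can be made concrete with a schematic timing model (a sketch under simplifying assumptions, not the paper's method): sliding windowed decoding (SWD) processes windows strictly in sequence, while parallel windowed decoding (PWD) processes the independent windows $W_i, W_{i+2}, \ldots$ in one phase and the overlapping windows $W_{i+1}, W_{i+3}, \ldots$ in a second phase. The function names and the two-phase even/odd split below are illustrative assumptions.

```python
import math

def swd_time(n_windows, t_decode):
    # Sliding windows are strictly sequential: W_{i+1} cannot start
    # until W_i has committed its corrections.
    return n_windows * t_decode

def pwd_time(n_windows, t_decode, n_decoders):
    # Temporally parallel windows: even-indexed windows are mutually
    # independent and run first; odd-indexed windows, which overlap
    # both neighbours, run in a second phase once those commit.
    evens = math.ceil(n_windows / 2)
    odds = n_windows - evens
    phase = lambda k: math.ceil(k / n_decoders) * t_decode
    return phase(evens) + phase(odds)

print(swd_time(10, 3))     # → 30: ten windows back to back
print(pwd_time(10, 3, 5))  # → 6: one phase per parity with 5 decoders
```

The model also shows the cost PWD trades for that speedup: the burst of five simultaneous even windows needs five decoders at once, which is the sudden demand spike Figure 5(c) illustrates.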