FPGA-tailored algorithms for real-time decoding of quantum LDPC codes

Satvik Maurya, Thilo Maurer, Markus Bühler, Drew Vandeth, Michael E. Beverland

TL;DR

Fault-tolerant quantum computing requires real-time classical decoders that can keep up with QEC cycles. This work benchmarks FPGA-tailored implementations of three qLDPC decoder classes (Relay for message passing, filtered-OSD, and cluster decoding) and introduces a systolic solver for rank-deficient systems to enable efficient hardware execution. Across fixed FPGA cycle budgets, Relay achieves the lowest logical error rates, while filtered-OSD and clustering remain slower and less accurate despite their improvements, indicating that message passing is the most viable path to real-time qLDPC decoding on FPGAs. The study also develops a generalized, liftable Gauss–Jordan solver for arbitrary binary matrices and demonstrates how these FPGA-oriented techniques can constrain latency tails while reducing resource footprints. Overall, the results guide practical FPGA-based decoder design for large-scale qLDPC codes in near-term fault-tolerant quantum architectures.

Abstract

Real-time decoding is crucial for fault-tolerant quantum computing but likely requires specialized hardware such as field-programmable gate arrays (FPGAs), whose parallelism can alter relative algorithmic performance. We analyze FPGA-tailored versions of three decoder classes for quantum low-density parity-check (qLDPC) codes: message passing, ordered statistics, and clustering. For message passing, we analyze the recently introduced Relay decoder and its FPGA implementation; for ordered statistics decoding (OSD), we introduce a filtered variant that concentrates computation on high-likelihood fault locations; and for clustering, we design an FPGA-adapted generalized union-find decoder. We design a systolic algorithm for Gaussian elimination on rank-deficient systems that runs in linear parallel time, enabling fast validity checks and local corrections in clustering and eliminating costly full-rank inversion in filtered-OSD. Despite these improvements, both remain far slower and less accurate than Relay, suggesting message passing is the most viable route to real-time qLDPC decoding.
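To make the role of Gaussian elimination on rank-deficient systems concrete, the following is a minimal sequential sketch in Python. It is not the paper's systolic algorithm and says nothing about its parallel running time; it only illustrates the underlying linear algebra over GF(2): reduce the augmented matrix, skip columns that have no pivot, then check whether the syndrome is consistent. The function name `gf2_solve` and the choice to set free variables to zero are assumptions made for illustration.

```python
import numpy as np

def gf2_solve(H, s):
    """Gauss-Jordan elimination over GF(2) on the augmented matrix [H | s].

    Hypothetical helper (not from the paper). Returns (rank, consistent, x),
    where x is one solution of H x = s (mod 2) with all free variables set
    to 0, or None if the system has no solution.
    """
    H = np.array(H, dtype=np.uint8) % 2
    s = np.array(s, dtype=np.uint8) % 2
    m, n = H.shape
    A = np.concatenate([H, s.reshape(m, 1)], axis=1)
    pivot_cols = []
    row = 0
    for col in range(n):
        if row == m:
            break
        # Look for a row at or below `row` with a 1 in this column.
        candidates = np.nonzero(A[row:, col])[0]
        if candidates.size == 0:
            continue  # no pivot here: this column is linearly dependent
        p = row + candidates[0]
        A[[row, p]] = A[[p, row]]  # move the pivot row into place
        # Clear this column in every other row (XOR is addition mod 2).
        others = np.nonzero(A[:, col])[0]
        others = others[others != row]
        A[others] ^= A[row]
        pivot_cols.append(col)
        row += 1
    rank = row
    # Rows below the rank are zero in H; a 1 in the last column there
    # means the syndrome is outside the column space of H.
    if A[rank:, n].any():
        return rank, False, None
    x = np.zeros(n, dtype=np.uint8)
    for r, c in enumerate(pivot_cols):
        x[c] = A[r, n]
    return rank, True, x
```

For example, `gf2_solve([[1, 1, 0], [0, 1, 1], [1, 0, 1]], [1, 0, 1])` operates on a rank-2 matrix and still returns a valid solution, whereas the syndrome `[1, 0, 0]` is reported as inconsistent. A consistency test of this kind is the sort of validity check the abstract mentions for clustering.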

Paper Structure

This paper contains 107 sections, 20 equations, 30 figures, 1 table, 2 algorithms.

Figures (30)

  • Figure 1: Classes of qLDPC decoder. Among these, classes (iv)–(vi) are identified as the most suitable for real-time decoding in FPGAs and are evaluated in this work.
  • Figure 2: Cutoff-time performance curves for FPGA-tailored decoders. Logical error rate versus FPGA cycle budget for Relay, filtered-OSD, cluster decoding, and standard OSD (Ref. bascones2025exploring), with runs exceeding the cycle cutoff counted as failures. Relay achieves the lowest error rates across all tested budgets. Filtered-OSD starts up with significantly fewer cycles than standard OSD.
  • Figure 3: Distance-$6$ toric code on a square torus encoding $k=2$ logical qubits ($n=2d^2=72$ data qubits; black dots). Blue (green) faces denote $X$-type ($Z$-type) stabilizer generators, each acting on the four data qubits around a face. Representative $X$- and $Z$-type logical loops are highlighted. In the error-free case all stabilizer outcomes are $+1$; a single $Z$ ($X$) error flips the adjacent $X$-type ($Z$-type) checks (red). A standard circuit uses one ancilla per face: to measure $X$-checks, prepare $\ket{+}$, apply CNOTs with the ancilla as control to its neighbors, then measure in $X$; for $Z$-checks, prepare $\ket{0}$, use data qubits as controls into the ancilla, then measure in $Z$.
  • Figure 4: Tanner graphs for a simple noise model in which the circuit consists of measuring each check of the code (toric code on the left, gross code on the right) and the only faults are an $X$ error on each qubit prior to the check measurements. For the gross code, only the long-range edges (red) from one check are drawn; all other checks have similar edges, which can be obtained by translating those shown on the torus.
  • Figure 5: Computational slowdown from decoding latency. Many approaches to universal fault-tolerant quantum computation implement non-Clifford gates by injecting a $T$ state, as shown. This introduces a classically controlled feed-forward step: the $SX$ correction depends on a logical measurement outcome determined by the decoder. As noted in Ref. terhal2015quantum, subsequent logical gates remain unspecified until this outcome is known, so decoding latency can slow the computation.
  • ...and 25 more figures
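
The toric-code parameters quoted in the Figure 3 caption ($n = 2d^2 = 72$, $k = 2$, weight-4 checks, and a single $Z$ error flipping two $X$-type checks) can be checked numerically. The sketch below is an illustration only: it builds the check matrices as a hypergraph product of two length-$d$ cyclic repetition codes, which is one standard way to obtain the toric code but is an assumed convention here, not necessarily the qubit ordering used in the paper. The names `toric_code` and `gf2_rank` are hypothetical.

```python
import numpy as np

def gf2_rank(M):
    """Rank of a binary matrix over GF(2) by row reduction."""
    A = np.array(M, dtype=np.uint8) % 2
    rows, cols = A.shape
    rank = 0
    for col in range(cols):
        if rank == rows:
            break
        hits = np.nonzero(A[rank:, col])[0]
        if hits.size == 0:
            continue  # no pivot in this column
        p = rank + hits[0]
        A[[rank, p]] = A[[p, rank]]
        # Eliminate the 1s below the pivot (XOR is addition mod 2).
        below = np.nonzero(A[rank + 1:, col])[0] + rank + 1
        A[below] ^= A[rank]
        rank += 1
    return rank

def toric_code(d):
    """X- and Z-type check matrices of a d x d toric code (assumed layout)."""
    I = np.eye(d, dtype=np.uint8)
    R = (I + np.roll(I, 1, axis=1)) % 2  # checks of a cyclic repetition code
    HX = np.hstack([np.kron(R, I), np.kron(I, R.T)])
    HZ = np.hstack([np.kron(I, R), np.kron(R.T, I)])
    return HX, HZ

d = 6
HX, HZ = toric_code(d)
n = HX.shape[1]
k = n - gf2_rank(HX) - gf2_rank(HZ)
# X- and Z-type checks must commute: HX @ HZ^T = 0 (mod 2).
commute = not ((HX.astype(int) @ HZ.T.astype(int)) % 2).any()
print(n, k, commute)
```

Each row of `HX` and `HZ` has weight 4 (one check per face acting on four data qubits), and each column of `HX` has weight 2, which matches the statement that a single $Z$ error flips the two adjacent $X$-type checks.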

Theorems & Definitions (1)

  • Definition 1: Lifted (reduced) row-echelon matrix