Local Clustering Decoder as a fast and adaptive hardware decoder for the surface code

Abbas B. Ziad, Ankit Zalawadiya, Canberk Topal, Joan Camps, György P. Gehér, Matthew P. Stafford, Mark L. Turner

TL;DR

The paper addresses the need for fault-tolerant quantum computation to be supported by decoders that are both highly accurate and capable of real-time operation. It introduces the Local Clustering Decoder (LCD), an FPGA-based, coarse-grained, distributed Union-Find (UF) decoder with an adaptivity engine that updates the decoding graph in real time to handle leakage. Under circuit-level noise with leakage, LCD improves logical accuracy and hardware efficiency: it decodes in under $1~\mu$s per round up to $d=17$ and enables large qubit savings (e.g., reducing the required code distance from $d=33$ to $d=17$ for $10^6$ operations). The work shows that leakage-aware, adaptive hardware decoding is both feasible and impactful, suggesting a path toward scalable, low-overhead QEC implementations and a potential ASIC realization.

Abstract

To avoid prohibitive overheads in performing fault-tolerant quantum computation, the decoding problem needs to be solved accurately and at speeds sufficient for fast feedback. Existing decoding systems fail to satisfy both of these requirements, meaning they either slow down the quantum computer or reduce the number of operations that can be performed before the quantum information is corrupted. We introduce the Local Clustering Decoder as a solution that simultaneously achieves the accuracy and speed requirements of a real-time decoding system. Our decoder is implemented on FPGAs and exploits hardware parallelism to keep pace with the fastest qubit types. Further, it comprises an adaptivity engine that allows the decoder to update itself in real time in response to control signals, such as heralded leakage events. Under a realistic circuit-level noise model where leakage is a dominant error source, our decoder enables one million error-free quantum operations with 4x fewer physical qubits when compared to standard non-adaptive decoding. This is achieved whilst decoding in under 1 μs per round with modest FPGA resources, demonstrating that high-accuracy real-time decoding is possible, and reducing the qubit counts required for large-scale fault-tolerant quantum computation.
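
As a rough check on the "4x fewer physical qubits" figure, the sketch below compares patch sizes at the two code distances quoted in the TL;DR ($d=33$ and $d=17$). It assumes the standard qubit count for a single rotated planar surface code patch, $d^2$ data qubits plus $d^2-1$ auxiliary qubits; that formula and the function name are not taken from this page, and the paper's own accounting may differ.

```python
def rotated_patch_qubits(d: int) -> int:
    """Physical qubits in one distance-d rotated planar patch.

    Assumption: d*d data qubits plus d*d - 1 auxiliary (measurement) qubits.
    """
    return 2 * d * d - 1


# Code distances quoted in the TL;DR for 10^6 operations.
non_adaptive_qubits = rotated_patch_qubits(33)
adaptive_qubits = rotated_patch_qubits(17)
saving = non_adaptive_qubits / adaptive_qubits

print(f"d=33: {non_adaptive_qubits} qubits, d=17: {adaptive_qubits} qubits")
print(f"saving factor: {saving:.2f}")
```

Under this assumption the saving factor comes out a little under 4, which is consistent with the rounded figure in the abstract.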

Paper Structure

This paper contains 16 sections, 2 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Circuitry and decoding graphs. (a) A distance $5$ rotated planar surface code and its associated $Z$-type decoding graph. $X$ ($Z$) plaquettes are orange (blue). Light-blue vertices share an edge with a virtual boundary vertex (not shown). This edge flips a logical observable if the vertex is marked with a cross. (b) Circuitry implementing patch wiggling over two rounds of syndrome extraction. Notice that hook edges in the decoding graph reverse every round. This is due to reversing the scheduling of the stabilisers, which is inherently needed to achieve wiggling [mcewen_relaxing_2023, geher_error-corrected_2023]. In red, we show the set of errors made more likely when the qubit labelled $q_1$ is measured as leaked at the end of the fourth round of syndrome extraction. These errors follow the 2-qubit gates involving $q_1$ depicted in the circuit.
  • Figure 2: Example decoding engine. (a) A single round of the $d = 5$ rotated planar surface code mapped onto a PE array. Each PE is assigned two vertices and linked to the PEs that contain their neighbours. Furthermore, each PE is assigned to a time slot in either the first or second part of a pair of conflict-free parts. The PEs in the same colour class are in the same time slot. The central controller coordinates the parts, which in turn coordinate the PEs. (b) The decoding engine extended in time to include $d + 1$ contiguously stacked layers, capturing graphlike error mechanisms across $d$ rounds of syndrome extraction. There are now four parts, each containing nine PEs. Note that the assignment of PEs to time slots is equivalent to a colouring of the PEs in the square of the array, where the square of the array is constructed by adding links between any two PEs whose distance in the array is 2. The straight links connecting PEs in the same row/column support the spatial/timelike edges in the decoding graph, and the diagonal links connecting PEs one/two columns apart support the short/long hook edges. Note that the diagonal links reverse direction every round. This is done to support patch wiggling, cf. Fig. 1. In general, the PE array is compiled from the decoding graph to support arbitrary shapes and sizes.
  • Figure 3: Performance of LCD on a Xilinx Virtex Ultrascale+ VU19P FPGA [vu19p]. Each experiment uses a distance $d$ rotated planar surface code with patch wiggling and is sampled and decoded 10 million times. (a) Logical error rate per round as a function of $d$. Each noise model, (LL) $p = 1\times10^{-3}$ and $p_l = 1\times10^{-4}$ (green) and (HL) $p = p_l = 5\times10^{-4}$ (blue), is decoded with non-adaptive (solid) and adaptive (dashed) decoding, resulting in different $\Lambda$. (b) Decoding time per round as a function of $d$. The number of vertices per PE is $\lfloor {d/2} \rfloor$ and the operating frequency is 285 MHz for all code distances, resulting in sublinear scaling with respect to $d$. We decode in under 1 $\mu$s per round (red line) on both noise models using non-adaptive or adaptive decoding up to at least distance $d=17$. (c) Log-log plot of the number of logic Look-Up-Tables (LUTs) and Flip-Flops (FFs) required by our decoder as a percentage of the total logic LUTs (4,085,760) and total FFs (8,171,520) available on the FPGA. At $d = 17$, we use around 6% and 3% of the available logic LUTs and FFs, respectively.
  • Figure 4: State machine
  • Figure 5: Pass through the finite state machine (FSM). Bulk vertices are green; boundary-adjacent vertices are blue. Red vertices are defects. Square vertices are active; circular vertices are inactive. Black-outlined vertices have odd parity. Labels inside vertices are cluster indices; labels above are vertex indices. The decoding graph is unweighted. If the radii of the vertices at the endpoints of an edge sum to 2, or the edge is a pre-grown edge, we say it is fully-grown and colour it pink. Parenthood relationships are represented with arrows. There are three odd clusters: two singleton clusters ($\{4\}$ and $\{5\}$) and one pre-cluster ($\{0, 2\}$). (a) Part 0 enters the growing stage. (b) Active vertices increase their radii by one. Since vertices 0, 2, 4 and 5 are active, this results in fully-grown edges $\{(2, 4), (2, 5)\}$. (c) Part 0 enters the merging stage. (d - f) Each vertex identifies the neighbour connected to it by a fully-grown edge with the lowest cluster index. Then, if the cluster index of the neighbour is less than that of the vertex, the vertex adopts it as its own and begins pointing at the neighbour. Lastly, child vertices with odd parity flip the parities of their parents and make their own even. In part (f), vertices 4 and 5 adopt cluster index 0 from vertex 2 and begin pointing at it. As such, the parities of vertices 2, 4 and 5 flip twice, once and once, respectively, making them all even. At the end of a merge sequence, all vertices, excluding root vertices, must have even parity. (g) Part 0 enters the picking stage. (h) Root vertices with odd parity become active. Since vertex 0 is the root of the odd cluster $\{0, 2, 4, 5\}$, it becomes active. The other vertices in the cluster, i.e., 2, 4 and 5, become inactive. (i) Part 0 enters the syncing stage. (j - l) Each vertex connected by a fully-grown edge to a neighbour that is active becomes active. In part (k), vertex 2 becomes active through vertex 0. In part (l), vertices 4 and 5 become active through vertex 2.
  • ...and 1 more figure
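
The merging stage described in the Figure 5 caption (panels d - f) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the FPGA implementation: the function name `merge_stage`, the dictionary-based state, and the synchronous round structure are hypothetical, and the starting parities (vertex 0 odd, vertex 2 even, vertices 4 and 5 odd) are inferred from the caption rather than stated outright. The pre-grown edge $(0, 2)$ is assumed to be what joins the pre-cluster $\{0, 2\}$.

```python
def merge_stage(cluster, parent, odd, grown_edges):
    """Software model of the merge rule in the Figure 5 caption.

    cluster: vertex -> cluster index
    parent:  vertex -> parent vertex (a root points at itself)
    odd:     vertex -> True if the vertex currently holds odd parity
    grown_edges: iterable of (u, v) fully-grown edges
    All three dicts are updated in place and also returned.
    """
    neighbours = {v: set() for v in cluster}
    for u, v in grown_edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    changed = True
    while changed:
        changed = False

        # Adoption: every vertex reads the same snapshot, mimicking a
        # synchronous update (an assumption of this sketch).
        snapshot = dict(cluster)
        for v in cluster:
            if not neighbours[v]:
                continue
            best = min(neighbours[v], key=lambda n: (snapshot[n], n))
            if snapshot[best] < snapshot[v]:
                cluster[v] = snapshot[best]  # adopt the lower cluster index
                parent[v] = best             # begin pointing at that neighbour
                changed = True

        # Parity push: odd non-root vertices flip their parent and become even.
        pushers = [v for v in cluster if parent[v] != v and odd[v]]
        for v in pushers:
            odd[v] = False
        for v in pushers:
            odd[parent[v]] = not odd[parent[v]]
            changed = True

    return cluster, parent, odd


# State before merging, as inferred from the caption (see assumptions above).
cluster = {0: 0, 2: 0, 4: 4, 5: 5}
parent = {0: 0, 2: 0, 4: 4, 5: 5}
odd = {0: True, 2: False, 4: True, 5: True}
grown_edges = [(0, 2), (2, 4), (2, 5)]

merge_stage(cluster, parent, odd, grown_edges)
print(cluster, parent, odd)
```

In this model, vertices 4 and 5 adopt cluster index 0 and point at vertex 2, vertex 2 is flipped twice and stays even, and only the root (vertex 0) keeps odd parity, which matches the outcome the caption describes for panel (f).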