Micro Blossom: Accelerated Minimum-Weight Perfect Matching Decoding for Quantum Error Correction
Yue Wu, Namitha Liyanage, Lin Zhong
TL;DR
Micro Blossom is the first publicly known exact MWPM decoder with sub-microsecond latency for quantum error correction. It partitions the blossom algorithm between a software primal phase and a programmable accelerator with many fine-grained processing units. The architecture uses vertex- and edge-level parallelism, per-vertex local state, and round-wise fusion. For a surface code with code distance $d$ under circuit-level noise with physical error rate $p \ll 1$, this reduces the worst-case latency from $O(d^{12})$ to $O(d^9)$ and the average latency from $O(p d^3+1)$ to $O(p^2 d^2+1)$. An FPGA prototype at $d=13$ and $p=0.1\%$ achieves an average decoding latency of $0.8\ \mu s$, 8 times shorter than the best latency previously reported for an MWPM decoder implementation. The work also introduces resource-efficient variants and isolated-conflict handling to reduce interactions between the CPU and the accelerator. It shows that real-time, exact MWPM decoding is feasible for fault-tolerant quantum computation and provides an open-source artifact to enable further hardware-accelerated QEC research.
Abstract
Minimum-Weight Perfect Matching (MWPM) decoding is important to quantum error correction because of its accuracy. However, many believe that it is difficult, if possible at all, for MWPM decoding to meet the microsecond latency requirement posed by superconducting qubits. This work presents the first publicly known MWPM decoder, called Micro Blossom, that achieves sub-microsecond decoding latency. Micro Blossom employs a heterogeneous architecture that carefully partitions a state-of-the-art MWPM decoder between software and a programmable accelerator with parallel processing units, one for each vertex and edge of the decoding graph. On a surface code with code distance $d$ and a circuit-level noise model with physical error rate $p$, Micro Blossom's accelerator employs $O(d^3)$ parallel processing units to reduce the worst-case latency from $O(d^{12})$ to $O(d^9)$ and the average latency from $O(p d^3+1)$ to $O(p^2 d^2+1)$ when $p \ll 1$. We report a prototype implementation of Micro Blossom on an FPGA. Measured at $d=13$ and $p=0.1\%$, the prototype achieves an average decoding latency of $0.8\ \mu s$ at a moderate clock frequency of $62\ \mathrm{MHz}$. Micro Blossom is the first publicly known hardware-accelerated exact MWPM decoder, and its decoding latency of $0.8\ \mu s$ is 8 times shorter than the best latency of MWPM decoder implementations reported in the literature.
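To make the reported operating point concrete, the following minimal sketch converts the measured latency into clock cycles and evaluates the $p$-dependent terms of the asymptotic bounds at $d=13$ and $p=0.1\%$. It is back-of-the-envelope arithmetic only: the constant factors hidden by the $O(\cdot)$ notation are unknown and are assumed to be 1 here, and the function names are illustrative rather than part of the Micro Blossom artifact.

```python
# Operating point reported in the abstract.
D = 13              # code distance d
P = 0.001           # physical error rate p = 0.1%
LATENCY_S = 0.8e-6  # measured average decoding latency (seconds)
CLOCK_HZ = 62e6     # accelerator clock frequency (Hz)


def latency_in_cycles(latency_s: float, clock_hz: float) -> float:
    """Number of clock cycles that fit in the given latency."""
    return latency_s * clock_hz


def average_latency_terms(p: float, d: int) -> dict:
    """p-dependent terms of the average-latency bounds.

    Assumption: constant factors are dropped (set to 1), so only the
    relative size of the two terms is meaningful, not their values.
    """
    return {"before": p * d**3, "after": p**2 * d**2}


def worst_case_ratio(d: int) -> int:
    """Ratio d^12 / d^9 = d^3 between the two worst-case bounds."""
    return d**12 // d**9


cycles = latency_in_cycles(LATENCY_S, CLOCK_HZ)
terms = average_latency_terms(P, D)

print(f"average latency in clock cycles: {cycles:.1f}")
print(f"p*d^3 term (before):    {terms['before']:.3f}")
print(f"p^2*d^2 term (after):   {terms['after']:.6f}")
print(f"worst-case bound ratio: {worst_case_ratio(D)}")
```

Under these assumptions, the $p^2 d^2$ term is far below 1 at this operating point, which is consistent with the average latency being dominated by the constant term of $O(p^2 d^2+1)$.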
