Highly Efficient Parallel Row-Layered Min-Sum MDPC Decoder for McEliece Cryptosystem
Jiaxuan Cai, Xinmiao Zhang
TL;DR
This work tackles the memory and latency bottlenecks of Min-sum MDPC decoding for McEliece post-quantum cryptography (PQC) by combining row-layered scheduling with finite-precision mitigation and a dynamic, L-parallel decoder architecture. It introduces construction constraints on the MDPC H matrix that enable efficient L×L parallelism while preserving decoding performance and cryptographic security, and a dynamic H-division scheme that realizes near-linear speedups with little additional memory. The proposed 2-parallel design achieves about 26% memory reduction and around 70% latency reduction compared with prior Sliced-MP approaches, with minimal impact on the frame error rate (FER) for moderate L and strong resistance to reaction attacks. Overall, the paper demonstrates a practical path to highly parallel, memory-efficient MDPC decoders suitable for PQC standardization and real-world deployment.
Abstract
The medium-density parity-check (MDPC) code-based McEliece cryptosystem remains a finalist in the post-quantum cryptography standardization process. The Min-sum decoding algorithm achieves a better performance-complexity tradeoff than other algorithms for MDPC codes. However, the prior Min-sum MDPC decoder requires large memories, which dominate its overall complexity, and its achievable parallelism is limited. This paper makes four contributions: (1) for the first time, the row-layered scheduling scheme is exploited to substantially reduce the memory requirement of MDPC decoders; (2) a low-complexity scheme is developed to mitigate the performance loss caused by the finite-precision representation of the messages and the high column weights of MDPC codes in row-layered decoding; (3) constraints are added to the parity-check matrix construction to enable effective parallel processing with negligible impact on decoder performance and resilience against attacks; (4) a novel parity-check matrix division scheme for highly efficient parallel processing is proposed, and the corresponding parallel row-layered decoder architecture is designed. The proposed L-parallel decoder reduces the number of clock cycles per decoding iteration by a factor of L with very small memory overhead. For an example 2-parallel decoder, the proposed design achieves a 26% lower memory requirement and a 70% latency reduction compared to the prior decoder.
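To illustrate the row-layered scheduling mentioned in the abstract, the following is a minimal sketch of generic, textbook row-layered scaled Min-sum decoding. It is not the paper's decoder: it omits the finite-precision mitigation, the parity-check matrix constraints, and the L-parallel matrix division. The function name `layered_min_sum`, the scaling factor `alpha = 0.75`, and the toy (7,4) Hamming parity-check matrix `ROWS` are assumptions chosen for illustration; real MDPC codes are far larger. The sketch shows why the schedule can save memory: only one a posteriori value per variable node and one set of check-to-variable messages per row are kept, and each row's update is visible to the next row immediately.

```python
def layered_min_sum(rows, channel_llr, max_iter=10, alpha=0.75):
    """Generic row-layered scaled Min-sum decoding (illustrative sketch only).

    rows        : rows[m] lists the column indices of the 1s in row m of H
    channel_llr : channel log-likelihood ratios; positive favours bit 0
    alpha       : scaling factor for check-to-variable messages (assumed value)
    """
    q = list(channel_llr)                    # a posteriori LLR per variable node
    r = [[0.0] * len(row) for row in rows]   # check-to-variable messages per row
    bits = [1 if v < 0 else 0 for v in q]

    for _ in range(max_iter):
        for m, row in enumerate(rows):       # one layer = one row of H
            # Remove this row's previous contribution from the a posteriori LLRs.
            t = [q[n] - r[m][k] for k, n in enumerate(row)]

            # Track the two smallest magnitudes and the product of signs.
            min1 = min2 = float("inf")
            idx_min1 = -1
            sign_prod = 1
            for k, v in enumerate(t):
                mag = abs(v)
                if mag < min1:
                    min1, min2, idx_min1 = mag, min1, k
                elif mag < min2:
                    min2 = mag
                if v < 0:
                    sign_prod = -sign_prod

            # Update the messages and refresh the a posteriori LLRs at once,
            # so the next layer already uses the new information.
            for k, n in enumerate(row):
                mag = min2 if k == idx_min1 else min1
                sign = sign_prod if t[k] >= 0 else -sign_prod
                r[m][k] = alpha * sign * mag
                q[n] = t[k] + r[m][k]

        bits = [1 if v < 0 else 0 for v in q]
        if all(sum(bits[n] for n in row) % 2 == 0 for row in rows):
            return bits, True                # every parity check is satisfied
    return bits, False


# Toy example (assumption): (7,4) Hamming code, all-zero codeword sent,
# bit 2 received with the wrong sign and low reliability.
ROWS = [[0, 2, 4, 6], [1, 2, 5, 6], [3, 4, 5, 6]]
decoded, ok = layered_min_sum(ROWS, [2.0, 2.0, -1.0, 2.0, 2.0, 2.0, 2.0])
```

In this toy run the weakly received bit is corrected and all parity checks are satisfied. The sketch processes one row per step; processing several rows at once, as an L-parallel design would, requires handling rows that share columns, which is the problem the matrix constraints and division scheme in the abstract address.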
