Non-Binary LDPC Arithmetic Error Correction For Processing-in-Memory
Daijing Shi, Yihang Zhu, Anjunyi Fan, Yaoyu Tao, Yuchao Yang, Bonan Yan
TL;DR
This work tackles reliability challenges in processing-in-memory (PIM) by introducing a unified NB-LDPC ECC over GF($p$) that supports long codewords (up to $1024$ bits) and multi-bit error correction without interrupting the PIM dataflow. The framework uses generator and check matrices ($\mathbf{H}_G$, $\mathbf{H}_C$) to enable both memory-mode detection and PIM-mode correction. Its decoder is built from variable and check nodes and runs a three-stage process: LLV initialization, forward-backward propagation, and accumulative correction. Key results include a silicon-proven 40nm prototype that integrates an RRAM PIM core with an NB-LDPC decoder, achieving up to $2.978\times$ ECC power efficiency improvement and up to $59.65\times$ BER improvement at a $1024$-bit length and $80\%$ code rate, while sustaining high code rates ($>88\%$) with multi-bit error correction. The approach enables reliable, energy-efficient PIM operation for memory-centric AI workloads and is compatible with multi-level memory schemes and weight-mapping techniques.
Abstract
Processing-in-memory (PIM) based on emerging devices such as memristors is more vulnerable to noise than traditional memories because of physical non-idealities and complex analog-domain operations. Ensuring high reliability therefore calls for an efficient error-correcting code (ECC). However, state-of-the-art ECC schemes for PIM suffer from drawbacks including dataflow interruptions, low code rates, and limited error correction patterns. In this work, we propose non-binary low-density parity-check (NB-LDPC) error correction over a Galois field (GF). With a long word length of 1024 bits, the NB-LDPC scheme can correct up to 8-bit errors at a code rate above 88%. Non-binary GF operations support both memory mode and PIM mode, even with multi-level memory cells. We fabricate a 40nm prototype PIM chip equipped with the proposed NB-LDPC scheme for validation. Experiments show that PIM with NB-LDPC error correction achieves up to 59.65 times bit error rate (BER) improvement over the same PIM without error correction. The test chip delivers a 2.978 times power efficiency improvement over prior works.
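The claim that GF operations can protect PIM-mode results as well as stored data rests on the linearity of the code: a weighted sum of codewords over GF($p$) is itself a codeword, so a multiply-accumulate output can be checked against the same check matrix. The following is a minimal sketch of that property only, not the decoder described in this work. The field size `P = 7`, the toy 2-by-6 check matrix, the helper names, and the brute-force single-symbol syndrome search are all assumptions chosen for illustration; the paper's decoder uses forward-backward propagation on much longer codewords.

```python
P = 7  # assumed small prime field size, for illustration only

# Hypothetical toy check matrix H = [A | I] over GF(P): 2 checks, 6 symbols.
# Its columns are pairwise non-proportional, so any single-symbol error
# produces a unique syndrome.
A = [[1, 1, 1, 1],
     [1, 2, 3, 4]]
H = [row + [1 if r == k else 0 for k in range(len(A))]
     for r, row in enumerate(A)]


def encode(msg):
    """Systematic encoding: append parity so that H * c = 0 (mod P)."""
    parity = [(-sum(a * m for a, m in zip(row, msg))) % P for row in A]
    return list(msg) + parity


def syndrome(word):
    """Check-matrix product reduced mod P; all zeros means no detected error."""
    return [sum(h * w for h, w in zip(row, word)) % P for row in H]


def pim_mac(inputs, rows):
    """Integer multiply-accumulate down each column, as a PIM array would."""
    return [sum(x * row[j] for x, row in zip(inputs, rows))
            for j in range(len(rows[0]))]


def correct_single(word):
    """Brute-force search for one symbol error (position j, magnitude e).

    Returns the corrected residues mod P, or None if no single-symbol
    error explains the syndrome.
    """
    s = syndrome(word)
    residues = [w % P for w in word]
    if not any(s):
        return residues
    for j in range(len(word)):
        for e in range(1, P):
            if all((e * H[r][j]) % P == s[r] for r in range(len(H))):
                residues[j] = (residues[j] - e) % P
                return residues
    return None


# Memory mode: each stored row is a codeword.
stored = [encode(m) for m in ([3, 0, 5, 6], [1, 4, 2, 2], [6, 6, 0, 1])]

# PIM mode: the MAC output is a linear combination of the stored rows,
# so its syndrome is still zero without any re-encoding.
x = [2, 1, 3]
y = pim_mac(x, stored)
print("clean syndrome:", syndrome(y))

# Inject an error of magnitude 3 into one output column, then correct it.
noisy = list(y)
noisy[2] += 3
print("noisy syndrome:", syndrome(noisy))
print("recovered:", correct_single(noisy) == [v % P for v in y])
```

Note that this sketch recovers the output only as residues modulo `P`; how the full-precision MAC result is mapped onto field symbols is a design choice of the actual scheme and is not modeled here.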
