ModSRAM: Algorithm-Hardware Co-Design for Large Number Modular Multiplication in SRAM
Jonathan Ku, Junyao Zhang, Haoxuan Shan, Saichand Samudrala, Jiawen Wu, Qilin Zheng, Ziru Li, JV Rajendran, Yiran Chen
TL;DR
This work tackles the bottleneck of large-number modular multiplication in ECC and ZKP by co-designing the algorithm and an SRAM-based PIM architecture. It introduces R4CSA-LUT, a radix-4 interleaved modular multiplication algorithm based on carry-save addition with LUT precomputation, and ModSRAM, an SRAM PIM architecture that executes the algorithm with in-memory logic (XOR3/MAJ) and near-memory support. The design performs 256-bit modular multiplication in memory, with circuit-level results in a 65 nm process showing a 52% cycle reduction compared to prior works at a 32% area overhead. Keeping the computation in memory reduces data movement and raises throughput for ECC/ZKP workloads.
Abstract
Elliptic curve cryptography (ECC) is widely used in security applications such as public key cryptography (PKC) and zero-knowledge proofs (ZKP). ECC is built on modular arithmetic, in which modular multiplication takes most of the processing time. The computational complexity and memory constraints of ECC limit its performance, so hardware acceleration of ECC is an active field of research. Processing-in-memory (PIM) is a promising approach to this problem. In this work, we design ModSRAM, the first 8T SRAM PIM architecture to compute large-number modular multiplication efficiently. In addition, we propose R4CSA-LUT, a new algorithm that reduces the cycle count of the interleaved algorithm and eliminates carry propagation in addition with the help of look-up tables (LUTs). ModSRAM is co-designed with R4CSA-LUT to support modular multiplication and data reuse in memory, achieving a 52% cycle reduction compared to prior works with only 32% area overhead.
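To make the ingredients named above concrete, the following is a minimal software sketch of two generic building blocks: a carry-save adder expressed with XOR3 and MAJ, and a plain radix-4 interleaved modular multiplication that looks up precomputed multiples in a small table. It is not the R4CSA-LUT algorithm itself, whose details are not given in this section. The function names, the table layout, and the use of Python integers with an explicit `%` reduction (instead of carry-save state and LUT-based reduction) are all assumptions made for illustration.

```python
def xor3(x: int, y: int, z: int) -> int:
    """Bitwise three-input XOR: the sum bits of a carry-save adder."""
    return x ^ y ^ z


def maj(x: int, y: int, z: int) -> int:
    """Bitwise three-input majority: the carry bits of a carry-save adder."""
    return (x & y) | (x & z) | (y & z)


def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Compress three operands into a (sum, carry) pair without carry propagation.

    Invariant: x + y + z == sum + carry.
    """
    return xor3(x, y, z), maj(x, y, z) << 1


def radix4_interleaved_modmul(a: int, b: int, p: int, bits: int = 256) -> int:
    """Illustrative radix-4 interleaved modular multiplication (hypothetical helper).

    Scans the multiplier `a` two bits at a time, most significant digit first.
    Each iteration shifts the accumulator by two bits, adds a precomputed
    multiple of `b`, and reduces modulo `p`. Assumes 0 <= a < 2**bits.
    """
    # Assumed table layout: multiples of b for each possible 2-bit digit.
    lut = [(d * b) % p for d in range(4)]
    acc = 0
    num_digits = (bits + 1) // 2
    for i in reversed(range(num_digits)):
        digit = (a >> (2 * i)) & 0b11
        # Software stand-in for the hardware add-and-reduce step.
        acc = (4 * acc + lut[digit]) % p
    return acc


if __name__ == "__main__":
    p = 2**255 - 19  # a well-known 255-bit prime, used here only as an example
    a = 0x1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF
    b = 0x0FEDCBA987654321FEDCBA987654321FEDCBA987654321FEDCBA987654321F
    s, c = carry_save_add(a, b, p)
    assert s + c == a + b + p
    assert radix4_interleaved_modmul(a, b, p) == (a * b) % p
```

Processing two multiplier bits per iteration halves the iteration count relative to a bit-serial interleaved loop (128 iterations instead of 256 for a 256-bit operand), which is the general motivation for a radix-4 formulation. The carry-save step shows why XOR3 and MAJ suffice as in-memory primitives for addition without carry propagation.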
