ModSRAM: Algorithm-Hardware Co-Design for Large Number Modular Multiplication in SRAM

Jonathan Ku, Junyao Zhang, Haoxuan Shan, Saichand Samudrala, Jiawen Wu, Qilin Zheng, Ziru Li, JV Rajendran, Yiran Chen

TL;DR

This work tackles the bottleneck of large-number modular multiplication in ECC and ZKP through an algorithm-hardware co-design based on SRAM processing-in-memory (PIM). It introduces R4CSA-LUT, a radix-4, carry-save-addition-based interleaved modular multiplication algorithm with LUT precomputation, and ModSRAM, an SRAM PIM architecture that executes the algorithm with in-memory logic (XOR3/MAJ) and near-memory support. The design performs 256-bit modular multiplication in memory in a 65 nm process, with a reported 52% cycle reduction compared to prior work and a 32% area overhead. By reducing data movement, the approach aims to make in-memory ECC/ZKP acceleration practical and to raise throughput for cryptographic workloads.

Abstract

Elliptic curve cryptography (ECC) is widely used in security applications such as public key cryptography (PKC) and zero-knowledge proofs (ZKP). ECC is composed of modular arithmetic, where modular multiplication takes most of the processing time. The computational complexity and memory constraints of ECC limit its performance. Therefore, hardware acceleration of ECC is an active field of research. Processing-in-memory (PIM) is a promising approach to tackle this problem. In this work, we design ModSRAM, the first 8T SRAM PIM architecture to compute large-number modular multiplication efficiently. In addition, we propose R4CSA-LUT, a new algorithm that reduces the cycle count of the interleaved algorithm and eliminates carry propagation in addition by using look-up tables (LUT). ModSRAM is co-designed with R4CSA-LUT to support modular multiplication and data reuse in memory, with a 52% cycle reduction compared to prior work and only 32% area overhead.
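
To make the ideas named in the abstract concrete, the following is a minimal Python sketch of a radix-4 interleaved modular multiplication that keeps its accumulator in carry-save form using XOR3 and MAJ. It is an illustration under stated assumptions, not the paper's R4CSA-LUT algorithm: the function names (`xor3`, `maj`, `carry_save_add`, `modmul_radix4_csa`) are hypothetical, the table of operand multiples is built with ordinary `%`, and the overflow correction is computed on the fly where a hardware design would presumably read a precomputed look-up table.

```python
def xor3(a, b, c):
    # Bitwise sum of three operands, ignoring carries.
    return a ^ b ^ c


def maj(a, b, c):
    # Bitwise majority: the carry generated at each bit position.
    return (a & b) | (a & c) | (b & c)


def carry_save_add(a, b, c):
    # Returns (s, cy) with s + cy == a + b + c; no carry ripples between bits.
    return xor3(a, b, c), maj(a, b, c) << 1


def modmul_radix4_csa(a, b, p, nbits=256):
    """Illustrative a * b mod p, scanning b two bits at a time (MSB first)."""
    assert 0 <= a < p < (1 << nbits) and 0 <= b < (1 << nbits)
    nbits += nbits % 2              # pad to an even width for radix-4 digits
    mask = (1 << nbits) - 1

    # Assumption: small table of digit * a mod p for digits 0..3.
    multiples = [(d * a) % p for d in range(4)]

    s, c = 0, 0                     # accumulator in carry-save form (value = s + c)
    for i in range(nbits - 2, -1, -2):
        digit = (b >> i) & 0b11
        # Shift the accumulator by one radix-4 digit and add the selected multiple.
        s, c = carry_save_add(s << 2, c << 2, multiples[digit])
        # Fold the bits above nbits back in modulo p.
        # Assumption: computed with % here; a LUT would hold these values in hardware.
        overflow = (s >> nbits) + (c >> nbits)
        correction = (overflow << nbits) % p
        s, c = carry_save_add(s & mask, c & mask, correction)

    # Single carry-propagating addition and final reduction at the very end.
    return (s + c) % p
```

As a quick check, `modmul_radix4_csa(13, 11, 17, nbits=5)` should agree with `(13 * 11) % 17`. The point of the sketch is that every step inside the loop uses only bitwise XOR3/MAJ and shifts, so carries are resolved once, after the last digit.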

Paper Structure (19 sections, 6 figures, 3 tables, 3 algorithms)

Figures (6)

  • Figure 1: Algorithm complexity and performance comparison with previous work.
  • Figure 2: A 5-bit illustration of the first iteration in R4CSA-LUT dataflow with proposed ModSRAM.
  • Figure 3: The overall architecture of ModSRAM.
  • Figure 4: Area breakdown on ModSRAM and full custom layout for SRAM array and in-memory circuit.
  • Figure 5: Comparison of data organization for different SRAM PIM designs for modular multiplication.
  • ...and 1 more figure