if-ZKP: Intel FPGA-Based Acceleration of Zero Knowledge Proofs
Shahzad Ahmad Butt, Benjamin Reynolds, Veeraraghavan Ramamurthy, Xiao Xiao, Pohrong Chu, Setareh Sharifian, Sergey Gribok, Bogdan Pasca
TL;DR
This work targets the computational bottleneck of zk-SNARK provers by accelerating Multi-Scalar Multiplication (MSM) on FPGAs. It introduces a scalable SAB architecture and a high-performance Point Processor that supports both point addition (PA) and point doubling (PD) operations, including a Unified Double-Add (UDA) design and a non-Montgomery form for large curves like BLS12-381. The approach achieves 110x–150x speedups over software and performance competitive with GPUs, while offering favorable power efficiency, enabling practical zk-SNARK proof generation in blockchain contexts. The results demonstrate the first FPGA acceleration of BLS12-381 on Intel Agilex and establish a path toward broader hardware acceleration of zk-SNARKs, including NTT/INTT and G2 MSM in future work.
Abstract
Zero-Knowledge Proofs (ZKPs) have emerged as an important cryptographic technique that allows one party (the prover) to prove the correctness of a statement to another party (the verifier) without revealing anything else. ZKPs enable user privacy in many applications such as blockchains, digital voting, and machine learning. Traditionally, ZKPs suffered from poor scalability, but recently a sub-class of ZKPs known as Zero-Knowledge Succinct Non-interactive ARguments of Knowledge (zk-SNARKs) has addressed this challenge. zk-SNARKs are receiving significant attention and are implemented by many public libraries. In this paper, we present a novel scalable architecture that is suitable for accelerating the zk-SNARK prover computation on FPGAs. We focus on multi-scalar multiplication (MSM), which accounts for the majority of the computation time spent in zk-SNARK systems. The MSM calculations rely extensively on modular arithmetic, so highly optimized Intel IP libraries for modular arithmetic are used. The proposed architecture exploits the parallelism inherent to MSM and is implemented using the Intel oneAPI framework for FPGAs. Our implementation runs 110x-150x faster than the reference software library, uses a generic curve form in Jacobian coordinates, and is the first to report FPGA hardware acceleration results for the BLS12-381 and BN128 families of elliptic curves.
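To make the MSM workload concrete, the following is a minimal software sketch of what an MSM computes, namely the sum of k_i * P_i over many scalar and point pairs, and of where its parallelism comes from. It is not the architecture described in this paper. It assumes a toy curve over a 97-element field, affine rather than Jacobian coordinates, and the bucket (Pippenger-style) method, which is a common MSM technique but is not named in the abstract. All function names and parameters are hypothetical.

```python
import random

# Toy short-Weierstrass curve y^2 = x^3 + A*x + B over F_p (illustrative only;
# production curves such as BN128 or BLS12-381 use fields of hundreds of bits).
FIELD_P, A, B = 97, 2, 3


def point_add(p1, p2):
    """Affine addition; None is the point at infinity. Handles both PA and PD."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % FIELD_P == 0:
        return None  # P + (-P), including doubling a point with y = 0
    if p1 == p2:  # point doubling (PD)
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % FIELD_P, -1, FIELD_P)
    else:  # point addition (PA)
        lam = (y2 - y1) * pow((x2 - x1) % FIELD_P, -1, FIELD_P)
    lam %= FIELD_P
    x3 = (lam * lam - x1 - x2) % FIELD_P
    y3 = (lam * (x1 - x3) - y1) % FIELD_P
    return (x3, y3)


def scalar_mul(k, pt):
    """Double-and-add scalar multiplication, used here only as a reference."""
    acc = None
    while k:
        if k & 1:
            acc = point_add(acc, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return acc


def msm_naive(scalars, points):
    """Reference MSM: sum of k_i * P_i, one scalar multiplication per pair."""
    total = None
    for k, pt in zip(scalars, points):
        total = point_add(total, scalar_mul(k, pt))
    return total


def msm_bucket(scalars, points, window_bits=4, scalar_bits=16):
    """Bucket-method MSM: process the scalars one window of bits at a time."""
    num_windows = -(-scalar_bits // window_bits)  # ceiling division
    mask = (1 << window_bits) - 1
    result = None
    for w in reversed(range(num_windows)):
        for _ in range(window_bits):  # shift the partial result by one window
            result = point_add(result, result)
        # Each point is added into the bucket indexed by its window digit.
        # Buckets are independent of each other, so they can fill in parallel.
        buckets = [None] * (mask + 1)
        for k, pt in zip(scalars, points):
            digit = (k >> (w * window_bits)) & mask
            if digit:
                buckets[digit] = point_add(buckets[digit], pt)
        # Running-sum trick: computes sum(digit * buckets[digit]) with adds only.
        running, window_sum = None, None
        for digit in range(mask, 0, -1):
            running = point_add(running, buckets[digit])
            window_sum = point_add(window_sum, running)
        result = point_add(result, window_sum)
    return result


# Enumerate affine points of the toy curve by brute force and run a small MSM.
curve_points = [(x, y) for x in range(FIELD_P) for y in range(FIELD_P)
                if (y * y - (x ** 3 + A * x + B)) % FIELD_P == 0]
rng = random.Random(0)
points = curve_points[:8]
scalars = [rng.randrange(1 << 16) for _ in points]
assert msm_bucket(scalars, points) == msm_naive(scalars, points)
```

In this sketch almost all of the work is point additions and doublings built on modular multiplication, which is consistent with the abstract's emphasis on optimized modular arithmetic. The per-window bucket accumulations do not depend on one another, which is one source of the parallelism a hardware design can exploit.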
