High-Speed VLSI Architectures for Modular Polynomial Multiplication via Fast Filtering and Applications to Lattice-Based Cryptography
Weihang Tan, Antian Wang, Yingjie Lao, Xinmiao Zhang, Keshab K. Parhi
TL;DR
This work addresses the computational bottleneck of modular polynomial multiplication in lattice-based PQC and homomorphic encryption by introducing a fast-filter, weight-stationary architecture that maps the operation to a systolic array. It develops a modular multiplier based on the transpose-form FIR filter and extends it to fast M-parallel designs (2-, 3-, 4-, and generalized M-parallel) using polyphase decomposition and integrated modular reduction, achieving high throughput with full hardware utilization. Experimental results on FPGA show reductions in latency and area-time product compared with state-of-the-art Toom-Cook- and Karatsuba-based designs, and a Saber case study demonstrates significant latency improvements for the complete scheme. The proposed architecture provides a scalable, hardware-efficient solution for PQC deployments, enabling practical, low-latency implementations of Saber and similar lattice-based protocols.
Abstract
This paper presents a low-latency hardware accelerator for modular polynomial multiplication, targeting lattice-based post-quantum cryptography and homomorphic encryption applications. The proposed modular polynomial multiplier exploits the fast finite impulse response (FIR) filter architecture to reduce the computational complexity of schoolbook modular polynomial multiplication. We also extend this structure to fast $M$-parallel architectures that achieve low latency, high speed, and full hardware utilization. We comprehensively evaluate the performance of the proposed architectures under various polynomial settings, and in the Saber post-quantum cryptography scheme as a case study. The experimental results show that the proposed modular polynomial multiplier reduces both the computation time and the area-time product compared to state-of-the-art designs.
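To make the complexity reduction concrete, the following is a minimal software sketch of the arithmetic idea behind a 2-parallel fast filter, not the hardware architecture proposed in the paper. It assumes the ring $\mathbb{Z}_q[x]/(x^n+1)$ with even $n$, which covers Saber ($n = 256$, $q = 2^{13}$); the function names `schoolbook_negacyclic` and `fast_two_parallel` are hypothetical. Splitting each operand into even and odd polyphase components lets the product be formed from three half-length multiplications instead of four.

```python
def schoolbook_negacyclic(a, b, q):
    """Schoolbook product of a and b in Z_q[x]/(x^n + 1), n = len(a)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                # x^n = -1, so terms that wrap around are subtracted
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


def fast_two_parallel(a, b, q):
    """Same product via 2-way polyphase decomposition (3 sub-products)."""
    n = len(a)
    assert n % 2 == 0 and len(b) == n
    # Polyphase components: A(x) = A0(x^2) + x * A1(x^2), same for B
    a0, a1 = a[0::2], a[1::2]
    b0, b1 = b[0::2], b[1::2]
    # Three half-length products in Z_q[y]/(y^(n/2) + 1), with y = x^2
    p0 = schoolbook_negacyclic(a0, b0, q)
    p1 = schoolbook_negacyclic(a1, b1, q)
    a_sum = [(u + v) % q for u, v in zip(a0, a1)]
    b_sum = [(u + v) % q for u, v in zip(b0, b1)]
    p2 = schoolbook_negacyclic(a_sum, b_sum, q)
    # y * P1 mod (y^(n/2) + 1): shift by one, negate the wrapped coefficient
    y_p1 = [(-p1[-1]) % q] + p1[:-1]
    c_even = [(u + v) % q for u, v in zip(p0, y_p1)]           # A0*B0 + y*A1*B1
    c_odd = [(w - u - v) % q for w, u, v in zip(p2, p0, p1)]   # A0*B1 + A1*B0
    # Interleave the even and odd output phases
    c = [0] * n
    c[0::2] = c_even
    c[1::2] = c_odd
    return c


if __name__ == "__main__":
    q = 2 ** 13
    a, b = [1, 2, 3, 4], [5, 6, 7, 8]
    print(schoolbook_negacyclic(a, b, q))  # [8136, 8156, 2, 60]
    print(fast_two_parallel(a, b, q))      # [8136, 8156, 2, 60]
```

In this sketch the half-length products still use the schoolbook routine; the saving comes only from replacing four sub-products with three, at the cost of a few extra additions.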
