
High-Speed VLSI Architectures for Modular Polynomial Multiplication via Fast Filtering and Applications to Lattice-Based Cryptography

Weihang Tan, Antian Wang, Yingjie Lao, Xinmiao Zhang, Keshab K. Parhi

TL;DR

This work addresses the computational bottleneck of modular polynomial multiplication in lattice-based PQC and homomorphic encryption by introducing a fast-filter, weight-stationary architecture that maps the operation to a systolic array. It develops a transpose-form FIR-inspired modular multiplier and extends it to fast M-parallel designs (2-, 3-, 4-, and generalized M-parallel) using polyphase decomposition and integrated modular reduction, achieving high throughput with full hardware utilization. Experimental results on FPGA show reductions in latency and area-time product compared with state-of-the-art Toom-Cook/Karatsuba-based designs, and a Saber case study demonstrates significant latency improvements for the complete scheme. The proposed architecture provides scalable, hardware-efficient solutions for PQC deployments, enabling practical, low-latency implementations of Saber and similar lattice-based protocols.

Abstract

This paper presents a low-latency hardware accelerator for modular polynomial multiplication, targeting lattice-based post-quantum cryptography and homomorphic encryption applications. The proposed modular polynomial multiplier exploits the fast finite impulse response (FIR) filter architecture to reduce the computational complexity of schoolbook modular polynomial multiplication. We also extend this structure to fast $M$-parallel architectures while achieving low latency, high speed, and full hardware utilization. We comprehensively evaluate the performance of the proposed architectures under various polynomial settings, and in the Saber post-quantum cryptography scheme as a case study. The experimental results show that the proposed modular polynomial multiplier reduces both the computation time and the area-time product compared to state-of-the-art designs.
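To make the fast-filtering idea concrete, the following is a minimal software sketch, not the paper's hardware architecture. It assumes the ring $\mathbb{Z}_q[x]/(x^n+1)$ with even $n$ (the ring used by Saber, with $q = 2^{13}$); the abstract itself does not fix the ring. The function names `schoolbook_negacyclic`, `negacyclic_shift`, and `fast_2_parallel` are hypothetical. The 2-parallel version splits each operand into even and odd polyphase components and uses the standard three-multiplication fast-FIR (Karatsuba-style) identity, so it needs three half-length products instead of four.

```python
def schoolbook_negacyclic(a, b, q):
    """Schoolbook product of a and b in Z_q[x]/(x^n + 1), n = len(a) = len(b)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                # x^n = -1, so the wrapped term is subtracted.
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


def negacyclic_shift(p, q):
    """Multiply p(y) by y modulo y^m + 1 (the top coefficient wraps with a sign flip)."""
    return [(-p[-1]) % q] + p[:-1]


def fast_2_parallel(a, b, q):
    """2-parallel product via polyphase decomposition (assumes even n).

    With y = x^2: A(x) = A0(y) + x*A1(y), B(x) = B0(y) + x*B1(y), and
    all sub-products are taken modulo y^(n/2) + 1.
    """
    n = len(a)
    assert n % 2 == 0 and len(b) == n
    a0, a1 = a[0::2], a[1::2]  # even and odd phases of A
    b0, b1 = b[0::2], b[1::2]  # even and odd phases of B

    # Three half-length products instead of four.
    p0 = schoolbook_negacyclic(a0, b0, q)
    p1 = schoolbook_negacyclic(a1, b1, q)
    sa = [(u + v) % q for u, v in zip(a0, a1)]
    sb = [(u + v) % q for u, v in zip(b0, b1)]
    p2 = schoolbook_negacyclic(sa, sb, q)

    # Even phase: A0*B0 + y*A1*B1. Odd phase: (A0+A1)(B0+B1) - A0*B0 - A1*B1.
    c_even = [(u + v) % q for u, v in zip(p0, negacyclic_shift(p1, q))]
    c_odd = [(w - u - v) % q for w, u, v in zip(p2, p0, p1)]

    c = [0] * n
    c[0::2] = c_even
    c[1::2] = c_odd
    return c
```

For example, with `q = 8192`, both `schoolbook_negacyclic([1, 2, 3, 4], [5, 6, 7, 8], 8192)` and `fast_2_parallel([1, 2, 3, 4], [5, 6, 7, 8], 8192)` return `[8136, 8156, 2, 60]`. In a parallel hardware design the three sub-products would be computed concurrently; this sketch only checks the arithmetic identity.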

Paper Structure

This paper contains 20 sections, 24 equations, 9 figures, 4 tables, and 3 algorithms.

Figures (9)

  • Figure 1: DG of the modular polynomial multiplication when $n=4$. The DG is mapped to a systolic array using the projection vector shown in blue.
  • Figure 2: Three different forms of FIR filter architecture when $n=4$.
  • Figure 3: A degree-$4$ weight-stationary systolic modular polynomial multiplier.
  • Figure 4: Data-flow of the Fast.2.PolyMult algorithm.
  • Figure 5: Fast $2$-parallel modular polynomial multiplier.
  • ...and 4 more figures