A Unified Hardware Accelerator for Fast Fourier Transform and Number Theoretic Transform
Rishabh Shrivastava, Chaitanya Prasad Ratnala, Durga Manasa Puli, Utsav Banerjee
TL;DR
The paper introduces a unified hardware accelerator that supports both a $512$-point complex FFT and a $256$-point NTT, serving traditional digital signal processing as well as post-quantum lattice-based cryptography (ML-KEM and ML-DSA). By reusing the FFT arithmetic and adding modular-reduction logic, the design achieves performance competitive with state-of-the-art NTT accelerators while incurring modest increases in LUTs ($\approx62\%$) and FFs ($\approx26\%$) and no change in DSP or BRAM usage. Implemented on a Xilinx Zynq UltraScale+ FPGA at $400\,\text{MHz}$, it demonstrates that a single, carefully engineered butterfly unit can handle the FFT and both NTT variants (ML-KEM/Kyber and ML-DSA/Dilithium) with flexible memory organization and unified control. This work allows hardware platforms built for DSP to be upgraded efficiently for PQC workloads, so that secure communications and cryptographic signatures can share the same infrastructure. Future directions include extending the architecture to additional PQC NTT schemes and exploring ASIC implementations for even higher efficiency.
Abstract
The Number Theoretic Transform (NTT) is an indispensable tool for computing efficient polynomial multiplications in post-quantum lattice-based cryptography. It closely resembles the Fast Fourier Transform (FFT), which is the most widely used algorithm in digital signal processing. In this work, we demonstrate a unified hardware accelerator supporting both the 512-point complex FFT and the 256-point NTT for the recently standardized NIST post-quantum key encapsulation and digital signature algorithms, ML-KEM and ML-DSA respectively. Our proposed architecture reuses the arithmetic circuitry required for the complex FFT; the only additional circuits required are for modular reduction, along with modifications to the control logic. Our implementation achieves performance comparable to state-of-the-art ML-KEM / ML-DSA NTT accelerators on FPGA, thus demonstrating how an FFT accelerator can be augmented to support the NTT and how the unified hardware can serve both digital signal processing and post-quantum lattice-based cryptography applications.
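The resemblance between the FFT and the NTT can be illustrated with a short software sketch in which one radix-2 butterfly serves both transforms and modular reduction is the only NTT-specific step. This is an illustration under stated assumptions, not the authors' hardware design: the function names `butterfly`, `transform`, `fft`, and `ntt` are hypothetical, the transform is a plain cyclic one rather than the exact negacyclic structure used in ML-KEM and ML-DSA, and the modulus 3329 with 17 as a primitive 256th root of unity is taken from the ML-KEM parameter set.

```python
import cmath

def butterfly(a, b, w, q=None):
    """One radix-2 Cooley-Tukey butterfly: (a + w*b, a - w*b).

    q=None selects complex arithmetic (FFT); otherwise integers mod q (NTT).
    """
    t = w * b
    if q is None:
        return a + t, a - t
    # The only NTT-specific step in this sketch: reduce the results modulo q.
    return (a + t) % q, (a - t) % q

def transform(x, w, q=None):
    """Recursive radix-2 decimation-in-time transform (power-of-two length).

    w must be a primitive len(x)-th root of unity in the chosen arithmetic.
    """
    n = len(x)
    if n == 1:
        return list(x)
    w2 = w * w if q is None else (w * w) % q
    even = transform(x[0::2], w2, q)
    odd = transform(x[1::2], w2, q)
    out = [0] * n
    tw = 1  # running twiddle factor w**k
    for k in range(n // 2):
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], tw, q)
        tw = tw * w if q is None else (tw * w) % q
    return out

def fft(x):
    """Complex FFT using the shared butterfly."""
    return transform(x, cmath.exp(-2j * cmath.pi / len(x)))

def ntt(x, q=3329, root256=17):
    """Cyclic NTT mod q; assumes len(x) is a power of two dividing 256."""
    w = pow(root256, 256 // len(x), q)  # primitive len(x)-th root of unity
    return transform(x, w, q)
```

Calling `fft([1, 2, 3, 4, 5, 6, 7, 8])` and `ntt([1, 2, 3, 4, 5, 6, 7, 8])` runs the same data flow through `transform`; only the arithmetic inside `butterfly` and the twiddle update differ, which mirrors the idea of augmenting FFT datapaths with modular reduction.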
