REED: Chiplet-Based Accelerator for Fully Homomorphic Encryption
Aikata Aikata, Ahmet Can Mert, Sunmin Kwon, Maxim Deryabin, Sujoy Sinha Roy
TL;DR
Fully Homomorphic Encryption (FHE) incurs massive computation and memory overheads, which makes practical deployment with monolithic ASIC accelerators difficult. REED proposes a scalable 4-chiplet 2.5D FHE accelerator that combines a ring-based, non-blocking chiplet-to-chiplet (C2C) interconnect, a Hybrid NTT, MAS/AUT blocks, and PRNG-based KeySwitch key generation to match the performance of large monolithic designs while improving yield and cost. It benchmarks encrypted DNN training and reports speedups of up to $2{,}991\times$ over a CPU, as well as $1.9\times$ better performance at roughly half the development cost compared with state-of-the-art monolithic designs, which it attributes to high off-chip bandwidth via HBMs and efficient chiplet collaboration. The work argues that chiplet-based FHE accelerators can make privacy-preserving ML practical, and it identifies higher throughput and 3D integration as directions for future work.
Abstract
Fully Homomorphic Encryption (FHE) enables privacy-preserving computation and has many applications. However, its practical implementation faces massive computation and memory overheads. To address this bottleneck, several Application-Specific Integrated Circuit (ASIC) FHE accelerators have been proposed. All of these prior works place every component needed for FHE on a single (monolithic) chip and therefore offer high performance. However, they suffer from the practical problems of large-scale chip design, such as inflexibility, low yield, and high manufacturing cost. In this paper, we present REED, a first-of-its-kind multi-chiplet FHE accelerator that overcomes the limitations of prior monolithic designs. To exploit the advantages of multi-chiplet structures while matching the performance of larger monolithic systems, we propose and implement several novel strategies in the context of FHE. These include a scalable chiplet design approach, an effective framework for workload distribution, a custom inter-chiplet communication strategy, and advanced pipelined Number Theoretic Transform (NTT) and automorphism designs that enhance performance. Experimental results show that the REED 2.5D microprocessor occupies 96.7 mm$^2$ of chip area and consumes 49.4 W of average power in 7nm technology. It achieves a speedup of up to $2{,}991\times$ over a CPU (24-core 2xIntel X5690) and offers $1.9\times$ better performance, along with a 50% reduction in development cost, compared to state-of-the-art ASIC FHE accelerators. Furthermore, our work presents the first benchmark of encrypted deep neural network (DNN) training. Overall, the REED architecture offers a highly effective solution for accelerating FHE, significantly advancing its practicality and deployability in real-world applications.
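For readers unfamiliar with the Number Theoretic Transform named above, the following is a minimal software sketch of how an NTT turns polynomial multiplication in the ring $\mathbb{Z}_q[X]/(X^N+1)$ into pointwise products. It is not REED's pipelined hardware design: the toy parameters (`Q = 17`, `N = 8`, `PSI = 3`), the function names, and the naive $O(N^2)$ transform are all assumptions chosen for readability.

```python
# Toy parameters (illustrative assumptions, far smaller than real FHE settings).
Q = 17                  # prime modulus with Q ≡ 1 (mod 2N)
N = 8                   # ring degree: polynomials live in Z_Q[X]/(X^N + 1)
PSI = 3                 # primitive 2N-th root of unity mod Q (PSI^N ≡ -1)
OMEGA = PSI * PSI % Q   # primitive N-th root of unity mod Q


def ntt(a, root):
    """Naive O(N^2) transform: out[k] = sum_j a[j] * root^(j*k) mod Q."""
    return [sum(a[j] * pow(root, j * k, Q) for j in range(N)) % Q
            for k in range(N)]


def negacyclic_mul_ntt(a, b):
    """Multiply a and b modulo (X^N + 1, Q) using the NTT."""
    # Pre-twist by powers of PSI so a cyclic convolution becomes negacyclic.
    a_tw = [a[j] * pow(PSI, j, Q) % Q for j in range(N)]
    b_tw = [b[j] * pow(PSI, j, Q) % Q for j in range(N)]
    # Forward transforms, then a cheap pointwise product.
    c_hat = [x * y % Q for x, y in zip(ntt(a_tw, OMEGA), ntt(b_tw, OMEGA))]
    # Inverse transform: use OMEGA^-1, scale by N^-1, undo the twist.
    inv_n = pow(N, -1, Q)
    inv_psi = pow(PSI, -1, Q)
    c_tw = ntt(c_hat, pow(OMEGA, -1, Q))
    return [c_tw[j] * inv_n % Q * pow(inv_psi, j, Q) % Q for j in range(N)]


def negacyclic_mul_schoolbook(a, b):
    """Reference O(N^2) multiplication, reducing with X^N = -1."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] = (c[k] + a[i] * b[j]) % Q
            else:
                c[k - N] = (c[k - N] - a[i] * b[j]) % Q
    return c


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
assert negacyclic_mul_ntt(a, b) == negacyclic_mul_schoolbook(a, b)
```

A production implementation would replace the naive transform with an $O(N \log N)$ butterfly network; the structure of twist, transform, pointwise multiply, and inverse transform stays the same.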
