CiFHER: A Chiplet-Based FHE Accelerator with a Resizable Structure
Sangpyo Kim, Jongmin Kim, Jaeyoung Choi, Jung Ho Ahn
TL;DR
Fully homomorphic encryption (FHE) offers privacy-preserving computation but incurs immense computational overhead. CiFHER tackles this with a resizable, chiplet-based multi-chip module (MCM) architecture. It combines a composable NTT unit, generalized data mapping, and limb duplication to minimize inter-chiplet communication and relieve network-on-package (NoP) bottlenecks, achieving performance competitive with monolithic ASICs at significantly lower area and power. The approach performs robustly across CKKS workloads (bootstrapping, CNN inference, sorting, and HELR) and indicates a favorable cost tradeoff for chiplets over monolithic dies. Overall, CiFHER provides a scalable, cost-effective path toward practical, high-performance FHE accelerators in post-Moore packaging ecosystems.
Abstract
Fully homomorphic encryption (FHE) is in the spotlight as a definitive solution for privacy, but its high computational overhead poses a challenge to practical adoption. Although prior studies have designed ASIC accelerators to mitigate the overhead, their designs require excessive chip resources (e.g., area) to hold and process the massive data involved in FHE operations. We propose CiFHER, a chiplet-based FHE accelerator with a resizable structure, to tackle the challenge with a cost-effective multi-chip module (MCM) design. First, we devise a flexible core architecture whose configuration can be adjusted to conform to the global organization of chiplets and to design constraints. Its distinctive feature is a composable functional unit that provides varying computational throughput for the number-theoretic transform (NTT), the dominant function in FHE. Then, we establish generalized data mapping methodologies to minimize the interconnect overhead that arises when the chips are organized into the MCM package in a tiled manner; this overhead becomes a significant bottleneck due to packaging constraints. This study demonstrates that a CiFHER package composed of a number of compact chiplets provides performance comparable to state-of-the-art monolithic ASIC accelerators while significantly reducing the package-wide power consumption and manufacturing cost.
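For readers unfamiliar with the number-theoretic transform named above, the following is a minimal Python sketch of what an NTT computes and of how an N-point transform can be built from two N/2-point transforms. It is not CiFHER's functional unit or its data mapping. The parameters (modulus 17, length 8, root of unity 2) are toy values chosen for illustration, the function names are hypothetical, and the transform shown is the plain cyclic variant rather than the negacyclic one commonly used in FHE schemes.

```python
# Toy parameters (assumptions for illustration only; real FHE uses much
# larger primes and polynomial degrees).
Q = 17      # prime modulus
N = 8       # transform length
OMEGA = 2   # primitive 8th root of unity mod 17 (2**4 % 17 == 16 == -1)


def ntt_naive(a, omega=OMEGA, q=Q):
    """Direct O(n^2) evaluation: A[i] = sum_j a[j] * omega^(i*j) mod q."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]


def ntt_recursive(a, omega=OMEGA, q=Q):
    """Radix-2 decimation-in-time NTT: one n-point transform is composed
    from two n/2-point transforms plus a butterfly stage."""
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % q
    even = ntt_recursive(a[0::2], omega_sq, q)
    odd = ntt_recursive(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q  # uses omega^(n/2) == -1
        w = w * omega % q
    return out


def intt(A, omega=OMEGA, q=Q):
    """Inverse transform: forward NTT with omega^-1, scaled by n^-1."""
    n = len(A)
    inv_n = pow(n, -1, q)          # modular inverse (Python 3.8+)
    inv_omega = pow(omega, -1, q)
    return [x * inv_n % q for x in ntt_recursive(A, inv_omega, q)]


def cyclic_mul(a, b, q=Q):
    """Multiply polynomials modulo (x^n - 1, q) via pointwise products."""
    fa, fb = ntt_recursive(a), ntt_recursive(b)
    return intt([x * y % q for x, y in zip(fa, fb)])


a = [1, 2, 0, 0, 0, 0, 0, 0]   # 1 + 2x
b = [3, 4, 0, 0, 0, 0, 0, 0]   # 3 + 4x
product = cyclic_mul(a, b)      # coefficients of (1 + 2x)(3 + 4x) mod 17
```

The recursive form illustrates the general property that larger transforms decompose into smaller ones; how CiFHER's composable unit exploits such structure is described in the paper itself, not by this sketch.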
