CiFHER: A Chiplet-Based FHE Accelerator with a Resizable Structure

Sangpyo Kim, Jongmin Kim, Jaeyoung Choi, Jung Ho Ahn

TL;DR

Fully homomorphic encryption offers privacy-preserving computation but incurs immense overhead. CiFHER tackles this with a resizable, chiplet-based MCM architecture featuring a composable NTT unit, generalized data mapping, and limb duplication to minimize inter-chiplet communication and network-on-package (NoP) bottlenecks, achieving performance competitive with monolithic ASICs at significantly lower area and power. The approach demonstrates robust performance across CKKS workloads (bootstrapping, CNN inference, sorting, and HELR) and reveals favorable cost tradeoffs for chiplets over monolithic dies. Overall, CiFHER provides a scalable, cost-effective pathway toward practical, high-performance FHE accelerators in post-Moore packaging ecosystems.

Abstract

Fully homomorphic encryption (FHE) is in the spotlight as a definitive solution for privacy, but the high computational overhead of FHE poses a challenge to its practical adoption. Although prior studies have attempted to design ASIC accelerators to mitigate the overhead, their designs require excessive chip resources (e.g., area) to contain and process the massive data involved in FHE operations. We propose CiFHER, a chiplet-based FHE accelerator with a resizable structure, to tackle the challenge with a cost-effective multi-chip module (MCM) design. First, we devise a flexible core architecture whose configuration is adjustable to conform to the global organization of chiplets and design constraints. Its distinctive feature is a composable functional unit providing varying computational throughput for the number-theoretic transform (NTT), the dominant function in FHE. Then, we establish generalized data mapping methodologies to minimize the interconnect overhead, which becomes a significant bottleneck due to packaging constraints when the chips are organized into the MCM package in a tiled manner. This study demonstrates that a CiFHER package composed of a number of compact chiplets provides performance comparable to state-of-the-art monolithic ASIC accelerators while significantly reducing the package-wide power consumption and manufacturing cost.
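
The abstract names the number-theoretic transform as the dominant function in FHE. As a reading aid, the following is a minimal sketch of what that function computes, not CiFHER's hardware algorithm: a direct O(N^2) negacyclic NTT over a toy prime. The parameters `Q = 17`, `N = 8`, `PSI = 3` and all function names are assumptions chosen for illustration. Real CKKS parameters are far larger and use fast butterfly-based transforms.

```python
# Toy parameters (illustrative assumptions, not taken from the paper).
Q = 17    # small prime with Q ≡ 1 (mod 2N)
N = 8     # polynomial length (power of two)
PSI = 3   # primitive 2N-th root of unity mod Q (3^8 ≡ -1 mod 17)


def ntt(a):
    """Negacyclic NTT: evaluate a(X) at the odd powers of PSI (direct O(N^2) form)."""
    return [sum(a[j] * pow(PSI, (2 * k + 1) * j, Q) for j in range(N)) % Q
            for k in range(N)]


def intt(A):
    """Inverse of ntt(); pow(x, -1, Q) needs Python 3.8 or newer."""
    n_inv = pow(N, -1, Q)
    psi_inv = pow(PSI, -1, Q)
    return [n_inv * sum(A[k] * pow(psi_inv, (2 * k + 1) * j, Q) for k in range(N)) % Q
            for j in range(N)]


def negacyclic_mul(a, b):
    """Schoolbook product of a and b modulo (X^N + 1, Q), used as a reference."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] = (c[k] + a[i] * b[j]) % Q
            else:
                # X^N ≡ -1, so wrapped-around terms are subtracted.
                c[k - N] = (c[k - N] - a[i] * b[j]) % Q
    return c


def ntt_mul(a, b):
    """Same product computed as a pointwise multiplication in the NTT domain."""
    return intt([x * y % Q for x, y in zip(ntt(a), ntt(b))])
```

The point of the transform is visible in `ntt_mul`: once both operands are in the NTT domain, polynomial multiplication reduces to N independent modular multiplications, which is why accelerators devote so much of their datapath to the transform itself.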

Paper Structure (27 sections, 2 equations, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Exemplar configurations of a composable NTT unit, simplified to $N=2^8$. A submodule, the smallest unit occupying $\sqrt[4]{N}$ lanes, is shown above. A two-submodule configuration and its NTT process for a length-$N$ polynomial are shown below.
  • Figure 2: The CiFHER package consists of multiple core chiplets and two I/O dies handling HBM. A core comprises functional units (FUs), register files (RFs), and networking components.
  • Figure 3: Generalized data mapping methods of CiFHER with 64 cores: (a) dimension-wise clustering and block clustering for two different exemplar block sizes of (b) $2\times4$ and (c) $4\times4$.
  • Figure 4: The amount of data transfer during key-switching using the method of ARK [micro-2022-ark] under different $\ell$ conditions.
  • Figure 5: Performance comparison between CiFHER default configurations and prior vector FHE accelerators, CLake+ (bootstrapping, ResNet, and HELR256 panels) and ARK (bootstrapping, ResNet, sorting, and HELR1024 panels), for the workloads. $n$-chip denotes the default configuration of CiFHER with $n$ core chiplets, except for 1-chip, which is modified from 4-chip to integrate 4 cores into a monolithic die and has double the bisection bandwidth to account for the bandwidth gap between NoC and NoP. BK: block clustering. DW: dimension-wise clustering. duplicated: using limb duplication.
  • ...and 5 more figures