APACHE: A Processing-Near-Memory Architecture for Multi-Scheme Fully Homomorphic Encryption

Lin Ding, Song Bian, Penggao He, Yan Xu, Gang Qu, Jiliang Zhang

TL;DR

APACHE tackles the data movement and resource underutilization barriers in multi-scheme FHE accelerators by introducing a processing-near-memory architecture with a three-level memory hierarchy and a configurable NMC module. It combines scheme-aware task scheduling with flexible interconnects to co-locate computation with memory for both TFHE-like and BFV/CKKS-like operations. The approach yields throughput gains of 10.63x to 35.47x over state-of-the-art ASIC FHE accelerators on the evaluated benchmarks, along with large I/O reductions across diverse FHE tasks. This work shows that memory-compute co-design and near-memory processing are key to practical, high-throughput multi-scheme FHE execution on DIMMs.

Abstract

Fully Homomorphic Encryption (FHE) is known to be extremely computationally intensive, and application-specific accelerators have emerged as a powerful solution to narrow the performance gap. Nonetheless, due to the increasing complexity of FHE schemes per se and of multi-scheme FHE algorithm designs in end-to-end privacy-preserving tasks, existing FHE accelerators often face the challenges of low hardware utilization rates and insufficient memory bandwidth. In this work, we present APACHE, a layered near-memory computing hierarchy tailored for multi-scheme FHE acceleration. By closely inspecting the data flow across different FHE schemes, we propose a layered near-memory computing architecture with fine-grained functional unit design to significantly enhance the utilization rates of computational resources and memory bandwidth. The experimental results show that APACHE outperforms state-of-the-art ASIC FHE accelerators by 10.63x to 35.47x over a variety of application benchmarks, e.g., Lola MNIST, HELR, VSP, and HE$^{3}$DB.

Paper Structure (15 sections, 3 equations, 8 figures, 5 tables)

Figures (8)

  • Figure 1: Evaluation of I/O load in the pipelined accelerator, referring to tfheHE3DAPoseidon.
  • Figure 2: (a) Overview of APACHE structure; (b) Modified $\times$8 DRAM chip with a hierarchy of array-to-bank-to-bank group; and (c) Modified memory array for computing $\mathsf{KS}$. Dashed and solid lines represent the control and data flows, respectively.
  • Figure 3: Dataflow of (a) $\mathsf{CMUX}$ and (b) $\mathsf{HRot}$ and $\mathsf{CMult}$ (2 to 9). $\mathsf{BConv}$ consists of $\mathsf{MMult}$ and $\mathsf{MAdd}$.
  • Figure 4: The topology of NMC module. Dashed line stands for wires with transistors to control whether to link $\mathsf{(I)NTT}$ FU with $\mathsf{MMult}$ FU.
  • Figure 5: The proposed configurable modular multiplier, (a) working as a 64-bit modular multiplier, and (b) working as two parallel 32-bit modular multipliers.
  • ...and 3 more figures
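
The Figure 5 caption describes a modular multiplier that can run either as one 64-bit unit or as two parallel 32-bit units. The following is a minimal functional sketch of that behavior only, not the paper's hardware design. The function names, the packing of two 32-bit operands into one 64-bit word, and the per-lane moduli are all assumptions made for illustration.

```python
MASK32 = (1 << 32) - 1  # selects the low 32-bit lane of a packed word


def pack(lo: int, hi: int) -> int:
    """Pack two 32-bit values into one 64-bit word (hypothetical layout)."""
    assert 0 <= lo <= MASK32 and 0 <= hi <= MASK32
    return (hi << 32) | lo


def modmul64(a: int, b: int, q: int) -> int:
    """Mode (a): a single 64-bit modular multiplication, (a * b) mod q."""
    assert 0 < q < (1 << 64) and 0 <= a < q and 0 <= b < q
    return (a * b) % q


def modmul_2x32(a_packed: int, b_packed: int, q_lo: int, q_hi: int) -> int:
    """Mode (b): two independent 32-bit modular multiplications.

    Each 64-bit input word carries two 32-bit operands; the lanes are
    multiplied and reduced separately, then repacked into one word.
    """
    assert 0 < q_lo <= MASK32 and 0 < q_hi <= MASK32
    a_lo, a_hi = a_packed & MASK32, a_packed >> 32
    b_lo, b_hi = b_packed & MASK32, b_packed >> 32
    r_lo = (a_lo * b_lo) % q_lo  # low lane
    r_hi = (a_hi * b_hi) % q_hi  # high lane
    return pack(r_lo, r_hi)
```

In this model the same 64-bit operand width serves both modes, so the caller chooses between one wide product and two narrow ones without changing the word size that moves through the datapath.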