APACHE: A Processing-Near-Memory Architecture for Multi-Scheme Fully Homomorphic Encryption

Lin Ding; Song Bian; Penggao He; Yan Xu; Gang Qu; Jiliang Zhang

TL;DR

APACHE tackles the data movement and resource underutilization barriers in multi-scheme FHE accelerators by introducing a processing-near-memory architecture with a three-level memory hierarchy and a configurable NMC module. It combines scheme-aware task scheduling with flexible interconnects to co-locate computation with memory for both TFHE-like and BFV/CKKS-like operations. The approach yields substantial throughput gains (10.63x to 35.47x over state-of-the-art ASIC accelerators in the reported benchmarks) and large I/O reductions across diverse FHE tasks. This work shows that memory-compute co-design and near-memory processing are key to practical, high-throughput multi-scheme FHE execution on DIMMs.

Abstract

Fully Homomorphic Encryption (FHE) is known to be extremely computationally intensive, and application-specific accelerators have emerged as a powerful solution to narrow the performance gap. Nonetheless, due to the increasing complexity of FHE schemes per se and of multi-scheme FHE algorithm designs in end-to-end privacy-preserving tasks, existing FHE accelerators often suffer from low hardware utilization and insufficient memory bandwidth. In this work, we present APACHE, a layered near-memory computing hierarchy tailored for multi-scheme FHE acceleration. By closely inspecting the data flow across different FHE schemes, we propose a layered near-memory computing architecture with fine-grained functional unit design that significantly improves the utilization of computational resources and memory bandwidth. Experimental results show that APACHE outperforms state-of-the-art ASIC FHE accelerators by 10.63x to 35.47x over a variety of application benchmarks, e.g., Lola MNIST, HELR, VSP, and HE$^{3}$DB.

Paper Structure (15 sections, 3 equations, 8 figures, 5 tables)

Introduction
Background
Decomposition and Classification of FHE Operators
Circuit Designs for FHE Acceleration
Architecture of APACHE
Architectural Overview
Task-level Scheduling and Parallelism
Design Exploration of Functional Units
The NMC Module and its Interconnection
FU Configuration in the NMC Module
Implementation and Evaluation
Implementation and Setup
Performance and Comparison
Architectural Analysis
Conclusion

Figures (8)

Figure 1: Evaluation of I/O load in the pipelined accelerator, referring to tfheHE3DAPoseidon.
Figure 2: (a) Overview of APACHE structure; (b) Modified $\times$8 DRAM chip with a hierarchy of array-to-bank-to-bank group; and (c) Modified memory array for computing $\mathsf{KS}$. Dashed and solid lines represent the control and data flows, respectively.
Figure 3: Dataflow of (a) $\mathsf{CMUX}$ and (b) $\mathsf{HRot}$ and $\mathsf{CMult}$ (2 to 9). $\mathsf{BConv}$ consists of $\mathsf{MMult}$ and $\mathsf{MAdd}$.
Figure 4: The topology of the NMC module. Dashed lines denote wires with transistors that control whether the $\mathsf{(I)NTT}$ FU is linked to the $\mathsf{MMult}$ FU.
Figure 5: The proposed configurable modular multiplier, (a) working as a 64-bit modular multiplier, and (b) working as two parallel 32-bit modular multipliers.
...and 3 more figures
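
The two operating modes named in the Figure 5 caption can be illustrated with a small behavioral model. This is only a sketch of the input/output behavior, not the circuit from the paper: the function names, the lane-packing convention (two 32-bit values in one 64-bit word), and the use of a single shared modulus for both 32-bit lanes are assumptions made for illustration.

```python
MASK32 = (1 << 32) - 1  # selects one 32-bit lane of a 64-bit word


def pack(hi, lo):
    """Pack two 32-bit values into one 64-bit word (hypothetical helper)."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def unpack(word):
    """Split a 64-bit word into its (high, low) 32-bit lanes."""
    return (word >> 32) & MASK32, word & MASK32


def configurable_modmul(a, b, q, dual32=False):
    """Behavioral model of a multiplier with two modes (assumed interface).

    dual32=False: one modular multiplication on 64-bit operands.
    dual32=True:  two independent 32-bit modular multiplications,
                  one per lane, sharing the modulus q (an assumption).
    """
    if not dual32:
        if not 0 < q < (1 << 64):
            raise ValueError("q must fit in 64 bits")
        return (a * b) % q

    if not 0 < q <= MASK32:
        raise ValueError("q must fit in 32 bits in dual-lane mode")
    a_hi, a_lo = unpack(a)
    b_hi, b_lo = unpack(b)
    # Each lane result is smaller than q, so it fits back into 32 bits.
    return pack((a_hi * b_hi) % q, (a_lo * b_lo) % q)


if __name__ == "__main__":
    # 64-bit mode: a single product reduced modulo q.
    print(configurable_modmul(2**63, 2, 2**64 - 59))
    # Dual 32-bit mode: lanes (3, 5) times lanes (7, 11), both modulo 13.
    print(unpack(configurable_modmul(pack(3, 5), pack(7, 11), 13, dual32=True)))
```

In the dual-lane call, the high lane computes 3 * 7 mod 13 and the low lane computes 5 * 11 mod 13 independently, which is the sense in which one datapath serves either one wide operation or two narrow ones.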
