APACHE: A Processing-Near-Memory Architecture for Multi-Scheme Fully Homomorphic Encryption
Lin Ding, Song Bian, Penggao He, Yan Xu, Gang Qu, Jiliang Zhang
TL;DR
APACHE tackles the data movement and resource underutilization barriers in multi-scheme FHE accelerators by introducing a processing-near-memory architecture with a three-level memory hierarchy and a configurable near-memory computing (NMC) module. It combines scheme-aware task scheduling with flexible interconnects to co-locate computation with memory for both TFHE-like and BFV/CKKS-like operations. Against state-of-the-art ASIC accelerators, the approach yields speedups of 10.63x to 35.47x and substantial I/O reductions across diverse FHE tasks. This work shows that memory-compute co-design and near-memory processing are key to practical, high-throughput multi-scheme FHE execution on DIMMs.
Abstract
Fully Homomorphic Encryption (FHE) is known to be extremely computationally intensive, and application-specific accelerators have emerged as a powerful solution to narrow the performance gap. Nonetheless, due to the increasing complexity of FHE schemes per se and of multi-scheme FHE algorithm designs in end-to-end privacy-preserving tasks, existing FHE accelerators often suffer from low hardware utilization rates and insufficient memory bandwidth. In this work, we present APACHE, a layered near-memory computing hierarchy tailored for multi-scheme FHE acceleration. By closely inspecting the data flow across different FHE schemes, we pair this hierarchy with a fine-grained functional unit design to significantly enhance the utilization rates of computational resources and memory bandwidth. The experimental results show that APACHE outperforms state-of-the-art ASIC FHE accelerators by 10.63x to 35.47x over a variety of application benchmarks, e.g., Lola MNIST, HELR, VSP, and HE$^{3}$DB.
