Trinity: A General Purpose FHE Accelerator

Xianglong Deng; Shengyu Fan; Zhicheng Hu; Zhuoyu Tian; Zihao Yang; Jiangrui Yu; Dingyuan Cao; Dan Meng; Rui Hou; Meng Li; Qian Lou; Mingzhe Zhang

Trinity: A General Purpose FHE Accelerator

Xianglong Deng, Shengyu Fan, Zhicheng Hu, Zhuoyu Tian, Zihao Yang, Jiangrui Yu, Dingyuan Cao, Dan Meng, Rui Hou, Meng Li, Qian Lou, Mingzhe Zhang

TL;DR

This paper presents the first multi-modal FHE accelerator based on a unified architecture, which efficiently supports CKKS, TFHE, and their conversion scheme within a single accelerator and achieves 919.3 x performance improvement for the FHE-conversion scheme over the CPU-based implementation.

Abstract

In this paper, we present the first multi-modal FHE accelerator based on a unified architecture, which efficiently supports CKKS, TFHE, and their conversion scheme within a single accelerator. To achieve this goal, we first analyze the theoretical foundations of the aforementioned schemes and highlight their composition from a finite number of arithmetic kernels. Then, we investigate the challenges for efficiently supporting these kernels within a unified architecture, which include 1) concurrent support for NTT and FFT, 2) maintaining high hardware utilization across various polynomial lengths, and 3) ensuring consistent performance across diverse arithmetic kernels. To tackle these challenges, we propose a novel FHE accelerator named Trinity, which incorporates algorithm optimizations, hardware component reuse, and dynamic workload scheduling to enhance the acceleration of CKKS, TFHE, and their conversion scheme. By adaptive select the proper allocation of components for NTT and MAC, Trinity maintains high utilization across NTTs with various polynomial lengths and imbalanced arithmetic workloads. The experiment results show that, for the pure CKKS and TFHE workloads, the performance of our Trinity outperforms the state-of-the-art accelerator for CKKS (SHARP) and TFHE (Morphling) by 1.49x and 4.23x, respectively. Moreover, Trinity achieves 919.3x performance improvement for the FHE-conversion scheme over the CPU-based implementation. Notably, despite the performance improvement, the hardware overhead of Trinity is only 85% of the summed circuit areas of SHARP and Morphling.

Trinity: A General Purpose FHE Accelerator

TL;DR

Abstract

Trinity: A General Purpose FHE Accelerator

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (16)