NTTSuite: Number Theoretic Transform Benchmarks for Accelerating Encrypted Computation

Juran Ding, Yuanzhe Liu, Lingbin Sun, Brandon Reagen

TL;DR

NTTSuite targets the heavy computational overheads of homomorphic encryption (HE) by benchmarking the Number Theoretic Transform (NTT) across CPU, GPU, and FPGA/HLS implementations. It provides seven NTT algorithms, including a novel variant named Pease_nc, and applies optimizations such as explicit modular reduction and memory-access-aware pipelining to maximize hardware throughput. The suite offers a common, open-source baseline with validated CPU, GPU, and FPGA results, delivering up to 30% performance gains over the HEAX baseline on FPGA and revealing clear platform-specific tradeoffs. By enabling reproducible cross-platform optimization of NTTs, NTTSuite advances practical HE acceleration and privacy-preserving computation research.

Abstract

Privacy concerns have thrust privacy-preserving computation into the spotlight. Homomorphic encryption (HE) is a cryptographic system that enables computation to occur directly on encrypted data, providing users with strong privacy (and security) guarantees while using the same services they enjoy today unprotected. While promising, HE has seen little adoption due to extremely high computational overheads, rendering it impractical. In this paper we develop a benchmark suite, named NTTSuite, to enable researchers to better address these overheads by studying the primary source of HE's slowdown: the number theoretic transform (NTT). NTTSuite comprises seven unique NTT algorithms with support for CPUs (C++), GPUs (CUDA), and custom hardware (Catapult HLS). In addition, we propose optimizations to improve the performance of NTT running on FPGAs. We find our implementation outperforms the state-of-the-art by 30%.
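
For readers unfamiliar with the transform the abstract refers to, the following is a minimal sketch of a radix-2 NTT with an explicit modular reduction after every arithmetic step. It is a textbook Cooley-Tukey formulation written for illustration, not code from NTTSuite. The modulus `P`, the primitive root `G`, and the function name `ntt` are assumptions chosen for this example, and the cyclic polynomial product shown in the usage note is a generic demonstration rather than the paper's workload.

```python
P = 998244353  # illustrative NTT-friendly prime, 119 * 2**23 + 1 (assumed, not from the paper)
G = 3          # a primitive root modulo P


def ntt(values, invert=False):
    """Return the length-n NTT of `values` modulo P (n must be a power of two)."""
    a = [v % P for v in values]
    n = len(a)
    assert n >= 1 and n & (n - 1) == 0, "length must be a power of two"

    # Reorder the input into bit-reversed order so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) butterfly stages; each result is reduced modulo P explicitly.
    length = 2
    while length <= n:
        w_len = pow(G, (P - 1) // length, P)  # primitive `length`-th root of unity
        if invert:
            w_len = pow(w_len, P - 2, P)      # modular inverse via Fermat's little theorem
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w % P
                a[start + k] = (u + v) % P
                a[start + k + half] = (u - v) % P
                w = w * w_len % P
        length <<= 1

    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a


# Usage: multiply (1 + 2x) by (3 + 4x) through pointwise products in the NTT domain.
fa = ntt([1, 2, 0, 0])
fb = ntt([3, 4, 0, 0])
product = ntt([x * y % P for x, y in zip(fa, fb)], invert=True)
```

Here `product` holds the coefficients of the product polynomial, 3 + 10x + 8x^2, padded to length 4. The reduction `% P` after each multiply and add is the step that dominates cost on hardware, which is why modular reduction is a natural optimization target.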

Paper Structure

This paper contains 13 sections, 7 equations, 5 figures, 2 tables, and 5 algorithms.

Figures (5)

  • Figure 1: Memory access pattern for the three major NTT types.
  • Figure 2: Optimizations applied cumulatively to the Pease_nc algorithm. Speedups are noted on top of the bars.
  • Figure 3: Optimization resource utilization for the Pease_nc algorithm.
  • Figure 4: GPU and FPGA speedup relative to the CPU. Problem size is noted in the top right of each plot.
  • Figure 5: Block diagram of Pease No-copy with AXI interconnect, BRAM modules, and PCIe-AXI bridge.
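
The captions above refer to memory access patterns and to the Pease family of algorithms. Pease-style transforms are generally described as constant-geometry: every stage reads and writes with the same index pattern, which tends to suit pipelined hardware. The sketch below illustrates that idea with a generic decimation-in-frequency formulation using two ping-pong buffers. It does not attempt to reproduce the paper's Pease_nc variant, and the modulus `P`, root `G`, and all function names are assumptions made for this example.

```python
P = 998244353  # illustrative NTT-friendly prime (assumed, not from the paper)
G = 3          # a primitive root modulo P


def constant_geometry_ntt(values):
    """Constant-geometry NTT; the result is returned in bit-reversed order."""
    n = len(values)
    assert n >= 1 and n & (n - 1) == 0, "length must be a power of two"
    stages = n.bit_length() - 1
    w = pow(G, (P - 1) // n, P)  # primitive n-th root of unity
    half = n // 2
    src = [v % P for v in values]
    for s in range(stages):
        dst = [0] * n
        for i in range(half):
            # Identical access pattern in every stage:
            # read (i, i + n/2), write (2i, 2i + 1).
            a = src[i]
            b = src[i + half]
            tw = pow(w, (i >> s) << s, P)  # only the twiddle exponent depends on the stage
            dst[2 * i] = (a + b) % P
            dst[2 * i + 1] = (a - b) * tw % P
        src = dst
    return src


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def naive_ntt(values):
    """O(n^2) reference transform used only to check the sketch."""
    n = len(values)
    w = pow(G, (P - 1) // n, P)
    return [sum(x * pow(w, j * k, P) for j, x in enumerate(values)) % P
            for k in range(n)]


# Usage: position p of the constant-geometry output holds coefficient bit_reverse(p).
data = [5, 1, 4, 1, 5, 9, 2, 6]
out = constant_geometry_ntt(data)
ref = naive_ntt(data)
matches = all(out[p] == ref[bit_reverse(p, 3)] for p in range(len(data)))
```

Because the read and write addresses never change between stages, only the twiddle factors need stage-dependent logic. The sketch allocates a fresh destination buffer per stage for clarity; a hardware design would typically alternate between two fixed buffers instead.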