HF-NTT: Hazard-Free Dataflow Accelerator for Number Theoretic Transform

Xiangchen Meng, Zijun Jiang, Yangdi Lyu

TL;DR

HF-NTT is an NTT accelerator that handles polynomials of varying degrees and moduli and trades performance against hardware resources by adjusting the number of Processing Elements (PEs). Its data movement strategy eliminates bit-reversal operations, resolves several kinds of hazards, and reduces the clock cycle count.

Abstract

Polynomial multiplication is one of the fundamental operations in many applications, such as fully homomorphic encryption (FHE). However, the computational inefficiency stemming from polynomials with many large-bit coefficients poses a significant challenge for the practical implementation of FHE. The Number Theoretic Transform (NTT) has proven an effective tool for accelerating polynomial multiplication, but a fast and adaptable method for generating NTT accelerators is lacking. In this paper, we introduce HF-NTT, a novel NTT accelerator. HF-NTT efficiently handles polynomials of varying degrees and moduli, allowing for a balance between performance and hardware resources by adjusting the number of Processing Elements (PEs). We also introduce a data movement strategy that eliminates the need for bit-reversal operations, resolves different hazards, and reduces the clock cycle count. Furthermore, our accelerator includes a hardware-friendly modular multiplication design and a configurable PE capable of adapting its data path, resulting in a universal architecture. We synthesized and implemented a prototype using Vivado 2022.2 and evaluated it on the Xilinx Virtex-7 FPGA platform. The results demonstrate significant improvements in Area-Time-Product (ATP) and processing speed for different polynomial degrees. In scenarios involving multi-modulus polynomial multiplication, our prototype consistently outperforms other designs in both ATP and latency metrics.
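
To make the role of the NTT concrete, the following is a minimal software sketch of NTT-based polynomial multiplication. It is not the HF-NTT hardware design. It assumes toy parameters ($q = 17$, $N = 8$, $\omega = 9$) chosen only for illustration, and it computes a cyclic product (mod $x^N - 1$), whereas FHE schemes typically use a negacyclic one. The sketch pairs a Gentleman-Sande forward transform (natural-order input, bit-reversed output) with a Cooley-Tukey inverse transform (bit-reversed input, natural-order output), a common way to avoid an explicit bit-reversal pass; whether this matches the paper's exact data movement strategy is an assumption. All function names are hypothetical.

```python
# Toy parameters (assumptions for illustration, not taken from the paper).
Q = 17      # prime modulus with N | (Q - 1)
N = 8       # transform length, a power of two
OMEGA = 9   # primitive N-th root of unity modulo Q


def ntt_gs(a, omega=OMEGA, q=Q):
    """Gentleman-Sande NTT: natural-order input, bit-reversed output."""
    a = list(a)
    n = len(a)
    length = n // 2
    w_len = omega  # root of unity of order 2 * length for the current stage
    while length >= 1:
        for start in range(0, n, 2 * length):
            w = 1
            for j in range(start, start + length):
                u, v = a[j], a[j + length]
                a[j] = (u + v) % q
                a[j + length] = (u - v) * w % q
                w = w * w_len % q
        w_len = w_len * w_len % q
        length //= 2
    return a


def intt_ct(A, omega=OMEGA, q=Q):
    """Cooley-Tukey inverse NTT: bit-reversed input, natural-order output."""
    a = list(A)
    n = len(a)
    omega_inv = pow(omega, -1, q)  # modular inverse (Python 3.8+)
    length = 1
    while length < n:
        w_len = pow(omega_inv, n // (2 * length), q)
        for start in range(0, n, 2 * length):
            w = 1
            for j in range(start, start + length):
                u, v = a[j], a[j + length] * w % q
                a[j] = (u + v) % q
                a[j + length] = (u - v) % q
                w = w * w_len % q
        length *= 2
    n_inv = pow(n, -1, q)
    return [x * n_inv % q for x in a]


def poly_mul(f, g, n=N, q=Q):
    """Cyclic product of f and g modulo (x^n - 1, q), with no bit-reversal step."""
    F = ntt_gs(f + [0] * (n - len(f)))
    G = ntt_gs(g + [0] * (n - len(g)))
    # Both spectra share the same (bit-reversed) order, so pointwise
    # multiplication needs no reordering.
    H = [x * y % q for x, y in zip(F, G)]
    return intt_ct(H)


def schoolbook(f, g, n=N, q=Q):
    """Reference O(n^2) cyclic product used to check poly_mul."""
    c = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            c[(i + j) % n] = (c[(i + j) % n] + fi * gj) % q
    return c


if __name__ == "__main__":
    f, g = [1, 2, 3, 4], [5, 6, 7, 8]
    print(poly_mul(f, g))
    print(schoolbook(f, g))
```

Because both inputs have degree below $N/2$, the cyclic product here coincides with the ordinary polynomial product reduced modulo $q$, so the two printed lists should agree.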

Paper Structure

This paper contains 21 sections, 5 theorems, 13 equations, 11 figures, 3 tables, 3 algorithms.

Key Result

Theorem 1

For the data layout in Equation eq:storage_location, $a[i]$ is stored in a different bank from every coefficient in $\{a[i + 1], a[i + 2], a[i + 4], \dots, a[i + 2^t], \dots, a[i + N/2]\}$ for $N > 4$ ($n > 2$).

Figures (11)

  • Figure 1: The Time Breakdown of Homomorphic Multiplication (Polynomial Degree $N = 4096, \left\lceil\log q \right\rceil = 32$)
  • Figure 2: Dataflow of NTT and INTT Using GS and CT Algorithms ($N = 16$, $a$: coefficient representation, $A$: point-value representation)
  • Figure 3: Example of Memory Access Hazard
  • Figure 4: Dataflow Scheduling Strategy for $N$ = 64, $N_{pe}$ = 4
  • Figure 5: Example of Read After Write Hazard
  • ...and 6 more figures

Theorems & Definitions (9)

  • Theorem 1
  • proof
  • Lemma 2
  • proof
  • Theorem 3
  • proof
  • Theorem 4
  • proof
  • Theorem 5