HEQuant: Marrying Homomorphic Encryption and Quantization for Communication-Efficient Private Inference

Tianshi Xu; Meng Li; Runsheng Wang

TL;DR

HEQuant addresses the high communication cost of HE-based private DNN inference by integrating low-precision quantization with homomorphic encryption. It introduces intra-coefficient packing and a quantization-aware tiling strategy to reduce both the number and the precision of transmitted data while maintaining accuracy. The method achieves $3.5\sim 23.4\times$ communication reduction and $3.0\sim 9.3\times$ latency reduction over prior HE-based protocols, and $3.1\sim 3.6\times$ communication reduction compared with network-optimization methods. These results suggest a practical path toward model-private and data-private inference in real-world deployments.

Abstract

Secure two-party computation with homomorphic encryption (HE) protects data privacy with a formal security guarantee but suffers from high communication overhead. While previous works, e.g., Cheetah, Iron, etc., have proposed efficient HE-based protocols for different neural network (NN) operations, they still assume high precision, e.g., 37-bit fixed point, for the NN operations and ignore NNs' native robustness against quantization error. In this paper, we propose HEQuant, which features low-precision-quantization-aware optimization for the HE-based protocols. We observe that the benefit of a naive combination of quantization and HE quickly saturates as bit precision goes down. Hence, to further improve communication efficiency, we propose a series of optimizations, including an intra-coefficient packing algorithm and a quantization-aware tiling algorithm, to simultaneously reduce the number and precision of the transferred data. Compared with prior-art HE-based protocols, e.g., CrypTFlow2, Cheetah, Iron, etc., HEQuant achieves $3.5\sim 23.4\times$ communication reduction and $3.0\sim 9.3\times$ latency reduction. Meanwhile, when compared with prior-art network optimization frameworks, e.g., SENet, SNL, etc., HEQuant also achieves $3.1\sim 3.6\times$ communication reduction.
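To give a feel for why low precision leaves room to pack more data into each transmitted value, here is a minimal plain-integer sketch. It is not HEQuant's intra-coefficient packing algorithm and involves no encryption. The helper names `pack` and `unpack`, the 8-bit slot width, and the sample values are all assumptions made for this illustration. The sketch only shows that several 4-bit values can share one wide integer and still yield correct per-value products after a single multiplication, provided each slot is wide enough to hold the product.

```python
def pack(values, slot_bits):
    """Pack non-negative integers into one integer, one value per slot.

    Hypothetical helper for illustration; each value must fit in slot_bits.
    """
    packed = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << slot_bits):
            raise ValueError("value does not fit in one slot")
        packed |= v << (i * slot_bits)
    return packed


def unpack(packed, slot_bits, count):
    """Recover `count` slot values from a packed integer."""
    mask = (1 << slot_bits) - 1
    return [(packed >> (i * slot_bits)) & mask for i in range(count)]


# Assumed toy setting: unsigned 4-bit activations and an unsigned 4-bit weight.
# Each product is below 2**8, so 8-bit slots leave no room for carries
# to spill from one slot into the next.
activations = [3, 15, 7, 0]
weight = 9
slot_bits = 8

packed = pack(activations, slot_bits)

# One wide multiplication produces all four products at once.
products = unpack(packed * weight, slot_bits, len(activations))
print(products)
```

With 37-bit values the same wide integer would hold far fewer slots, which is the intuition behind combining quantization with packing; how HEQuant actually lays out values inside polynomial coefficients is described in the paper itself.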

Paper Structure (25 sections, 2 equations, 18 figures, 7 tables)

Introduction
Preliminaries
Threat Model
HE-based 2PC Inference
Communication Complexity of HE-based 2PC
Motivation
HEQuant: Communication-Efficient 2PC Framework
HE-based quantized private inference
Intra-Coefficient Packing
Quantization-aware operator tiling
Experimental Results
Experimental Setup
Micro-Benchmark Evaluation
End-to-End Inference Evaluation
Ablation Study
...and 10 more sections

Figures (18)

Figure 1: Speedup from network quantization (32-bit to 4-bit) in terms of Bit Operations (BOPS), on GPU, on the prior-art 2PC framework Cheetah, on Cheetah with naive quantization, and on our proposed framework.
Figure 2: DNN private inference based on HE (linear layer).
Figure 3: A convolution example of coefficient packing.
Figure 4: The input and output communication as well as the bit width of plaintext and ciphertext for different bit precision quantization.
Figure 5: Overview of HEQuant and the communication/latency cost after each optimization step. The examples are ResNet18 on CIFAR-100 and ResNet50 on ImageNet. Coeff. represents coefficient.
...and 13 more figures
