HEQuant: Marrying Homomorphic Encryption and Quantization for Communication-Efficient Private Inference
Tianshi Xu, Meng Li, Runsheng Wang
TL;DR
HEQuant addresses the high communication cost of HE-based private DNN inference by combining low-precision quantization with homomorphic encryption. It introduces an intra-coefficient packing algorithm and a quantization-aware tiling algorithm that reduce both the number and the precision of the transferred data while maintaining accuracy. Compared with prior HE-based protocols, HEQuant achieves $3.5\sim 23.4\times$ communication reduction and $3.0\sim 9.3\times$ latency reduction; compared with network-optimization methods, it achieves $3.1\sim 3.6\times$ communication reduction. These results point to a practical path toward communication-efficient private inference.
Abstract
Secure two-party computation with homomorphic encryption (HE) protects data privacy with a formal security guarantee but suffers from high communication overhead. While previous works, e.g., Cheetah and Iron, have proposed efficient HE-based protocols for different neural network (NN) operations, they still assume high precision, e.g., 37-bit fixed point, for the NN operations and ignore NNs' native robustness against quantization error. In this paper, we propose HEQuant, which features low-precision-quantization-aware optimization for the HE-based protocols. We observe that the benefit of a naive combination of quantization and HE quickly saturates as the bit precision goes down. Hence, to further improve communication efficiency, we propose a series of optimizations, including an intra-coefficient packing algorithm and a quantization-aware tiling algorithm, to simultaneously reduce the number and precision of the transferred data. Compared with prior-art HE-based protocols, e.g., CrypTFlow2, Cheetah, and Iron, HEQuant achieves $3.5\sim 23.4\times$ communication reduction and $3.0\sim 9.3\times$ latency reduction. Meanwhile, when compared with prior-art network optimization frameworks, e.g., SENet and SNL, HEQuant also achieves $3.1\sim 3.6\times$ communication reduction.
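
To give some intuition for why low precision can reduce the number of transferred values, the following is a minimal sketch of generic bit-slot packing on plain integers. It is not HEQuant's intra-coefficient packing algorithm and involves no encryption; the function names (`pack`, `unpack`, `packed_scalar_mul`) and the slot-width rule are assumptions made purely for illustration. Several low-precision unsigned values are placed in disjoint bit slots of one wide integer, so that a single multiplication by a low-precision weight yields all the products at once.

```python
def pack(values, slot_bits):
    """Place each unsigned value in its own slot of `slot_bits` bits."""
    packed = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << slot_bits):
            raise ValueError("value does not fit in its slot")
        packed |= v << (i * slot_bits)
    return packed


def unpack(packed, count, slot_bits):
    """Read back `count` slots of `slot_bits` bits each."""
    mask = (1 << slot_bits) - 1
    return [(packed >> (i * slot_bits)) & mask for i in range(count)]


def packed_scalar_mul(values, weight, value_bits, weight_bits):
    """Multiply every value by `weight` using one wide multiplication.

    Illustrative assumption: values and weight are unsigned, so each
    product fits in value_bits + weight_bits bits and never spills
    into the neighbouring slot.
    """
    if any(not 0 <= v < (1 << value_bits) for v in values):
        raise ValueError("value exceeds value_bits")
    if not 0 <= weight < (1 << weight_bits):
        raise ValueError("weight exceeds weight_bits")
    slot_bits = value_bits + weight_bits
    packed = pack(values, slot_bits)
    return unpack(packed * weight, len(values), slot_bits)


if __name__ == "__main__":
    # Four 4-bit activations and one 4-bit weight share a single 32-bit word.
    print(packed_scalar_mul([3, 15, 7, 0], 9, value_bits=4, weight_bits=4))
```

In this toy setting, halving the bit widths doubles the number of values that fit in a word of fixed size, which is the kind of saving that lower precision makes possible. Signed values, accumulation across products, and the HE plaintext modulus are all ignored here.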
