EQO: Exploring Ultra-Efficient Private Inference with Winograd-Based Protocol and Quantization Co-Optimization
Wenxuan Zeng, Tianshi Xu, Meng Li, Runsheng Wang
TL;DR
EQO tackles the high communication cost of private CNN inference by jointly optimizing OT-based 2PC protocols and neural network quantization around Winograd-based convolution. At the protocol level, it introduces QWinConv, graph-level protocol fusion, a simplified residual protocol, and MSB-known optimizations to minimize online communication. At the network level, it adds Hessian-based mixed-precision quantization and a 2PC-friendly bit re-weighting strategy. Evaluated on CIFAR, Tiny-ImageNet, and ImageNet, EQO reports 11.7x, 3.6x, and 6.3x less communication than SiRNN, COINN, and CoPriv, respectively, with accuracy gains of more than 1% over each. The work shows that protocol-level design combined with sensitivity-aware quantization can make private inference substantially more practical for real-world CNN workloads.
Abstract
Private convolutional neural network (CNN) inference based on secure two-party computation (2PC) suffers from high communication and latency overhead, particularly in convolution layers. In this paper, we propose EQO, a quantized 2PC inference framework that jointly optimizes the CNNs and the 2PC protocols. EQO features a novel 2PC protocol that combines the Winograd transformation with quantization for efficient convolution computation. However, we observe that naively combining quantization and Winograd convolution is sub-optimal: Winograd transformations introduce extensive local additions and weight outliers, which increase the quantization bit widths and require frequent bit width conversions with non-negligible communication overhead. Therefore, at the protocol level, we propose a series of optimizations for the 2PC inference graph to minimize communication. At the network level, we develop a sensitivity-based mixed-precision quantization algorithm to optimize network accuracy under communication constraints. We further propose a 2PC-friendly bit re-weighting algorithm to accommodate weight outliers without increasing bit widths. In extensive experiments, EQO demonstrates 11.7x, 3.6x, and 6.3x communication reduction with 1.29%, 1.16%, and 1.29% higher accuracy compared to the state-of-the-art frameworks SiRNN, COINN, and CoPriv, respectively.
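To make the bit width issue concrete, the following plaintext sketch computes one Winograd tile and checks it against direct convolution. It assumes the standard F(2x2, 3x3) transform matrices; the abstract does not state which tile size EQO uses, and the function names here are illustrative only. No secret sharing or OT is modeled, so this is not the QWinConv protocol itself, just the arithmetic it builds on.

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transform matrices (assumed tile size,
# not taken from the EQO abstract).
BT = np.array([[1, 0, -1, 0],
               [0, 1, 1, 0],
               [0, -1, 1, 0],
               [0, 1, 0, -1]], dtype=np.int64)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
AT = np.array([[1, 1, 1, 0],
               [0, 1, -1, -1]], dtype=np.int64)


def winograd_tile(d, g):
    """2x2 output tile from a 4x4 input tile d and a 3x3 filter g."""
    U = G @ g @ G.T        # weight transform (can be precomputed offline)
    V = BT @ d @ BT.T      # input transform: additions and subtractions only
    return AT @ (U * V) @ AT.T  # 16 element-wise multiplications


def direct_tile(d, g):
    """Reference: sliding-window correlation, 36 multiplications."""
    out = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j] = np.sum(d[i:i + 3, j:j + 3] * g)
    return out


rng = np.random.default_rng(0)
d = rng.integers(-8, 8, size=(4, 4))  # 4-bit signed activations
g = rng.integers(-8, 8, size=(3, 3))  # 4-bit signed weights
matches = np.allclose(winograd_tile(d, g), direct_tile(d, g))

# Each row of BT sums at most two inputs, so applying it on both sides
# can grow the magnitude by up to 2 * 2 = 4, i.e. 2 extra bits.
growth = int(np.abs(BT).sum(axis=1).max() ** 2)
extra_bits = int(np.log2(growth))
```

The input transform uses only local additions, yet it can widen the activations by two bits before the element-wise product, which is the kind of growth that forces the bit width conversions mentioned above.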
