Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions
Abdulrahman Diaa, Lucas Fenaux, Thomas Humphries, Marian Dietz, Faezeh Ebrahimianghazani, Bailey Kacsmar, Xinda Li, Nils Lukas, Rasoul Akhavan Mahdavi, Simon Oya, Ehsan Amjadian, Florian Kerschbaum
TL;DR
This work targets secure neural network inference in MLaaS by removing the ReLU bottleneck through activation co-design. It introduces ESPN, a single-round MPC protocol for efficient polynomial evaluation, and PILLAR, a training-time regularization and fitting framework that maintains high accuracy with polynomial activations. Together they yield inference speedups of 3× to 110× in wide-area networks (WAN) while preserving competitive accuracy on CIFAR and ImageNet-scale models (up to ResNet-50 with 23M parameters). The results show that secure inference can be practical enough for private ML services. The paper also identifies future directions: improved precision, better pooling strategies, and integration with COINN-style techniques. Overall, it offers a viable path to fast, private DNN inference in a two-party MPC setting by combining polynomial activations with tailored training.
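The following sketch makes the idea of a polynomial activation concrete. It fits a quadratic to ReLU by ordinary least squares with NumPy's `polyfit`. The input range [-5, 5], the degree, and the grid size are illustrative assumptions, not values from the paper. The sketch is not PILLAR, which also involves training-time regularization.

```python
import numpy as np

# Illustrative assumptions (not from the paper): fit range and polynomial degree.
LO, HI, DEG = -5.0, 5.0, 2

xs = np.linspace(LO, HI, 10001)
relu = np.maximum(xs, 0.0)

# Ordinary least-squares fit; coefficients are returned highest degree first.
coeffs = np.polyfit(xs, relu, DEG)
approx = np.polyval(coeffs, xs)
max_err = np.max(np.abs(approx - relu))

print("coefficients (x^2, x, 1):", np.round(coeffs, 4))
print("max abs error on [-5, 5]:", round(float(max_err), 4))

# Outside the fitted range the quadratic grows much faster than ReLU,
# so inputs that drift out of range produce large activation errors.
print("p(20) =", float(np.polyval(coeffs, 20.0)), "vs ReLU(20) = 20.0")
```

The fitted polynomial uses only additions and multiplications, which is what makes it MPC-friendly. It is also sensitive to out-of-range inputs, which is one plausible reason polynomial activations benefit from a dedicated training procedure. The paper's actual fitting and regularization method is not reproduced here.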
Abstract
Machine Learning as a Service (MLaaS) is an increasingly popular design in which a company with abundant computing resources trains a deep neural network and offers query access for tasks like image classification. The challenge with this design is that MLaaS requires the client to reveal potentially sensitive queries to the company hosting the model. Multi-party computation (MPC) protects the client's data by allowing encrypted inference. However, current approaches suffer from prohibitively long inference times. The bottleneck in MPC inference is the evaluation of non-linear layers such as ReLU activation functions. Motivated by the success of previous work co-designing machine learning and MPC, we develop an activation function co-design. We replace all ReLUs with a polynomial approximation and evaluate them with single-round MPC protocols, which achieve state-of-the-art inference times in wide-area networks. Furthermore, to address the accuracy issues previously encountered with polynomial activations, we propose a novel training algorithm that gives accuracy competitive with plaintext models. Our evaluation shows inference-time speedups of $3\times$ to $110\times$ on large models with up to $23$ million parameters while maintaining competitive inference accuracy.
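The abstract does not spell out how a polynomial can be evaluated in a single MPC round. One generic way works with two-party additive secret sharing and a trusted dealer who preprocesses shares of the powers of a random mask r. The parties open only the masked value d = x - r. Because x^i = (d + r)^i expands into terms that are linear in the shared powers r^j, every remaining step is local. The sketch below illustrates this technique under stated assumptions. The modulus, the dealer, and the function names (`share`, `dealer_powers`, `eval_poly`) are hypothetical. It uses integer coefficients, omits fixed-point encoding and truncation, simulates both parties in one process, and is not claimed to be the ESPN protocol.

```python
import secrets
from math import comb

# Hypothetical sharing modulus; any modulus works for additive sharing,
# and real systems often use a power-of-two ring instead.
MOD = 2**61 - 1


def share(x):
    """Split integer x into two additive shares modulo MOD."""
    s0 = secrets.randbelow(MOD)
    return s0, (x - s0) % MOD


def reconstruct(s0, s1):
    """Recombine two additive shares."""
    return (s0 + s1) % MOD


def dealer_powers(k):
    """Hypothetical trusted dealer (offline phase): shares of r, r^2, ..., r^k
    for one fresh random mask r. A mask must never be reused."""
    r = secrets.randbelow(MOD)
    return [share(pow(r, j, MOD)) for j in range(1, k + 1)]


def eval_poly(x_shares, coeffs, powers):
    """Evaluate sum(coeffs[i] * x**i) on additive shares of x.

    coeffs: public integers, lowest degree first (degree >= 1).
    powers: output of dealer_powers(degree).
    Returns the two parties' shares of the result.
    """
    x0, x1 = x_shares
    r0, r1 = powers[0]
    # The only communication round: both parties reveal their share of
    # d = x - r. Since r is uniformly random, d reveals nothing about x.
    d = reconstruct((x0 - r0) % MOD, (x1 - r1) % MOD)

    y0 = y1 = 0
    for i, a in enumerate(coeffs):
        # x^i = sum_{j=0..i} C(i, j) * d^(i-j) * r^j.
        # The j = 0 term is public, so only party 0 adds it.
        y0 += a * pow(d, i, MOD)
        for j in range(1, i + 1):
            w = a * comb(i, j) * pow(d, i - j, MOD)
            y0 += w * powers[j - 1][0]  # party 0 uses only its own shares
            y1 += w * powers[j - 1][1]  # party 1 uses only its own shares
    return y0 % MOD, y1 % MOD


# Usage: evaluate 2 + 5x + 3x^2 at x = 7.
y0, y1 = eval_poly(share(7), [2, 5, 3], dealer_powers(2))
print(reconstruct(y0, y1))  # 184
```

The design choice to illustrate here is that the online cost is one round of communication regardless of the polynomial's degree. The extra work shifts to the offline phase, which must supply shares of r^2, ..., r^k.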
