
Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions

Abdulrahman Diaa, Lucas Fenaux, Thomas Humphries, Marian Dietz, Faezeh Ebrahimianghazani, Bailey Kacsmar, Xinda Li, Nils Lukas, Rasoul Akhavan Mahdavi, Simon Oya, Ehsan Amjadian, Florian Kerschbaum

TL;DR

This work targets secure neural network inference under MLaaS by removing the non-linear ReLU bottleneck through activation co-design. It introduces ESPN, a single-round MPC protocol for efficient polynomial evaluation, and PILLAR, a training-time regularization and fitting framework that maintains high accuracy with polynomial activations. The combination yields inference speedups of 3x to 110x in wide-area networks (WAN) while preserving competitive accuracy on CIFAR and ImageNet-scale models (up to ResNet-50 with 23M parameters). The results show that secure inference can be practical enough for scalable private ML services, and they point to future improvements in precision, pooling strategies, and integration with COINN-style techniques. Overall, the paper provides a viable pathway to fast, private DNN inference using polynomial activations and tailored training in a two-party MPC setting.

Abstract

Machine Learning as a Service (MLaaS) is an increasingly popular design where a company with abundant computing resources trains a deep neural network and offers query access for tasks like image classification. The challenge with this design is that MLaaS requires the client to reveal their potentially sensitive queries to the company hosting the model. Multi-party computation (MPC) protects the client's data by allowing encrypted inferences. However, current approaches suffer from prohibitively large inference times. The inference time bottleneck in MPC is the evaluation of non-linear layers such as ReLU activation functions. Motivated by the success of previous work co-designing machine learning and MPC, we develop an activation function co-design. We replace all ReLUs with a polynomial approximation and evaluate them with single-round MPC protocols, which give state-of-the-art inference times in wide-area networks. Furthermore, to address the accuracy issues previously encountered with polynomial activations, we propose a novel training algorithm that gives accuracy competitive with plaintext models. Our evaluation shows between $3\times$ and $110\times$ speedups in inference time on large models with up to $23$ million parameters while maintaining competitive inference accuracy.
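To make "replace all ReLUs with a polynomial approximation" concrete, here is a minimal sketch that fits a polynomial to ReLU on an interval $[-\gamma, \gamma]$ by ordinary least squares. It assumes a uniform grid and plain least squares. The paper's PILLAR fitting procedure may differ. The helper names `fit_relu_poly` and `poly_eval` are hypothetical.

```python
def relu(x):
    return max(0.0, x)

def fit_relu_poly(degree, gamma, num_points=201):
    """Hypothetical helper: least-squares fit of ReLU on a uniform grid over
    [-gamma, gamma]. Returns c[0..degree] with p(x) = sum_j c[j] * x**j."""
    half = num_points // 2
    xs = [gamma * i / half for i in range(-half, half + 1)]
    ys = [relu(x) for x in xs]
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, with V the Vandermonde matrix of xs.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (b[r] - s) / A[r][r]
    return coeffs

def poly_eval(coeffs, x):
    """Horner's rule: only additions and multiplications are needed,
    which is what makes polynomials MPC-friendly compared with ReLU."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc
```

For `fit_relu_poly(2, 1.0)` the coefficients should be close to the continuous least-squares fit $\tfrac{3}{32} + \tfrac{x}{2} + \tfrac{15}{32}x^2$. Outside $[-\gamma, \gamma]$ the polynomial no longer tracks ReLU, which is why the range of pre-activations matters during training.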

Paper Structure (62 sections, 4 theorems, 13 equations, 12 figures, 9 tables, 2 algorithms)

Key Result

Theorem E.1

Given an additively secret-shared input $[[x]] = x_A + x_B$ and an exponent $k$, Algorithm alg:binomial_exp correctly returns $[[x^k]]$.
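The label `alg:binomial_exp` suggests computing $[[x^k]]$ through the binomial expansion $(x_A + x_B)^k = \sum_{i=0}^{k} \binom{k}{i} x_A^i x_B^{k-i}$. The sketch below illustrates why such an approach is correct. It is not the paper's algorithm, and it rests on several assumptions:

- Shares live in $\mathbb{Z}_{2^{64}}$.
- The two end terms are computed locally by each party.
- Every cross term is produced by a hypothetical ideal product functionality (`ideal_product`), simulated in the clear. A real protocol would need OT, HE, or precomputed correlations here.

Polynomial coefficients are assumed to be public integers, so no fixed-point encoding is shown.

```python
import random
from math import comb

MOD = 2 ** 64  # assumption: additive shares live in the ring Z_{2^64}

def share(x):
    """Split x into additive shares (x_A, x_B) with x_A + x_B = x (mod 2^64)."""
    x_A = random.randrange(MOD)
    return x_A, (x - x_A) % MOD

def reconstruct(s_A, s_B):
    return (s_A + s_B) % MOD

def ideal_product(u, v):
    """Hypothetical ideal functionality, simulated in the clear: A inputs u,
    B inputs v, and they receive fresh random additive shares of u * v."""
    r = random.randrange(MOD)
    return r, (u * v - r) % MOD

def power_shares(x_A, x_B, k):
    """Shares of x^k via (x_A + x_B)^k = sum_i C(k, i) x_A^i x_B^(k-i)."""
    assert k >= 1
    y_A = pow(x_A, k, MOD)  # i = k term: computed locally by A
    y_B = pow(x_B, k, MOD)  # i = 0 term: computed locally by B
    for i in range(1, k):
        u = comb(k, i) * pow(x_A, i, MOD) % MOD  # known only to A
        v = pow(x_B, k - i, MOD)                  # known only to B
        a, b = ideal_product(u, v)
        y_A, y_B = (y_A + a) % MOD, (y_B + b) % MOD
    return y_A, y_B

def poly_shares(coeffs, x_A, x_B):
    """Shares of p(x) = sum_j coeffs[j] * x^j for public integer coefficients.
    Scaling shares by a public constant and adding shares are local steps;
    only party A adds the constant term."""
    p_A, p_B = coeffs[0] % MOD, 0
    for j in range(1, len(coeffs)):
        s_A, s_B = power_shares(x_A, x_B, j)
        p_A = (p_A + coeffs[j] * s_A) % MOD
        p_B = (p_B + coeffs[j] * s_B) % MOD
    return p_A, p_B
```

Each `u` and `v` depends only on its owner's local share. All cross terms can therefore be requested at once, which is consistent with a single interaction round. Whether and how ESPN achieves this in practice is described in the paper, not here.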

Figures (12)

  • Figure 1: Summary of the inference time in seconds vs. test accuracy for each state-of-the-art approach on the CIFAR-10 dataset in the WAN (100 ms roundtrip delay).
  • Figure 2: Benchmarking the secure evaluation of ReLU activation functions using various approaches. The $x$-axis is the network delay in ms and the $y$-axis is the mean runtime in seconds averaged over $20$ runs with the shaded area representing the $95\%$ confidence intervals.
  • Figure 3: Accuracy of a 2-layer convolutional network trained with polynomial activation functions of varying degree.
  • Figure 4: Illustrating the escaping activation problem for the two-layer convolutional network.
  • Figure 5: The effect of the regularization coefficient $\beta$ on model accuracy and the out-of-range ratio for $\gamma=10$.
  • ...and 7 more figures
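Figures 4 and 5 concern activations that "escape" the interval on which the polynomial approximates ReLU. The sketch below illustrates the effect and one possible way to measure and penalize it. It rests on three assumptions:

- "Out of range" means $|z| > \gamma$.
- The polynomial is the degree-2 least-squares ReLU fit, rescaled to $[-\gamma, \gamma]$.
- The penalty is a hypothetical squared hinge weighted by $\beta$.

None of these is claimed to be PILLAR's actual regularizer. The function names are hypothetical.

```python
# Hypothetical sketch: the out-of-range definition (|z| > gamma) and the
# squared-hinge penalty are assumptions, not the paper's PILLAR regularizer.

def relu(z):
    return max(0.0, z)

def poly_relu(z, gamma):
    """Degree-2 least-squares ReLU approximation on [-gamma, gamma]:
    gamma * p(z / gamma) with p(t) = 3/32 + t/2 + 15/32 * t^2."""
    t = z / gamma
    return gamma * (3 / 32 + t / 2 + 15 / 32 * t * t)

def out_of_range_ratio(pre_acts, gamma):
    """Fraction of pre-activations outside the fitted interval [-gamma, gamma]."""
    return sum(1 for z in pre_acts if abs(z) > gamma) / len(pre_acts)

def range_penalty(pre_acts, gamma, beta):
    """beta * mean squared excess beyond gamma; zero when all values are in range."""
    excess = [max(0.0, abs(z) - gamma) for z in pre_acts]
    return beta * sum(e * e for e in excess) / len(pre_acts)

pre_acts = [-50.0, -5.0, 0.0, 5.0, 12.0]
gamma = 10.0
for z in pre_acts:
    print(f"z={z:6.1f}  relu={relu(z):6.2f}  poly={poly_relu(z, gamma):8.3f}")
print("out-of-range ratio:", out_of_range_ratio(pre_acts, gamma))
print("penalty (beta=1):", range_penalty(pre_acts, gamma, beta=1.0))
```

At $z = -50$ the polynomial returns about 93 while ReLU returns 0. This is the kind of blow-up that compounds across layers once activations leave $[-\gamma, \gamma]$. A penalty of this shape pushes pre-activations back into range, and $\beta$ trades that off against the task loss.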

Theorems & Definitions (8)

  • Theorem E.1
  • proof
  • Theorem E.2
  • proof
  • Theorem E.3
  • proof
  • Theorem E.4
  • proof