Compact: Approximating Complex Activation Functions for Secure Computation

Mazharul Islam; Sunpreet S. Arora; Rahul Chatterjee; Peter Rindal; Maliheh Shirvanian

TL;DR

Compact introduces MPC-friendly piecewise polynomial approximations for complex activation functions (SiLU, GeLU, Mish) by leveraging input density estimated via batch normalization and optimizing approximation parameters with simulated annealing. The method carefully balances accuracy loss and computation through a weighted mean error and a dynamic, region-aware interpolation strategy based on Chebyshev polynomials, achieving near-plaintext accuracy while delivering 2x–5x faster secure inference than prior work like NFGen on deep networks. It remains training-agnostic and compatible with standard MPC libraries, avoiding retraining or architectural changes, and is validated across four diverse tasks with deep architectures. The work offers an open-source implementation and demonstrates practical impact by enabling scalable, private inference for complex DNNs in MPC-enabled settings.
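To make the density-weighted error idea concrete, here is a minimal sketch, not the authors' implementation. It interpolates SiLU piecewise at Chebyshev nodes and scores the fit with a mean absolute error weighted by a standard normal density. The breakpoints, the degree, the standard normal weighting (a stand-in for the post-normalization input distribution), and all function names are assumptions made for illustration.

```python
import math

# The three complex activation functions discussed in the text.
def silu(x):
    return x / (1.0 + math.exp(-x))

def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def mish(x):
    return x * math.tanh(math.log1p(math.exp(x)))

def cheb_nodes(a, b, k):
    """Return the k + 1 Chebyshev nodes mapped onto [a, b]."""
    return [0.5 * (a + b) + 0.5 * (b - a) * math.cos((2 * i + 1) * math.pi / (2 * k + 2))
            for i in range(k + 1)]

def interpolate(f, a, b, k):
    """Degree <= k polynomial through f at Chebyshev nodes (Lagrange form)."""
    xs = cheb_nodes(a, b, k)
    ys = [f(x) for x in xs]
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

def piecewise(f, breakpoints, k):
    """One interpolant per interval between consecutive breakpoints."""
    pieces = [(a, b, interpolate(f, a, b, k))
              for a, b in zip(breakpoints, breakpoints[1:])]
    def approx(x):
        for a, b, p in pieces:
            if a <= x <= b:
                return p(x)
        raise ValueError("x is outside the approximated range")
    return approx

def weighted_mean_error(f, approx, lo, hi, n=2001):
    """Mean |f - approx| on a grid, weighted by an (assumed) N(0, 1) input density."""
    num = den = 0.0
    for i in range(n):
        x = lo + (hi - lo) * i / (n - 1)
        w = math.exp(-0.5 * x * x)
        num += w * abs(f(x) - approx(x))
        den += w
    return num / den

# Hypothetical configuration: four cubic pieces on [-5, 5].
breaks = [-5.0, -2.0, 0.0, 2.0, 5.0]
approx_silu = piecewise(silu, breaks, k=3)
err = weighted_mean_error(silu, approx_silu, breaks[0], breaks[-1])
```

Because the weight is concentrated near zero, errors in the tails contribute little to `err`, which is why a density-aware objective can tolerate fewer or lower-degree pieces far from the origin.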

Abstract

Secure multi-party computation (MPC) techniques can be used to provide data privacy when users query deep neural network (DNN) models hosted on a public cloud. State-of-the-art MPC techniques can be directly leveraged for DNN models that use simple activation functions (AFs) such as ReLU. However, these techniques are ineffective and/or inefficient for the complex and highly non-linear AFs used in cutting-edge DNN models. We present Compact, which produces piecewise polynomial approximations of complex AFs to enable their efficient use with state-of-the-art MPC techniques. Compact neither requires nor imposes any restriction on model training and results in near-identical model accuracy. To achieve this, we design Compact with input density awareness and use an application-specific simulated-annealing-type optimization to generate computationally more efficient approximations of complex AFs. We extensively evaluate Compact on four different machine-learning tasks with DNN architectures that use the popular complex AFs SiLU, GeLU, and Mish. Our experimental results show that Compact incurs negligible accuracy loss while being 2x-5x computationally more efficient than state-of-the-art approaches for DNN models with a large number of hidden layers. Our work accelerates easy adoption of MPC techniques to provide user data privacy even when the queried DNN models consist of many hidden layers and are trained with complex AFs.
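The simulated-annealing-type search mentioned in the abstract can be illustrated with a generic sketch. This is not the paper's procedure: the search space (number of equal-width pieces `m` and degree `k`), the cost proxy `m + k`, the error tolerance, the penalty, and the cooling schedule are all hypothetical choices made only to show the shape of such an optimization.

```python
import math
import random

def silu(x):
    return x / (1.0 + math.exp(-x))

def cheb_interp(f, a, b, k):
    """Degree <= k Lagrange interpolant of f at Chebyshev nodes on [a, b]."""
    xs = [0.5 * (a + b) + 0.5 * (b - a) * math.cos((2 * i + 1) * math.pi / (2 * k + 2))
          for i in range(k + 1)]
    ys = [f(x) for x in xs]
    def p(x):
        return sum(yi * math.prod((x - xj) / (xi - xj) for xj in xs if xj != xi)
                   for xi, yi in zip(xs, ys))
    return p

def weighted_error(m, k, lo=-5.0, hi=5.0, n=401):
    """Mean |silu - approx| weighted by an assumed N(0, 1) input density."""
    width = (hi - lo) / m
    polys = [cheb_interp(silu, lo + i * width, lo + (i + 1) * width, k) for i in range(m)]
    num = den = 0.0
    for j in range(n):
        x = lo + (hi - lo) * j / (n - 1)
        piece = min(int((x - lo) / width), m - 1)
        w = math.exp(-0.5 * x * x)
        num += w * abs(silu(x) - polys[piece](x))
        den += w
    return num / den

def energy(theta, tol=1e-2):
    """Hypothetical objective: cost proxy m + k, plus a large penalty if error > tol."""
    m, k = theta
    err = weighted_error(m, k)
    penalty = 0.0 if err <= tol else 100.0 * err / tol
    return m + k + penalty

def neighbour(theta, rng, m_max=16, k_max=5):
    """Move m or k by one step, staying inside [1, m_max] x [1, k_max]."""
    m, k = theta
    if rng.random() < 0.5:
        m = min(max(m + rng.choice((-1, 1)), 1), m_max)
    else:
        k = min(max(k + rng.choice((-1, 1)), 1), k_max)
    return (m, k)

def anneal(theta, steps=300, t0=5.0, cooling=0.98, seed=0):
    rng = random.Random(seed)
    cur, cur_e = theta, energy(theta)
    best, best_e = cur, cur_e
    t = t0
    for _ in range(steps):
        cand = neighbour(cur, rng)
        cand_e = energy(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_e <= cur_e or rng.random() < math.exp((cur_e - cand_e) / t):
            cur, cur_e = cand, cand_e
            if cur_e < best_e:
                best, best_e = cur, cur_e
        t *= cooling
    return best, best_e

best, best_e = anneal((16, 5))
```

Starting from an accurate but expensive configuration, the search drifts toward cheaper ones and the penalty keeps the returned configuration within the error tolerance. A real MPC cost model would replace the `m + k` proxy.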

Paper Structure (66 sections, 10 equations, 10 figures, 6 tables)

Introduction
Summary.
Background and Related Work
Deep Neural Network Preliminaries
Activation Functions (AFs).
Complex AFs.
Batch normalization.
Secure Inference for DNN models
$\textsf{ReLU}$ specific secure inference.
Secure inference for other non-linear AFs.
Problem Overview & Design Goals
Problem Overview
Problem formulation.
Scenario Setup.
Threat model and scope.
...and 51 more sections

Figures (10)

Figure 1: The complex activation functions (AFs) we focus on in our work, $f(x) \in \{\textsf{SiLU}, \textsf{GeLU}, \textsf{Mish}\}$, and their second derivatives $f''(x)$. These AFs are hard to approximate accurately in regions close to zero where $f''(x) > 0$. We argue this is especially problematic for DNN models because the majority of the inputs to a complex AF fall in the region that is hard to approximate accurately (i.e., close to zero) due to normalization (Figure ). In contrast, the $\textsf{ReLU}(x)$ AF can be precisely approximated with only two simple polynomials $\{\mathrm{f}_1, \mathrm{f}_2\}$, where $\mathrm{f}_1(x) = 0$ when $x < 0$ and $\mathrm{f}_2(x) = x$ when $x \ge 0$.
Figure 2: The outputs of the linear operations ($\mathbf{a}^\ell$) are normalized to $\overline{\mathbf{a}}^\ell$ using Equation () before they are forwarded to the non-linear operations involving complex activation functions (AFs).
Figure 3: Secure inference in a cloud-based deployment setting. (a) A proprietary DNN model trained over private data is not MPC-friendly due to complex non-linear activation functions (AFs) (e.g., $\textsf{SiLU}, \textsf{GeLU}, \textsf{Mish}$). An MPC-friendly model is generated by replacing the complex AFs with their approximations using $\textsf{Compact}$. (b) Next, we generate $n$ secret shares of the MPC-friendly model and distribute them among $n$ computing servers on the cloud (in this figure, $n = 3$). (c) To get the inference result, the client takes the private input data from the user, generates shares of it, and distributes them among the $n$ servers. These servers perform secure inference using an MPC engine and return shares of the inference result to the client, which uses them to reconstruct the original inference result.
Figure 4: (Right-top) The $\textsf{FindBestPiecePoly}$ procedure to find an MPC-friendly approximation $\widehat{F}_{\mathsf{act}}$ of the complex activation function (AF) $F_{\mathsf{act}}$ (Section ). The procedure balances the trade-off between inference accuracy loss and performance overhead using an application-specific optimization approach (simulated annealing). It uses two sub-procedures: $\textsf{GenerateNeighbour}$, which generates a random neighbor $\theta'$ from a given $\theta$ (shown Right-bottom), and $\textsf{GenAccuracteApprox}$, which approximates the region $[s, e]$ accurately using a set of at most $\mathsf{m}$ polynomials of degree $\leq \mathsf{k}$ (shown Left; Section ). Notations are explained briefly in Table .
Figure 5: Comparison of the inference time (left) and accuracy loss (right) of NFGen and Compact. For both, lower is better. We experiment with two approximation error thresholds for NFGen: i) $\delta = 10^{-1}$ (a crude one we set) and ii) $\delta = 10^{-3}$ (used in the original NFGen work [NFGenFanCCS22]). As the number of hidden layers (#HLs) increases, NFGen ($\delta = 10^{-1}$) achieves a lower inference time than Compact but incurs significant accuracy loss, while NFGen ($\delta = 10^{-3}$) shows the opposite characteristic. Compact performs well on both counts.
...and 5 more figures
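The share-and-reconstruct flow described in the Figure 3 caption can be sketched with plain additive secret sharing. The source does not say which sharing scheme or MPC engine is used, so the ring size, the fixed-point scale, and every function name here are assumptions chosen for illustration only.

```python
import secrets

MOD = 2 ** 64     # assumed ring Z_{2^64}
SCALE = 2 ** 16   # assumed fixed-point scaling factor

def encode(x):
    """Map a real number to a fixed-point ring element."""
    return round(x * SCALE) % MOD

def decode(v):
    """Map a ring element back to a real number (upper half is negative)."""
    if v >= MOD // 2:
        v -= MOD
    return v / SCALE

def share(value, n=3):
    """Split a ring element into n additive shares that sum to it mod MOD."""
    parts = [secrets.randbelow(MOD) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % MOD)
    return parts

def reconstruct(parts):
    """Recombine all shares into the original ring element."""
    return sum(parts) % MOD

def add_shares(a, b):
    """Each server adds its two local shares; no communication is needed."""
    return [(x + y) % MOD for x, y in zip(a, b)]

# Client side: share a private input among n = 3 servers.
x_shares = share(encode(-1.5))
w_shares = share(encode(2.25))

# Server side: local addition on shares; client side: reconstruct the result.
result = decode(reconstruct(add_shares(x_shares, w_shares)))
```

Additions are local in this scheme, whereas multiplications and comparisons require interaction between servers. That is why the number and degree of polynomial pieces in an AF approximation directly affect secure inference cost.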
