xMLP: Revolutionizing Private Inference with Exclusive Square Activation

Jiajie Li; Jinjun Xiong

TL;DR

The paper addresses the private inference (PI) latency bottleneck caused by nonlinear activations by introducing xMLP, a DNN architecture that uses square activations exclusively. The authors trace the accuracy drop usually seen with square activations to an "information compounding" effect and, leveraging that insight, adopt a ViT-like structure with patch and channel mixers, so that xMLP reaches accuracy competitive with ReLU-based models at far lower PI latency. Across CIFAR-100, Tiny ImageNet, and ImageNet, xMLP attains favorable accuracy with fewer parameters. Compared with prior state-of-the-art PI models, it is 7x faster with 0.58% higher accuracy, and up to 700x faster at comparable accuracy when PI is offloaded to the GPU. The work also provides ablation studies and microbenchmark data for private square activations. The practical impact is a more scalable, privacy-preserving vision stack suitable for cloud-based services and sensitive data domains.

Abstract

Private Inference (PI) enables deep neural networks (DNNs) to work on private data without leaking sensitive information by exploiting cryptographic primitives such as multi-party computation (MPC) and homomorphic encryption (HE). However, non-linear activations such as ReLU can lead to impractically high PI latency in existing PI systems, because ReLU requires costly MPC computations such as Garbled Circuits. Square activations can be processed with Beaver's triples hundreds of times faster than ReLU, which makes them friendlier to PI tasks, but using them leads to a notable drop in model accuracy. This paper starts by exploring the reason for this accuracy drop and concludes that it is due to an "information compounding" effect. Leveraging this insight, we propose xMLP, a novel DNN architecture that uses square activations exclusively while maintaining parity in both accuracy and efficiency with ReLU-based DNNs. Our experiments on CIFAR-100 and ImageNet show that xMLP models consistently achieve better performance than ResNet models with fewer activation layers and parameters, while performing on par with their ReLU-based variants. Compared to state-of-the-art PI models, xMLP achieves a 0.58% increase in accuracy with 7x faster PI speed. Moreover, it delivers an accuracy improvement of 4.96% at the same PI latency. When offloading PI to the GPU, xMLP is up to 700x faster than the previous state-of-the-art PI model with comparable accuracy.
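To make the Beaver's-triple claim concrete, the following is a minimal sketch of squaring a secret-shared value between two parties. It is not the paper's implementation: the modulus `MOD`, the helper names (`share`, `reconstruct`, `beaver_triple`, `secure_square`), and the trusted-dealer triple generation are all assumptions chosen for illustration. Both parties are simulated in one process, and the point is only that a square needs one cheap opening round plus local arithmetic.

```python
import secrets

MOD = 2**61 - 1  # illustrative modulus (assumption, not taken from the paper)

def share(value):
    """Split value into two additive shares modulo MOD."""
    s0 = secrets.randbelow(MOD)
    return s0, (value - s0) % MOD

def reconstruct(s0, s1):
    """Recombine two additive shares."""
    return (s0 + s1) % MOD

def beaver_triple():
    """Trusted-dealer stand-in: shares of random a, b and of c = a * b."""
    a = secrets.randbelow(MOD)
    b = secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)

def secure_square(x_shares):
    """Return additive shares of x * x, given additive shares of x."""
    (a0, a1), (b0, b1), (c0, c1) = beaver_triple()
    x0, x1 = x_shares
    # Each party masks its share locally; only the masked values are opened.
    e = reconstruct((x0 - a0) % MOD, (x1 - a1) % MOD)  # e = x - a
    f = reconstruct((x0 - b0) % MOD, (x1 - b1) % MOD)  # f = x - b
    # x * x = (a + e) * (b + f) = c + e*b + f*a + e*f
    z0 = (c0 + e * b0 + f * a0 + e * f) % MOD  # party 0 adds the public e*f term
    z1 = (c1 + e * b1 + f * a1) % MOD
    return z0, z1
```

For example, `reconstruct(*secure_square(share(12)))` recovers 144. A ReLU, by contrast, needs a secure comparison, which is where the Garbled Circuit cost mentioned in the abstract comes from.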

Paper Structure

This paper contains 21 sections, 3 equations, 5 figures, and 6 tables.

Introduction
Background
The Proposed xMLP for PI
Intuitive Understanding of ReLU's Impact
xMLP Architecture Design
Experiments
Image Classification
CIFAR-100.
Tiny ImageNet.
ImageNet.
Ablation Studies
Impact of activation functions.
Ablation study on the architecture rationality.
Private Inference of xMLP
PI Experiment Settings
...and 6 more sections

Figures (5)

Figure 1: xMLP pushes the new Pareto frontier of Private Inference Latency vs. Accuracy on CIFAR-100.
Figure 2: DNN architectures. "Green" blocks contain only PI-friendly operations (e.g., linear layers, square activation). "Blue" blocks contain non-PI-friendly operations (e.g., ReLU activation, softmax). "DConv" is the depth-wise convolution. In contrast to the other architectures, xMLP includes only linear operations and square activations, both of which are PI-friendly. The details of our xMLP block are shown in Figure 4.
Figure 3: $\mathrm{ReLU}(x)^2$ is sparsity-inducing but also suffers from vanishing and exploding gradients.
Figure 4: The xMLP architecture. xMLP consists of a patch-embedding layer, xMLP layers and a classifier head. Each xMLP layer consists of (1) a patch mixer with a residual depth-wise convolution for exchanging cross-patch information, and (2) a channel mixer layer (xMLP block) with a residual MLP with quadratic activation for exchanging cross-channel information.
Figure 5: Online inference latency breakdown.
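The layer described in the Figure 4 caption can be sketched in NumPy as follows. This is a rough illustration under stated assumptions, not the authors' code: the 3x3 kernel size, the zero padding, the hidden width, the absence of normalization layers, and the names `depthwise_conv3x3` and `xmlp_layer` are all hypothetical. What the sketch does take from the caption is the structure: a residual depth-wise convolution that mixes information across patches, followed by a residual MLP with a quadratic activation that mixes information across channels.

```python
import numpy as np

def depthwise_conv3x3(x, kernels):
    """Zero-padded 3x3 depth-wise convolution.

    x: (H, W, C) grid of patch embeddings; kernels: (3, 3, C), one filter per channel.
    """
    H, W, C = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            # Shifted view of the input, weighted per channel.
            out += padded[dy:dy + H, dx:dx + W, :] * kernels[dy, dx, :]
    return out

def xmlp_layer(x, kernels, w1, w2):
    """One layer: residual patch mixer, then residual channel mixer.

    w1: (C, D) and w2: (D, C) are the channel-mixing MLP weights.
    """
    # Patch mixer: each channel exchanges information with neighboring patches.
    x = x + depthwise_conv3x3(x, kernels)
    # Channel mixer: linear, square activation, linear. Every operation is
    # an addition or a multiplication, so no comparison is required.
    hidden = (x @ w1) ** 2
    return x + hidden @ w2
```

As a usage example, an input of shape `(8, 8, 16)` with `w1` of shape `(16, 32)` and `w2` of shape `(32, 16)` yields an output of shape `(8, 8, 16)`, so layers of this form can be stacked.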
