xMLP: Revolutionizing Private Inference with Exclusive Square Activation
Jiajie Li, Jinjun Xiong
TL;DR
The paper tackles the latency bottleneck that nonlinear activations cause in private inference (PI) by introducing xMLP, a DNN architecture that uses only square activations. By analyzing the information-compounding effects of ReLU and adopting a ViT-like structure with patch and channel mixers, xMLP achieves competitive accuracy at far lower PI latency than ReLU-based models. Across CIFAR-100, Tiny ImageNet, and ImageNet, xMLP reaches competitive accuracy with fewer parameters and demonstrates state-of-the-art PI performance, with the largest gains when PI is offloaded to GPUs. The work also provides ablation studies and microbenchmarks showing that private square activations, especially on GPUs, let xMLP outperform prior PI systems by factors ranging from several-fold to orders of magnitude. The practical impact is a more scalable, privacy-preserving vision stack suited to cloud-based services and sensitive data domains.
Abstract
Private Inference (PI) enables deep neural networks (DNNs) to work on private data without leaking sensitive information by exploiting cryptographic primitives such as multi-party computation (MPC) and homomorphic encryption (HE). However, the use of non-linear activations such as ReLU in DNNs can lead to impractically high PI latency in existing PI systems, as ReLU requires costly MPC computations such as Garbled Circuits. Since square activations can be processed with Beaver's triples hundreds of times faster than ReLU, they are friendlier to PI tasks, but using them leads to a notable drop in model accuracy. This paper starts by exploring the reason for this accuracy drop and concludes that it is due to an "information compounding" effect. Leveraging this insight, we propose xMLP, a novel DNN architecture that uses square activations exclusively while maintaining parity in both accuracy and efficiency with ReLU-based DNNs. Our experiments on CIFAR-100 and ImageNet show that xMLP models consistently outperform ResNet models while using fewer activation layers and parameters, and that they perform on par with their ReLU-based variants. Compared to state-of-the-art PI models, xMLP achieves a 0.58% increase in accuracy with 7x faster PI speed, or a 4.96% accuracy improvement at the same PI latency. When PI is offloaded to the GPU, xMLP is up to 700x faster than the previous state-of-the-art PI model with comparable accuracy.
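To illustrate why a square activation is cheap under MPC, the following is a minimal sketch of squaring a secret-shared value with a Beaver-style pair. It is not the paper's implementation: the two-party additive sharing, the ring size `MOD`, the in-process trusted dealer, and every function name are assumptions chosen for illustration, and the sketch works on plain integers rather than the fixed-point encodings a real PI system would use.

```python
import secrets

MOD = 2 ** 64  # ring size; an assumption for this sketch


def share(value):
    """Split an integer into two additive shares modulo MOD."""
    s0 = secrets.randbelow(MOD)
    s1 = (value - s0) % MOD
    return s0, s1


def reconstruct(s0, s1):
    """Recombine two additive shares."""
    return (s0 + s1) % MOD


def dealer_square_pair():
    """Hypothetical trusted dealer: shares of a random mask a and of a*a."""
    a = secrets.randbelow(MOD)
    return share(a), share((a * a) % MOD)


def secure_square(x_shares):
    """Return additive shares of x*x given additive shares of x."""
    (a0, a1), (c0, c1) = dealer_square_pair()
    x0, x1 = x_shares
    # Each party publishes its masked share; e = x - a is uniformly
    # random because a is, so opening e does not reveal x.
    e = ((x0 - a0) + (x1 - a1)) % MOD
    # x^2 = (e + a)^2 = e^2 + 2*e*a + a^2, evaluated share-wise.
    # Only party 0 adds the public e^2 term.
    y0 = (e * e + 2 * e * a0 + c0) % MOD
    y1 = (2 * e * a1 + c1) % MOD
    return y0, y1


if __name__ == "__main__":
    y_shares = secure_square(share(7))
    print(reconstruct(*y_shares))
```

In this sketch the only interaction is a single opening of the masked value `e`; everything else is local addition and multiplication. ReLU, by contrast, needs a comparison, which is why the abstract notes that it falls back on costlier primitives such as Garbled Circuits.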
