DeepShare: Sharing ReLU Across Channels and Layers for Efficient Private Inference

Yonathan Bornfeld, Shai Avidan

TL;DR

DeepShare introduces DReLU-based sharing of nonlinear gates to dramatically reduce ReLU computations in Private Inference without sacrificing accuracy. By partitioning channels into prototype and replicate groups and extending sharing across layers, it achieves strong Pareto-frontier performance on CIFAR-100 with ResNet-18 and state-of-the-art results on segmentation, while maintaining cryptographic PI practicality. The authors also provide a theoretical construction showing that a single DReLU can express complex decision boundaries, addressing expressiveness concerns raised by prior ReLU-pruning methods. The approach relies on a GELU-to-ReLU transitional training protocol to enable gradient flow and uses affine gate transformations to maintain expressiveness, offering a practical path to more scalable private inference systems.

Abstract

Private Inference (PI) uses cryptographic primitives to perform privacy-preserving machine learning. In this setting, the owner of the network runs inference on the data of the client without learning anything about the data and without revealing any information about the model. It has been observed that a major computational bottleneck of PI is the calculation of the gate (i.e., ReLU), so a considerable amount of effort has been devoted to reducing the number of ReLUs in a given network. We focus on the DReLU, which is the non-linear step function of the ReLU, and show that one DReLU can serve many ReLU operations. We suggest a new activation module where the DReLU operation is only performed on a subset of the channels (prototype channels), while the rest of the channels (replicate channels) replicate the DReLU of each of their neurons from the corresponding neurons in one of the prototype channels. We then extend this idea to work across different layers. We show that this formulation can drastically reduce the number of DReLU operations in ResNet-type networks. Furthermore, our theoretical analysis shows that this new formulation can solve an extended version of the XOR problem using just one non-linearity and two neurons, something that traditional formulations and some PI-specific methods cannot achieve. We achieve new SOTA results on several classification setups, and achieve SOTA results on image segmentation.
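The channel-sharing idea described in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `prototype_of` assignment array, and the tensor shape are hypothetical, and the affine gate transformation and cross-layer sharing are left out. It relies on the identity ReLU(x) = x · DReLU(x).

```python
import numpy as np

def drelu(x):
    """Step function of ReLU: 1 where x > 0, else 0."""
    return (x > 0).astype(x.dtype)

def shared_relu(x, prototype_of):
    """Apply ReLU with gates shared across channels (illustrative only).

    x            : array of shape (C, H, W), pre-activations of one layer.
    prototype_of : length-C int sequence; prototype_of[c] is the channel whose
                   gate channel c reuses. A prototype channel points to itself.
    Returns the activations and the number of DReLU evaluations performed.
    """
    prototype_of = np.asarray(prototype_of)
    prototypes = np.unique(prototype_of)
    # Sanity check: every channel used as a gate source is its own prototype.
    assert np.all(prototype_of[prototypes] == prototypes)
    # DReLU is evaluated only on the prototype channels.
    gates = {int(p): drelu(x[p]) for p in prototypes}
    # Each channel multiplies its own input by the gate of its prototype.
    out = np.stack([x[c] * gates[int(prototype_of[c])]
                    for c in range(x.shape[0])])
    n_drelu = len(prototypes) * x.shape[1] * x.shape[2]
    return out, n_drelu

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 2, 2))
# Hypothetical assignment: channels 0 and 2 are prototypes;
# channel 1 reuses the gate of 0, channel 3 reuses the gate of 2.
out, n_drelu = shared_relu(x, [0, 0, 2, 2])
```

In this toy layer a standard ReLU would evaluate 16 gates (4 channels of 2×2 neurons), while the shared version evaluates 8, one per prototype neuron. Prototype channels still produce an ordinary ReLU output.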

Paper Structure

This paper contains 32 sections, 1 theorem, 27 equations, 10 figures, 7 tables, 1 algorithm.

Key Result

Corollary 1

An SNL model with a single hidden layer, a single ReLU neuron, and any number of linear neurons will have either no decision boundary or one of the following decision boundaries: a single line, two parallel lines, or a piecewise-linear boundary with two pieces.
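To make the single-gate setting concrete, here is a small hand-built example in which two neurons sharing one DReLU classify the classic four-point XOR problem. The weights and the function name are assumptions picked for illustration; this is not the construction from the paper, and it covers only the classic XOR, not the extended version the paper analyzes.

```python
import numpy as np

def shared_gate_classifier(x):
    """Two neurons, one gate; weights are hand-picked for illustration."""
    x1, x2 = x[..., 0], x[..., 1]
    prototype = x1 + x2 - 0.5     # pre-activation of the prototype neuron
    replicate = 2.0 - x1 - x2     # pre-activation of the replicate neuron
    gate = (prototype > 0).astype(float)  # the only DReLU in the model
    # The output reads the gated replicate neuron plus a bias; the
    # prototype's own ReLU output is given weight 0 in this toy readout.
    return gate * replicate - 0.5

points = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
scores = shared_gate_classifier(points)
labels = (scores > 0).astype(int)  # positive score means class 1
```

The gate switches off only for (0, 0), where the score falls back to the bias. For the other three points the replicate neuron's linear response separates (1, 1) from (0, 1) and (1, 0), so `labels` matches XOR.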

Figures (10)

  • Figure 1: Our approach, DeepShare, achieves the Pareto frontier on ReLU counts vs. test accuracy for CIFAR-100 using ResNet 18.
  • Figure 2: Effective Dimension of ReLU gates across channels: Boxplots of the normalized effective dimension across four layers. For each layer, the blue boxes show the effective dimension at corresponding spatial positions, and the orange boxes show the effective dimension obtained after spatial shuffling within each channel. Both are normalized by the actual dimension, so the values lie between 0 and 1; smaller values indicate that the gates are correlated and have an effective dimension much smaller than the actual one.
  • Figure 3: DReLU Sharing: DeepShare shares DReLU across channels of the same layer. (Right) In the standard ResNet activation, the ReLU is the product of the input and a DReLU (gate) operation on the input. (Left) In our DReLU sharing scheme, channels in a layer are partitioned into prototype and replicate channels. The ReLU of a prototype neuron is the product of its input and the DReLU of the input (top row, blue neuron). In contrast, the ReLU of a replicate neuron (bottom two rows) is the product of its input (orange) and the DReLU of the corresponding prototype neuron (blue). All three neurons share the DReLU of the prototype neuron, thus reducing the overall number of gates in the network. The 1D affine transformation adds flexibility and expressive power to the network.
  • Figure 4: Gradient Flow: An illustration of the gradient flow between the different operations during backpropagation. (Left) DReLU has a derivative of zero, so it blocks the flow of gradients from the replicate neurons to the prototype neuron (the red cut on the edge between the prototype neuron and its DReLU). (Right) The gating component of GELU has a non-zero derivative, so the gradients are not blocked. This is why it is advantageous to train with GELU first and then switch to DReLU.
  • Figure 5: The XOR problem: We wish to solve this classic XOR problem using just a single gate. We show that DeepShare can solve it using a single gate.
  • ...and 5 more figures
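The gradient-flow argument in the Figure 4 caption can be checked numerically. The sketch below is an illustration, not the paper's training code: it assumes the replicate output has the form r · gate(p), takes the gating component of GELU to be the standard normal CDF, and uses hypothetical values for the prototype pre-activation `p` and the replicate pre-activation `r`.

```python
import math

def drelu_gate(p):
    """Hard gate: the step function used by ReLU."""
    return 1.0 if p > 0 else 0.0

def gelu_gate(p):
    """Gating component of GELU: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(p / math.sqrt(2.0)))

def grad_wrt_prototype(gate_fn, p, r, eps=1e-4):
    """Central finite difference of the replicate output r * gate(p) w.r.t. p."""
    return r * (gate_fn(p + eps) - gate_fn(p - eps)) / (2 * eps)

p, r = 0.7, 1.5  # hypothetical prototype and replicate pre-activations
g_hard = grad_wrt_prototype(drelu_gate, p, r)  # step is flat around p
g_soft = grad_wrt_prototype(gelu_gate, p, r)   # about r * normal pdf at p
```

With the hard gate, the replicate neuron sends no gradient back to the prototype's pre-activation, because the step function is flat away from zero. With the GELU gate, the gradient is proportional to the normal density at `p`, so the prototype receives a learning signal from the replicates that reuse its gate.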

Theorems & Definitions (2)

  • Corollary 1
  • proof