Efficient Pruning for Machine Learning Under Homomorphic Encryption

Ehud Aharoni, Moran Baruch, Pradip Bose, Alper Buyuktosunoglu, Nir Drucker, Subhankar Pal, Tomer Pelleg, Kanthi Sarpatwar, Hayim Shaul, Omri Soceanu, Roman Vaculin

TL;DR

This work introduces a framework called HE-PEx that comprises new pruning methods, on top of a packing technique called tile tensors, for reducing the latency and memory of PPML inference, and demonstrates the effectiveness of these methods for pruning fully-connected and convolutional layers in NNs on PPML tasks.

Abstract

Privacy-preserving machine learning (PPML) solutions are gaining widespread popularity. Many of them rely on homomorphic encryption (HE), which offers confidentiality of the model and the data, but at the cost of large latency and memory requirements. Pruning neural network (NN) parameters improves latency and memory in plaintext ML but has little impact if directly applied to HE-based PPML. We introduce a framework called HE-PEx that comprises new pruning methods, built on top of a packing technique called tile tensors, for reducing the latency and memory of PPML inference. HE-PEx uses permutations to prune additional ciphertexts, and expansion to recover inference loss. We demonstrate the effectiveness of our methods for pruning fully-connected and convolutional layers in NNs on PPML tasks, namely image compression, denoising, and classification, with autoencoders, multilayer perceptrons (MLPs), and convolutional neural networks (CNNs). We implement and deploy our networks atop a framework called HElayers and observe a 10-35% improvement in inference speed and a 17-35% decrease in memory requirement over the unpruned network, corresponding to 33-65% fewer ciphertexts, within a 2.5% degradation in inference accuracy. Compared to the state-of-the-art pruning technique for PPML, our techniques generate networks with 70% fewer ciphertexts, on average, for the same degradation limit.
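
To illustrate why plain weight pruning has little impact under tile packing, and why a permutation can help, here is a minimal NumPy sketch. It is not the HE-PEx implementation: the helper `tile_sparsity`, the toy matrix `w`, and the hand-picked column order are all hypothetical, and the sketch assumes that a tile (and hence its ciphertext) can only be dropped when every weight in it is zero. In a real network, permuting the columns of one weight matrix would also require a matching permutation of the adjacent layer.

```python
import numpy as np


def tile_sparsity(weights, tile_shape):
    """Return the fraction of tiles whose entries are all zero.

    The matrix is zero-padded so that each dimension is a multiple of the
    corresponding tile dimension (illustrative helper, not a library API).
    """
    tile_rows, tile_cols = tile_shape
    pad_rows = (-weights.shape[0]) % tile_rows
    pad_cols = (-weights.shape[1]) % tile_cols
    padded = np.pad(weights, ((0, pad_rows), (0, pad_cols)))
    # View the matrix as a grid of tiles: (grid_rows, tile_rows, grid_cols, tile_cols).
    tiles = padded.reshape(
        padded.shape[0] // tile_rows, tile_rows,
        padded.shape[1] // tile_cols, tile_cols,
    )
    zero_tiles = ~tiles.any(axis=(1, 3))
    return float(zero_tiles.mean())


# Toy pruned matrix: half of the weights are zero, but no 2x2 tile is all zero.
w = np.array([[1.0, 0.0, 3.0, 0.0],
              [2.0, 0.0, 4.0, 0.0]])

weight_sparsity = float((w == 0).mean())
before = tile_sparsity(w, (2, 2))

# Hand-picked column permutation that groups the zero columns into one tile.
permuted = w[:, [0, 2, 1, 3]]
after = tile_sparsity(permuted, (2, 2))

print(weight_sparsity, before, after)
```

In this toy case the weight sparsity is unchanged by the permutation, while the fraction of all-zero tiles rises, which is the quantity that matters when each tile maps to one ciphertext.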

Paper Structure

This paper contains 20 sections, 1 equation, 14 figures, 2 tables.

Figures (14)

  • Figure 1: An example of packing a $3 \times 6$ matrix in a tile tensor format that operates over 8 ciphertexts ($16$ slots each). The matrix is zero-padded over dimensions 1 and 2, and is replicated three times over dimension 3. See helayers for further information.
  • Figure 2: Pruning schemes composed of prune, permute, expand, and pack methods.
  • Figure 3: Illustration of permutation and expansion for the P3E scheme when considering a $4$-layer network with $6$, $4$, $8$, and $4$ neurons in layers A-D, respectively. We divide the weight matrices into $2 \times 2$ tiles and prune $54/88=61\%$ of the weights. The pruned network (i) has a tile sparsity of only $2/22 \approx 10\%$ zero tiles. Permutation (ii) improves it to $7/22=35\%$. Expansion (iii) (with re-training) recovers most of the lost accuracy.
  • Figure 4: Permutation of a single $4 \times 8$ weight matrix considering tiles of shape $2 \times 2$, with zero tiles highlighted. We show the algorithm on the weight matrix instead of the mask matrix, for illustration. Here, two k-means iterations are sufficient to reach the solution with the maximum tile sparsity (optimality tested using exhaustive search).
  • Figure 5: Left. Illustration of weight co-permutation in a multi-layered FC-only network with weight matrices $W_{AB}, \ldots, W_{EF}$. The row and column permutation phases separately permute three different sets of matrices, where the permute operations correspond to permuting the highlighted neurons. Right. Illustration of weight transposition in the context of FC (top) and Conv (bottom) layers.
  • ...and 9 more figures
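
The denominators in the Figure 3 caption (88 weights, 22 tiles) can be reproduced from the layer widths alone. The short sketch below assumes a fully-connected network in which consecutive layers of widths m and n share an m × n weight matrix, and that each matrix is split into 2 × 2 tiles as the caption states; the helper name `tile_count` is hypothetical.

```python
import math


def tile_count(rows, cols, tile_shape):
    """Number of tiles needed to cover a rows x cols matrix (padding partial tiles)."""
    tile_rows, tile_cols = tile_shape
    return math.ceil(rows / tile_rows) * math.ceil(cols / tile_cols)


layer_widths = [6, 4, 8, 4]   # neurons in layers A-D, from the Figure 3 caption
tile_shape = (2, 2)           # tile shape, from the Figure 3 caption

# One weight matrix between each pair of consecutive layers: 6x4, 4x8, 8x4.
matrix_shapes = list(zip(layer_widths[:-1], layer_widths[1:]))

total_weights = sum(rows * cols for rows, cols in matrix_shapes)
total_tiles = sum(tile_count(rows, cols, tile_shape) for rows, cols in matrix_shapes)

print(total_weights, total_tiles)  # 88 22
```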