
Hyena: Optimizing Homomorphically Encrypted Convolution for Private CNN Inference

Hyeri Roh, Woo-Seok Choi

TL;DR

Hyena tackles the dominant latency bottleneck in private CNN inference by introducing a Walsh-Hadamard-based plaintext multiplication that enables output channel packing in homomorphic convolution. It combines padding-based weight storage savings with channel packing and introduces optimal parameter selection and lazy reduction to aggressively reduce latency, noise growth, and computational overhead. The approach yields 1.6–3.8× speedups, 2000–8000× reductions in weight storage, and substantial end-to-end improvements (1.3–2.5× latency, 2.1–7.9× memory, 1.4–1.5× communication) on VGG-16, ResNet-20, and MobileNetV1 for ImageNet, compared to conventional methods. These results demonstrate Hyena’s practical impact for scalable private CNN inference with reduced offline requirements and storage demands in a hybrid HE/MPC PI framework.

Abstract

Private inference using homomorphic encryption has gained great attention as a way to leverage powerful predictive models, e.g., deep convolutional neural networks (CNNs), in areas where data privacy is crucial, such as healthcare or medical services. Processing convolution layers, however, accounts for a large portion (more than 85%) of the total latency of private CNN inference. To address this issue, this paper presents Hyena, which uses a novel homomorphic convolution algorithm that provides speedup and savings in communication cost and storage. We first note that padded convolution saves model storage but does not support output channel packing, thereby increasing the amount of computation and communication. We address this limitation by proposing a novel plaintext multiplication algorithm using the Walsh-Hadamard matrix. Furthermore, we propose optimization techniques that significantly reduce the latency of the proposed convolution by selecting the optimal encryption parameters and applying lazy reduction. Overall, Hyena achieves 1.6-3.8x speedup and reduces the weight storage by 2000-8000x compared to the conventional convolution. For deep CNNs like VGG-16, ResNet-20, and MobileNetV1 on ImageNet, Hyena reduces the end-to-end latency by 1.3-2.5x, the memory usage by 2.1-7.9x, and the communication cost by 1.4-1.5x compared to the conventional method.
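The abstract names the Walsh-Hadamard matrix but does not spell out how it enters the plaintext multiplication. As background only, the following minimal sketch builds a Walsh-Hadamard matrix with the Sylvester construction and shows the property that makes it attractive for mixing and unmixing values: its entries are all ±1, its rows are mutually orthogonal, and applying it twice returns the input scaled by the matrix size. The helper names (`hadamard`, `matvec`) and the sample `channels` values are hypothetical, and whether Hyena uses the matrix in exactly this way is an assumption.

```python
def hadamard(n):
    """Return the n x n Walsh-Hadamard matrix (Sylvester construction).

    n must be a power of two. Every entry is +1 or -1.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # H_{2m} = [[H_m,  H_m],
        #           [H_m, -H_m]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def matvec(m, v):
    """Plain integer matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


H4 = hadamard(4)

# Hypothetical per-channel values, used only to illustrate mixing/unmixing.
channels = [3, -1, 4, 1]

# Mixing uses only additions and subtractions because entries are +1/-1.
mixed = matvec(H4, channels)

# The Sylvester matrix is symmetric and H @ H = n * I, so applying it again
# and dividing by n recovers the original values exactly.
recovered = [x // 4 for x in matvec(H4, mixed)]
```

Because every entry is ±1, multiplying by this matrix needs no true multiplications, only sign flips and additions.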

Paper Structure (22 sections, 4 equations, 9 figures, 4 tables)

Figures (9)

  • Figure 1: Latency breakdown between linear and nonlinear layers for PI with VGG-16 and ResNet-18 on TinyImageNet [garimella2023characterizing]. Processing convolution layers occupies more than 85% of the total end-to-end latency.
  • Figure 2: HE convolutions with multiple input and output channels: (a) conventional (packed) and (b) padded convolution. For $f$, the subscript denotes the weight scalar order, while the superscripts specify the output and input channels, respectively.
  • Figure 3: Proposed convolution with multiple input and output channels enabling channel packing.
  • Figure 4: Bit width of ciphertext coefficients during HE convolution. $result$ stores each multiplication output, and $sum$ accumulates the partial sums to obtain the final convolution result. $\llbracket \textbf{0}\rrbracket$ denotes an empty ciphertext. Blue areas represent HE multiplication operations ($\textbf{PMult}$ or $\textbf{CMult}^{-}$), and green areas represent HE addition operations ($\textbf{HAdd}$ or $\textbf{HAdd}^{128}$): (a) conventional convolution, and (b) proposed convolution with lazy reduction.
  • Figure 5: Normalized convolution latency, where the tuple represents (input channel width, input channel count, output channel count): (a) conventional convolution (baseline), (b) conventional convolution without the hoisting technique, (c) proposed convolution without latency optimization, (d) proposed convolution with optimal parameter selection only, and (e) proposed convolution with all optimization techniques.
  • ...and 4 more figures
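The lazy reduction mentioned in the abstract and in the Figure 4 caption can be illustrated with a generic sketch: instead of reducing modulo $q$ after every multiply-accumulate, the products are summed in a wider accumulator and reduced once at the end. The code below is not Hyena's implementation; the modulus `Q`, the function names, and the reading of $\textbf{HAdd}^{128}$ as a 128-bit accumulator are assumptions made for illustration.

```python
import random

# Hypothetical modulus just under 60 bits, chosen only for illustration.
Q = (1 << 60) - 93


def dot_eager(a, b, q):
    """Multiply-accumulate with a modular reduction after every term."""
    acc = 0
    for x, y in zip(a, b):
        acc = (acc + x * y) % q
    return acc


def dot_lazy(a, b, q, acc_bits=128):
    """Multiply-accumulate with a single modular reduction at the end.

    Python integers are unbounded, so the assertion below stands in for the
    overflow limit of a fixed-width accumulator of acc_bits bits.
    """
    acc = 0
    for x, y in zip(a, b):
        acc += x * y  # no reduction here
        assert acc < (1 << acc_bits), "accumulator would overflow"
    return acc % q  # one reduction for the whole sum


random.seed(0)
# Each product is below 2**120, so 64 terms stay below 2**126 < 2**128.
a = [random.randrange(Q) for _ in range(64)]
b = [random.randrange(Q) for _ in range(64)]
same = dot_eager(a, b, Q) == dot_lazy(a, b, Q)
```

Both functions return the same residue; the lazy variant trades a wider accumulator for fewer reductions, which is only safe while the number of accumulated terms keeps the sum within the accumulator width.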