FastFHE: Packing-Scalable and Depthwise-Separable CNN Inference Over FHE

Wenbo Song, Xinxin Fan, Quanliang Jing, Shaoye Luo, Wenqi Wei, Chi Lin, Yunfeng Lu, Ling Liu

TL;DR

FastFHE addresses the practical bottlenecks of encrypted CNN inference by introducing a scalable N×N block ciphertext packing scheme, block-ciphertext depthwise-separable multi-channel convolution, ConvBN fusion to minimize multiplicative depth, and a low-degree Legendre polynomial approximation of SiLU. The approach substantially reduces convolution and rotation overhead, lowers bootstrapping frequency, and maintains near-plaintext accuracy on CIFAR-10 across ResNet and VGG architectures. Empirical results show up to ~3× faster amortized runtime than state-of-the-art PPML methods while using far fewer cryptographic resources. Together, these contributions push toward practical privacy-preserving MLaaS with secure, efficient encrypted inference at scale.

Abstract

Deep learning (DL) now permeates daily life in many domains, and keeping DL model inference secure and sample data private in an encrypted environment has become an urgent and increasingly important issue for security-critical applications. To date, several approaches have been proposed based on the Residue Number System variant of the Cheon-Kim-Kim-Song (RNS-CKKS) scheme. However, they all suffer from high latency, which severely limits their use in real-world tasks. Research on encrypted inference in deep CNNs currently faces three main bottlenecks: i) the time and storage costs of convolution; ii) the large time overhead of bootstrapping operations; and iii) the consumption of circuit multiplicative depth. To address these three challenges, we propose FastFHE, an efficient and effective mechanism that accelerates model inference over fully homomorphic encryption while retaining high inference accuracy. Concretely, our work makes four contributions. First, we propose a new scalable ciphertext data-packing scheme that reduces time and storage consumption. Second, we devise a depthwise-separable convolution scheme that reduces the computational load of convolution. Third, we derive a BN dot-product fusion matrix that merges the ciphertext convolutional layer with the batch-normalization layer without incurring extra multiplicative depth. Fourth, we adopt a low-degree Legendre polynomial to approximate the smooth nonlinear activation function SiLU while guaranteeing only a tiny accuracy difference before and after encrypted inference. Finally, we conduct multi-faceted experiments to verify the efficiency and effectiveness of the proposed approach.
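To make the fourth contribution more concrete, the following is a minimal plaintext sketch of approximating SiLU with a low-degree Legendre series, using NumPy's least-squares fit. The degree (4), the interval [-4, 4], the sample count, and the helper name fit_silu_legendre are assumptions chosen for illustration; the abstract does not state the values used in the paper, and the fitting procedure shown here is not necessarily the authors'.

```python
import numpy as np
from numpy.polynomial import Legendre


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def fit_silu_legendre(degree=4, half_width=4.0, samples=2001):
    """Least-squares fit of a Legendre series to SiLU on [-half_width, half_width].

    All default values are illustrative assumptions, not taken from the paper.
    """
    xs = np.linspace(-half_width, half_width, samples)
    # Legendre.fit maps the given domain onto [-1, 1] internally before fitting.
    return Legendre.fit(xs, silu(xs), deg=degree, domain=[-half_width, half_width])


approx = fit_silu_legendre()

# Measure the worst-case deviation from the exact activation on the fitted interval.
grid = np.linspace(-4.0, 4.0, 801)
max_err = float(np.max(np.abs(approx(grid) - silu(grid))))
print(f"degree = {approx.degree()}, max |error| on [-4, 4] = {max_err:.4f}")
```

A polynomial like this uses only additions and multiplications, which is what a CKKS-style scheme can evaluate directly. The degree matters because a higher degree costs more multiplicative depth, while the interval must cover the range of pre-activation values the network actually produces.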

Paper Structure

This paper contains 28 sections, 14 equations, 13 figures, 8 tables.

Figures (13)

  • Figure 1: MLaaS over FHE. The dashed box highlights the homomorphic operations in our FastFHE.
  • Figure 2: Packing an image into ciphertext slots, where $a_{i}$, $b_{i}$, and $c_{i}$ denote feature values from the three channels, respectively. Any unused slots are filled with zeros.
  • Figure 3: Scalable block-oriented packing scheme.
  • Figure 4: Architectures of traditional convolution vs. depthwise-separable convolution.
  • Figure 5: Block-ciphertext depthwise-separable multi-channel convolution.
  • ...and 8 more figures
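
As background for the comparison named in Figure 4, the sketch below counts plaintext multiplications for a standard convolution and for its depthwise-separable counterpart (a depthwise K×K convolution followed by a 1×1 pointwise convolution). The layer shape, the stride-1 "same" padding assumption, and the function names are hypothetical and chosen only for illustration. These are plaintext operation counts, not the homomorphic cost in FastFHE, which also depends on packing, rotations, and bootstrapping.

```python
def standard_conv_mults(h, w, c_in, c_out, k):
    """Multiplications for a standard k x k convolution (stride 1, 'same' padding)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_mults(h, w, c_in, c_out, k):
    """Multiplications for a depthwise k x k convolution plus a 1 x 1 pointwise one."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


# Hypothetical layer: 32 x 32 feature map, 16 -> 32 channels, 3 x 3 kernel.
h, w, c_in, c_out, k = 32, 32, 16, 32, 3
std = standard_conv_mults(h, w, c_in, c_out, k)
sep = depthwise_separable_mults(h, w, c_in, c_out, k)
print(f"standard: {std}, depthwise-separable: {sep}, ratio: {std / sep:.2f}")
```

The ratio of the two counts is 1 / (1/c_out + 1/k²), so the saving grows with the number of output channels and the kernel size.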