PAPER: Privacy-Preserving Convolutional Neural Networks using Low-Degree Polynomial Approximations and Structural Optimizations on Leveled FHE

Eduardo Chielle, Manaar Alam, Jinting Liu, Jovan Kascelan, Michail Maniatakos

TL;DR

This work tackles the challenge of privacy-preserving CNN inference under leveled FHE (LFHE) by drastically reducing multiplicative depth and avoiding bootstrapping. It introduces a quadratic activation with a penalty-based training regime to achieve the theoretical minimum depth for nonlinear activations, complemented by structural optimizations (Node Fusing, Weight Redistribution, Tower Reuse) and co-design techniques (data layout, slice/ensemble clustering) to enable deep models like ResNet-32 under LFHE. A key contribution is enabling ensemble polynomial networks within a single ciphertext via shared codebooks, recovering accuracy lost to low-degree polynomials. Empirically, the approach yields up to 4× faster private inference on CIFAR and Tiny-ImageNet than prior methods, with accuracy close to plaintext ReLU models, marking a significant step toward practical PPML deployments using LFHE.

Abstract

Recent work using Fully Homomorphic Encryption (FHE) has made non-interactive privacy-preserving inference of deep Convolutional Neural Networks (CNN) possible. However, the performance of these methods remains limited by their heavy reliance on bootstrapping, a costly FHE operation applied across multiple layers, severely slowing inference. Moreover, they depend on high-degree polynomial approximations of non-linear activations, which increase multiplicative depth and reduce accuracy by 2-5% compared to plaintext ReLU models. In this work, we close the accuracy gap between FHE-based non-interactive CNNs and their plaintext counterparts, while also achieving faster inference than existing methods. We propose a quadratic polynomial approximation of ReLU, which achieves the theoretical minimum multiplicative depth for non-linear activations, together with a penalty-based training strategy. We further introduce structural optimizations that reduce the required FHE levels in CNNs by a factor of five compared to prior work, allowing us to run deep CNN models under leveled FHE without bootstrapping. To further accelerate inference and recover accuracy typically lost with polynomial approximations, we introduce parameter clustering along with a joint strategy of data layout and ensemble techniques. Experiments with VGG and ResNet models on CIFAR and Tiny-ImageNet datasets show that our approach achieves up to $4\times$ faster private inference than prior work, with accuracy comparable to plaintext ReLU models.
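To make the idea of a quadratic ReLU replacement concrete, the following is a minimal sketch that fits $a x^2 + b x + c$ to ReLU by ordinary least squares on a symmetric interval. It is an illustration only: the interval bound `clip`, the grid size, and the function names are assumptions, and the plain least-squares fit stands in for the paper's own fitting and penalty-based training procedure, which is not reproduced here. The point it shows is that evaluating a degree-2 polynomial needs only one product of the encrypted input with itself.

```python
import numpy as np

def fit_quadratic_relu(clip=4.0, num_points=2001):
    """Least-squares fit of a*x^2 + b*x + c to ReLU on [-clip, clip].

    `clip` and `num_points` are illustrative choices, not values from the paper.
    """
    x = np.linspace(-clip, clip, num_points)
    relu = np.maximum(x, 0.0)
    # np.polyfit returns coefficients ordered from highest degree to lowest.
    a, b, c = np.polyfit(x, relu, deg=2)
    return a, b, c

def quad_act(x, coeffs):
    """Evaluate the quadratic activation.

    The only input-by-input product is x * x; the remaining operations are
    multiplications by constants and additions.
    """
    a, b, c = coeffs
    return a * (x * x) + b * x + c

coeffs = fit_quadratic_relu()
xs = np.linspace(-4.0, 4.0, 9)
print("coefficients (a, b, c):", coeffs)
print("max abs error on grid:", np.max(np.abs(quad_act(xs, coeffs) - np.maximum(xs, 0.0))))
```

Because ReLU equals $(x + |x|)/2$, a fit on a symmetric interval gives a linear coefficient of one half, and the quadratic and constant terms approximate $|x|/2$. The approximation is only meaningful inside the fitting interval, which is one reason keeping pre-activations within a bounded range matters for low-degree polynomial networks.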

Paper Structure

This paper contains 27 sections, 2 theorems, 39 equations, 8 figures, 2 tables.

Key Result

Lemma 1

Let $z_p^{(l)} \;=\; W^{(l)}\;h^{(l-1)}_{p_d}(x) \in \mathbb{R}^{n_l}$ denote the pre-activation vector at layer $l$ for input $x$. We define two quantities based on this vector: the clipping residual which measures the amount by which $z_p^{(l)}$ exceeds the clipping range, and the gradient of the cross-entropy loss with respect to the pre-activation which captures the sensitivity of the loss t

Figures (8)

  • Figure 1: Overview of the two-stage training pipeline of a neural network with polynomial approximation. A ReLU network is approximated using quantization-aware polynomial fitting to obtain an initial polynomial network. The polynomial network is then trained through regularized polynomial network training, consisting of activation regularization, pre-activation clipping, and warm-up scheduling, to ensure stable convergence.
  • Figure 2: Illustration of node fusing cases. Nodes are batch normalization, convolution, univariate and bivariate polynomial activations, and addition.
  • Figure 3: Illustration of weight redistribution cases. Each node is marked as either a donor or a receiver. The number of bars ($\bar{\cdot}$) on a donor or receiver represents the number of updates the node received. $\mathbf{f}$ represents a forward and $\mathbf{b}$ a backward update.
  • Figure 4: Illustration of HW layout for convolution on a $3 \times 4 \times 4$ input and a $3 \times 2 \times 2$ filter, configured with padding 0 and stride 1.
  • Figure 5: Illustration of single model slice clustering. Convolution filters are decomposed into slices along the kernel width, and each slice is clustered independently with its own codebook.
  • ...and 3 more figures
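
The slice clustering described in the Figure 5 caption can be sketched as follows. This is a hedged illustration, not the paper's implementation: it assumes weights stored as an array of shape (out_channels, in_channels, kernel_height, kernel_width), treats each index along the kernel width as one slice, and uses a simple scalar k-means (Lloyd's algorithm) per slice. The function names `kmeans_1d` and `slice_cluster` and the codebook size `k` are hypothetical.

```python
import numpy as np

def kmeans_1d(values, k, iters=20):
    """Scalar k-means on a 1-D array; returns (centroids, assignments)."""
    # Initialise centroids at evenly spaced quantiles of the data.
    centroids = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        assign = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            members = values[assign == j]
            if members.size > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean()
    assign = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
    return centroids, assign

def slice_cluster(weights, k):
    """Cluster each kernel-width slice independently with its own codebook."""
    clustered = np.empty_like(weights)
    codebooks = []
    for s in range(weights.shape[-1]):
        sl = weights[..., s]
        centroids, assign = kmeans_1d(sl.ravel(), k)
        # Replace every weight in the slice by its nearest codebook entry.
        clustered[..., s] = centroids[assign].reshape(sl.shape)
        codebooks.append(centroids)
    return clustered, codebooks

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
w_q, books = slice_cluster(w, k=4)
print("codebooks:", len(books), "unique values in slice 0:", np.unique(w_q[..., 0]).size)
```

After clustering, each slice holds at most `k` distinct weight values, so a layer can be described by one small codebook per slice plus an index per weight.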

Theorems & Definitions (2)

  • Lemma 1: Pre-activation Update Decomposition
  • Lemma 2: Clipping Gradient Pullback