Optimized Layerwise Approximation for Efficient Private Inference on Fully Homomorphic Encryption

Junghyun Lee, Eunsang Lee, Young-Sik Kim, Yongwoo Lee, Joon-Woo Lee, Yongjune Kim, Jong-Seon No

TL;DR

The paper addresses private inference on FHE by tackling the latency bottleneck of post-training activation approximations. It introduces Optimized Layerwise Approximation (OLA), which assigns per-layer polynomial degrees to activation approximations using a distribution-aware weighted least-squares framework, and solves the degree allocation via dynamic programming under a runtime constraint. The approach yields substantial latency reductions (e.g., ≈3.0× for ResNet-20 and ≈2.8× for ResNet-32 on ciphertexts) while preserving accuracy, and demonstrates effective GELU approximation for ConvNeXt with degree-3 polynomials on CIFAR-10/100 and ImageNet. The framework includes practical steps for degree search space design, discretization, and a scaled distribution model to handle low-probability regions, pointing to broader applicability to transformers and LLMs in future work.

Abstract

Recent studies have explored the deployment of privacy-preserving deep neural networks utilizing homomorphic encryption (HE), especially for private inference (PI). Many works have attempted the approximation-aware training (AAT) approach in PI, changing the activation functions of a model to low-degree polynomials that are easier to compute on HE by allowing model retraining. However, due to constraints in the training environment, it is often necessary to consider post-training approximation (PTA), using the pre-trained parameters of the existing plaintext model without retraining. Existing PTA studies have uniformly approximated the activation function in all layers to a high degree to mitigate accuracy loss from approximation, leading to significant time consumption. This study proposes an optimized layerwise approximation (OLA), a systematic framework that optimizes both accuracy loss and time consumption by using different approximation polynomials for each layer in the PTA scenario. For efficient approximation, we reflect the layerwise impact on the classification accuracy by considering the actual input distribution of each activation function while constructing the optimization problem. Additionally, we provide a dynamic programming technique to solve the optimization problem and achieve the optimized layerwise degrees in polynomial time. As a result, the OLA method reduces inference times for the ResNet-20 model and the ResNet-32 model by 3.02 times and 2.82 times, respectively, compared to prior state-of-the-art implementations employing uniform degree polynomials. Furthermore, we successfully classified CIFAR-10 by replacing the GELU function in the ConvNeXt model with only 3-degree polynomials using the proposed method, without modifying the backbone model.
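The degree-allocation step described above can be pictured as a multiple-choice knapsack: pick one candidate polynomial per layer so that the summed loss is minimal while the summed runtime stays within a budget. The following is a minimal sketch under stated assumptions, not the paper's algorithm. The names `allocate_degrees`, `loss`, `runtime`, and `budget` are hypothetical, the per-layer losses are assumed to be additive, and runtimes are assumed to be already discretized to non-negative integers.

```python
import math


def allocate_degrees(loss, runtime, budget):
    """Choose one candidate per layer by dynamic programming.

    loss[i][k]    -- assumed loss of candidate k in layer i (hypothetical input)
    runtime[i][k] -- assumed integer (discretized) runtime of that candidate
    budget        -- integer bound on the total runtime

    Returns (total_loss, picks), where picks[i] is the chosen candidate index
    for layer i, or (math.inf, None) if no assignment fits the budget.
    """
    n = len(loss)
    # best[t]: minimal loss of the layers handled so far with runtime <= t.
    best = [0.0] * (budget + 1)
    choice = []  # choice[i][t]: candidate picked for layer i at budget t
    for i in range(n):
        new = [math.inf] * (budget + 1)
        pick = [None] * (budget + 1)
        for t in range(budget + 1):
            for k, (l, r) in enumerate(zip(loss[i], runtime[i])):
                if r <= t and best[t - r] + l < new[t]:
                    new[t] = best[t - r] + l
                    pick[t] = k
        best = new
        choice.append(pick)

    if math.isinf(best[budget]):
        return math.inf, None

    # Walk the back-pointers from the last layer to the first.
    picks = []
    t = budget
    for i in reversed(range(n)):
        k = choice[i][t]
        picks.append(k)
        t -= runtime[i][k]
    picks.reverse()
    return best[budget], picks


if __name__ == "__main__":
    # Two layers, two candidates each (toy numbers, not measured values).
    toy_loss = [[4.0, 1.0], [3.0, 0.5]]
    toy_runtime = [[1, 3], [1, 3]]
    print(allocate_degrees(toy_loss, toy_runtime, budget=4))
```

With L layers, K candidates per layer, and an integer budget B, this table-filling approach takes O(L · K · B) time, which is polynomial in the size of the discretized problem.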

Paper Structure

This paper contains 24 sections, 2 theorems, 12 equations, 4 figures, 5 tables, 1 algorithm.

Key Result

Theorem 1

Let $\phi(x)$ be an arbitrary input distribution, and let $\mathcal{F}_\phi$ be the function space $\{ g \mid \int_{\mathbb{R}} \phi(x) |g(x)|^2 \, dx < \infty \}$. Assume that every polynomial and a function $f(x)$ are elements of $\mathcal{F}_\phi$.
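The space $\mathcal{F}_\phi$ is the natural setting for a distribution-weighted least-squares fit, in which a polynomial $p$ is chosen to make $\int \phi(x) |f(x) - p(x)|^2 dx$ small. The sketch below illustrates that idea only; it does not reproduce the theorem's result or the paper's implementation. It assumes a zero-mean Gaussian for $\phi$, replaces the integral with a sum over a uniform grid on a finite interval, and solves the normal equations directly. The names `weighted_polyfit`, `weighted_loss`, and `gaussian` are hypothetical.

```python
import math


def gelu(x):
    """Exact GELU, x * Phi(x), via the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian(sigma):
    """Zero-mean Gaussian density, used here as an assumed input distribution."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return lambda x: math.exp(-0.5 * (x / sigma) ** 2) / norm


def _grid(lo, hi, n):
    return [lo + (hi - lo) * j / (n - 1) for j in range(n)]


def weighted_polyfit(f, phi, degree, lo, hi, n=2001):
    """Coefficients (lowest order first) minimizing the grid approximation
    of  integral phi(x) * (f(x) - p(x))**2 dx  over [lo, hi]."""
    m = degree + 1
    # Normal equations A c = b:
    #   A[j][k] = sum phi(x) x^(j+k),  b[j] = sum phi(x) f(x) x^j.
    A = [[0.0] * m for _ in range(m)]
    b = [0.0] * m
    for x in _grid(lo, hi, n):
        w, fx = phi(x), f(x)
        powers = [x ** k for k in range(2 * m - 1)]
        for j in range(m):
            b[j] += w * fx * powers[j]
            for k in range(m):
                A[j][k] += w * powers[j + k]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for k in range(col, m):
                A[r][k] -= factor * A[col][k]
            b[r] -= factor * b[col]
    # Back substitution.
    c = [0.0] * m
    for j in reversed(range(m)):
        s = sum(A[j][k] * c[k] for k in range(j + 1, m))
        c[j] = (b[j] - s) / A[j][j]
    return c


def weighted_loss(f, phi, coeffs, lo, hi, n=2001):
    """Grid approximation of the phi-weighted squared error of the fit."""
    dx = (hi - lo) / (n - 1)
    total = 0.0
    for x in _grid(lo, hi, n):
        p = sum(c * x ** k for k, c in enumerate(coeffs))
        total += phi(x) * (f(x) - p) ** 2 * dx
    return total


if __name__ == "__main__":
    phi = gaussian(1.0)
    for deg in (1, 2, 3):
        c = weighted_polyfit(gelu, phi, deg, -5.0, 5.0)
        print(deg, weighted_loss(gelu, phi, c, -5.0, 5.0))
```

Because the weight concentrates the fit where inputs are likely, a low-degree polynomial can be accurate on the bulk of the distribution while being loose in the tails. The direct normal-equation solve is adequate for the small degrees used here; higher degrees would call for an orthogonal-polynomial basis or a QR-based solver.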

Figures (4)

  • Figure 1: The runtime of the polynomial evaluation (blue, $T_1(\cdot)$), and the sum of the polynomial evaluation runtime and bootstrapping runtime (orange, $T_i(\cdot)$, $i\geq 2$) on the RNS-CKKS scheme.
  • Figure 2: Accuracy graph as a function of $r$ (with CIFAR-10 dataset).
  • Figure 3: Runtime comparison between the modulus-maintaining method and the moduli-chain managing method, each as an increasing function of the depth consumption $\delta$.
  • Figure 4: Logarithm of the loss over the approximation region $R$ as the value of $r$ varies.

Theorems & Definitions (4)

  • Theorem 1
  • proof
  • Theorem 2
  • proof