Design and Optimization of Cloud Native Homomorphic Encryption Workflows for Privacy-Preserving ML Inference
Tejaswini Bollikonda
TL;DR
Addresses privacy during cloud ML inference by using Homomorphic Encryption (HE) to compute on encrypted inputs. Proposes a cloud-native HE workflow that integrates HE modules with Kubernetes to enable elastic, automated orchestration. Key contributions include a modular cloud-native HE architecture, optimization techniques such as ciphertext packing, polynomial modulus switching, and operator fusion, and experimental validation showing up to $3.2\times$ inference acceleration and $40\%$ memory reduction, with latency improvements up to $69\%$ and accuracy deviation below $0.2\%$. The work demonstrates the practicality of privacy-preserving MLaaS in zero-trust cloud environments and provides a scalable blueprint for secure AI deployments.
Abstract
As machine learning (ML) models are increasingly deployed through cloud infrastructures, the confidentiality of user data during inference poses a significant security challenge. Homomorphic Encryption (HE) has emerged as a compelling cryptographic technique that enables computation on encrypted data, allowing predictions to be generated without decrypting sensitive inputs. However, the integration of HE within large-scale cloud-native pipelines remains constrained by high computational overhead, orchestration complexity, and model compatibility issues. This paper presents a systematic framework for the design and optimization of cloud-native homomorphic encryption workflows that support privacy-preserving ML inference. The proposed architecture integrates containerized HE modules with Kubernetes-based orchestration, enabling elastic scaling and parallel encrypted computation across distributed environments. Furthermore, optimization strategies including ciphertext packing, polynomial modulus adjustment, and operator fusion are employed to minimize latency and resource consumption while preserving cryptographic integrity. Experimental results demonstrate that the proposed system achieves up to $3.2\times$ inference acceleration and a 40% reduction in memory utilization compared to conventional HE pipelines. These findings illustrate a practical pathway for deploying secure ML-as-a-Service (MLaaS) systems that guarantee data confidentiality under zero-trust cloud conditions.
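To illustrate why ciphertext packing reduces work, the following minimal sketch counts homomorphic operations with and without packing. It is not the paper's implementation and performs no real encryption: `MockCiphertext`, `mock_encrypt`, `mock_decrypt`, `mul_plain`, and the `slot_count` parameter are hypothetical stand-ins that only mimic the slot-wise (SIMD-style) behavior of packed HE ciphertexts.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class MockCiphertext:
    """Hypothetical stand-in for an HE ciphertext. Holds plain values; NOT encrypted."""
    slots: List[float]


def mock_encrypt(values: Sequence[float], slot_count: int) -> MockCiphertext:
    """Place values into the slots of one mock ciphertext, zero-padding the rest."""
    if len(values) > slot_count:
        raise ValueError("too many values for one ciphertext")
    return MockCiphertext(list(values) + [0.0] * (slot_count - len(values)))


def mock_decrypt(ct: MockCiphertext, length: int) -> List[float]:
    """Read back the first `length` slots."""
    return ct.slots[:length]


def mul_plain(ct: MockCiphertext, scalar: float) -> MockCiphertext:
    """One 'homomorphic' operation: acts on every slot at once."""
    return MockCiphertext([v * scalar for v in ct.slots])


def scale_unpacked(values: Sequence[float], scalar: float) -> Tuple[List[float], int]:
    """One value per ciphertext, so one operation per value."""
    out, ops = [], 0
    for v in values:
        ct = mul_plain(mock_encrypt([v], slot_count=1), scalar)
        ops += 1
        out.extend(mock_decrypt(ct, 1))
    return out, ops


def scale_packed(values: Sequence[float], scalar: float,
                 slot_count: int) -> Tuple[List[float], int]:
    """Up to `slot_count` values per ciphertext, so one operation per chunk."""
    out, ops = [], 0
    for start in range(0, len(values), slot_count):
        chunk = values[start:start + slot_count]
        ct = mul_plain(mock_encrypt(chunk, slot_count), scalar)
        ops += 1
        out.extend(mock_decrypt(ct, len(chunk)))
    return out, ops
```

Under these assumptions, both functions return the same values, but the packed variant needs one operation per chunk of `slot_count` inputs rather than one per input. In a real HE scheme each such operation is expensive, which is why packing can lower latency and memory use.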
