Encryption-Friendly LLM Architecture
Donghwan Rho, Taeseong Kim, Minje Park, Jung Woo Kim, Hyunsik Chae, Ernest K. Ryu, Jung Hee Cheon
TL;DR
This paper addresses privacy concerns in personalized LLM interactions by enabling private fine-tuning and inference under homomorphic encryption (HE). It introduces an HE-friendly transformer that combines LoRA fine-tuning with Gaussian kernel attention to avoid non-polynomial operations that are expensive to evaluate under HE, yielding substantial speedups over prior encrypted transformers. Experiments on a BERT-style encoder show 6.94x faster fine-tuning and 2.3x faster inference, with downstream-task accuracy comparable to plaintext models. The work serves as a proof of concept for privacy-preserving LLM services and points to future work on encryption-aware training and more efficient HE primitives.
Abstract
Large language models (LLMs) offer personalized responses based on user interactions, but this use case raises serious privacy concerns. Homomorphic encryption (HE) is a cryptographic protocol supporting arithmetic computations in encrypted states and provides a potential solution for privacy-preserving machine learning (PPML). However, the computational intensity of transformers poses challenges for applying HE to LLMs. In this work, we propose a modified HE-friendly transformer architecture with an emphasis on inference following personalized (private) fine-tuning. Utilizing LoRA fine-tuning and Gaussian kernels, we achieve significant computational speedups -- 6.94x for fine-tuning and 2.3x for inference -- while maintaining performance comparable to plaintext models. Our findings provide a viable proof of concept for offering privacy-preserving LLM services in areas where data protection is crucial. Our code is available on GitHub.
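The two ingredients named in the abstract can be illustrated in plaintext with a minimal sketch. It is not the authors' implementation and performs no encryption. The function names, the kernel scale 1/(2*sqrt(d)), and the LoRA scaling alpha/r are assumptions chosen for illustration. The sketch contrasts softmax attention scores, which need a maximum, an exponential, and a division, with Gaussian kernel scores, which need only a squared distance and an exponential of a non-positive input. It also shows LoRA's low-rank weight update, in which only the small factors A and B are trained.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax_attention_scores(q, k):
    """Standard scores softmax(q k^T / sqrt(d)), computed row by row."""
    d = len(q[0])
    scores = []
    for qi in q:
        logits = [sum(x * y for x, y in zip(qi, kj)) / math.sqrt(d)
                  for kj in k]
        m = max(logits)                      # comparison, non-polynomial
        exps = [math.exp(v - m) for v in logits]
        z = sum(exps)
        scores.append([e / z for e in exps])  # division, non-polynomial
    return scores


def gaussian_kernel_scores(q, k, scale=None):
    """Scores exp(-scale * ||q_i - k_j||^2); no max and no division.

    The default scale 1 / (2 * sqrt(d)) is an assumption for this sketch.
    """
    d = len(q[0])
    if scale is None:
        scale = 1.0 / (2.0 * math.sqrt(d))
    return [[math.exp(-scale * sum((x - y) ** 2 for x, y in zip(qi, kj)))
             for kj in k]
            for qi in q]


def lora_weight(w, a, b, alpha, r):
    """Effective weight W + (alpha / r) * B A with W frozen.

    Shapes: W is (d_out, d_in), B is (d_out, r), A is (r, d_in).
    """
    delta = matmul(b, a)
    s = alpha / r
    return [[w_ij + s * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    print(softmax_attention_scores(q, k))
    print(gaussian_kernel_scores(q, k))

    w = [[1.0, 0.0], [0.0, 1.0]]
    a = [[1.0, 2.0]]          # rank r = 1
    b = [[1.0], [2.0]]
    print(lora_weight(w, a, b, alpha=2.0, r=1))
```

In this sketch the exponential in the Gaussian kernel always receives a non-positive argument, so a polynomial approximation over a bounded interval is plausible. The exponential is still not a polynomial and would need such an approximation under HE.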
