SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models

Jiawen Zhang, Kejia Chen, Zunlei Feng, Jian Lou, Mingli Song, Jian Liu, Xiaohu Yang

TL;DR

SecPE addresses privacy and adversarial-robustness concerns in large language models by integrating private inference with prompt ensembling. It introduces an efficient Argmax operation for the RNS-CKKS fully homomorphic encryption setting, using a polynomial approximation of Sign and rotations by powers of two to reduce complexity from $O(n)$ to $O(\log n)$ and to enable batching. Empirical results across NLP, arithmetic reasoning, and text-image tasks show that SecPE preserves high accuracy and improves robustness with only modest overhead, while achieving substantial Argmax speedups (e.g., ~20.8x for $n=256$) over prior private-Argmax methods. The approach shows practical potential for privacy-preserving, robust LLM services and points to hardware acceleration as a promising avenue for further performance gains in secure MLaaS.
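The rotate-and-compare idea behind the logarithmic Argmax can be illustrated in plaintext. The sketch below is not the paper's algorithm and involves no encryption: Python lists stand in for ciphertext slots, `rotate` stands in for a homomorphic rotation, and the Sign approximation is a generic composite polynomial, iterating $f(t) = (3t - t^3)/2$, chosen here as an assumption. The function names, the `margin` parameter, and the requirement that scores lie in $[0, 1]$ with a gap of at least twice `margin` between the top two values are likewise illustrative assumptions.

```python
def approx_sign(x, iters=16):
    """Iterate f(t) = (3t - t^3) / 2, which pushes nonzero t in [-1, 1] toward sign(t)."""
    for _ in range(iters):
        x = 1.5 * x - 0.5 * x ** 3
    return x


def rotate(v, k):
    """Cyclic left rotation by k slots (plaintext stand-in for a homomorphic rotation)."""
    return v[k:] + v[:k]


def approx_max(a, b):
    """Slot-wise max using max(a, b) = ((a + b) + (a - b) * sign(a - b)) / 2."""
    return [0.5 * ((x + y) + (x - y) * approx_sign(x - y)) for x, y in zip(a, b)]


def one_hot_argmax(v, margin=0.05):
    """Return (mask, rotations): mask is close to 1 at the largest slot, close to 0 elsewhere.

    Assumes len(v) is a power of two, all values lie in [0, 1], and the top two
    values differ by at least 2 * margin (hypothetical conditions for this sketch).
    """
    n = len(v)
    assert n > 0 and n & (n - 1) == 0, "slot count must be a power of two"
    m, k, rotations = list(v), 1, 0
    while k < n:
        # After this step, slot i holds the max of 2k consecutive (cyclic) slots.
        m = approx_max(m, rotate(m, k))
        k *= 2
        rotations += 1
    # Every slot of m now holds (approximately) the global maximum.
    mask = [0.5 * (approx_sign(x - top + margin) + 1.0) for x, top in zip(v, m)]
    return mask, rotations


scores = [0.10, 0.35, 0.20, 0.90, 0.55, 0.40, 0.05, 0.70]
mask, rotations = one_hot_argmax(scores)
print(rotations, [round(b, 3) for b in mask])
```

For these 8 slots the loop performs $\log_2 8 = 3$ rotations rather than 7 sequential comparisons, and every slot is processed at once, which is the sense in which the rotation pattern lends itself to batching.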

Abstract

With the growing popularity of LLMs among the general public, privacy preservation and adversarial robustness have become two pressing demands for LLM-based services, yet they have largely been pursued separately and rarely jointly. To the best of our knowledge, this paper is among the first attempts at robust and private LLM inference, tightly integrating two disconnected fields: private inference and prompt ensembling. The former protects users' privacy by encrypting the inference data transmitted to and processed by LLMs, while the latter enhances adversarial robustness by aggregating an output from multiple prompted LLM responses. Although each is widely recognized as effective on its own, combining private inference with prompt ensembling entails new challenges that render the naive combination of existing techniques inefficient. To overcome these hurdles, we propose SecPE, which designs efficient fully homomorphic encryption (FHE) counterparts for the core algorithmic building blocks of prompt ensembling. We conduct extensive experiments on 8 tasks to evaluate the accuracy, robustness, and efficiency of SecPE. The results show that SecPE maintains high clean accuracy and offers better robustness at the expense of merely $2.5\%$ efficiency overhead compared to baseline private inference methods, indicating a satisfactory "accuracy-robustness-efficiency" tradeoff. For the encrypted Argmax operation, which incurs the major slowdown in prompt ensembling, SecPE is 35.4x faster than state-of-the-art peers, a result that may be of independent interest beyond this work.
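To make the aggregation step concrete, here is a minimal plaintext sketch of one common form of prompt ensembling: majority voting over the labels returned by several prompts. The abstract only says that an aggregated output is produced from multiple prompted responses, so the voting rule, the function names, and the integer label encoding are assumptions made for illustration. The final step is an Argmax over the vote histogram, which is the operation that has to be evaluated under encryption in the private setting.

```python
def vote_histogram(predictions, num_labels):
    """Count how many prompts voted for each label index in range(num_labels)."""
    hist = [0] * num_labels
    for p in predictions:
        hist[p] += 1
    return hist


def aggregate(predictions, num_labels):
    """Return the label with the most votes; ties go to the smallest label index."""
    hist = vote_histogram(predictions, num_labels)
    return max(range(num_labels), key=lambda i: hist[i])


# Five hypothetical prompted responses over three candidate labels.
predictions = [2, 0, 2, 1, 2]
print(vote_histogram(predictions, 3), aggregate(predictions, 3))
```

With these five responses, label 2 receives three of the five votes and is returned, so a single perturbed prompt that flips one response would not change the aggregated answer.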

Paper Structure

This paper contains 14 sections, 10 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: A high-level overview of SecPE for private and robust LLM inference in FHE-based MLaaS.
  • Figure 2: An illustration of SecPE, which enables homomorphically encrypted LLM inference with guarantees.
  • Figure 3: Example run of Algorithm 1 (Argmax).
  • Figure 4: Performance on GSM8K with the different number of reasoning paths.
  • Figure 5: Performance on MultiArith with the different number of reasoning paths.
  • ...and 2 more figures