Towards Privacy-Preserving LLM Inference via Collaborative Obfuscation (Technical Report)

Yu Lin, Qizhi Zhang, Wenqiang Ruan, Daode Zhang, Jue Hong, Ye Wu, Hanning Xia, Yunlong Mao, Sheng Zhong

TL;DR

AloePri is the first method to exhibit practical applicability to large-scale models in real-world systems. It carefully designs the transformation for each model component to ensure inference accuracy and data privacy while remaining fully compatible with existing Language Model as a Service infrastructure.

Abstract

The rapid development of large language models (LLMs) has driven the widespread adoption of cloud-based LLM inference services, while also bringing prominent privacy risks associated with the transmission and processing of private data in remote inference. For privacy-preserving LLM inference technologies to be practically applied in industrial scenarios, three core requirements must be satisfied simultaneously: (1) Accuracy and efficiency losses should be minimized to mitigate degradation in service experience. (2) The inference process should run on large-scale clusters consisting of heterogeneous legacy xPUs. (3) Compatibility with existing LLM infrastructures should be ensured to reuse their engineering optimizations. To the best of our knowledge, none of the existing privacy-preserving LLM inference methods satisfies all of the above constraints while delivering meaningful privacy guarantees. In this paper, we propose AloePri, the first privacy-preserving LLM inference method for industrial applications. AloePri protects both the input and output data by covariant obfuscation, which jointly transforms data and model parameters to achieve better accuracy and privacy. We carefully design the transformation for each model component to ensure inference accuracy and data privacy while keeping full compatibility with existing infrastructures of Language Model as a Service. AloePri has been integrated into an industrial system for the evaluation of mainstream LLMs. The evaluation on the Deepseek-V3.1-Terminus model (671B parameters) demonstrates that AloePri causes an accuracy loss of 0.0% to 3.5% and exhibits efficiency equivalent to that of plaintext inference. Meanwhile, AloePri successfully resists state-of-the-art attacks, with less than 5% of tokens recovered. To the best of our knowledge, AloePri is the first method to exhibit practical applicability to large-scale models in real-world systems.
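The idea of jointly transforming data and model parameters can be illustrated with a toy sketch. The code below is not AloePri itself: it assumes a hypothetical two-matrix "model" (an embedding table and an output head) and uses only a secret vocabulary permutation, with no noise and no per-component transformations. All names (`make_permutation`, `toy_model`, `obfuscate_model`) are illustrative assumptions.

```python
import random

def make_permutation(n, seed):
    """Secret vocabulary permutation: perm[t] is the obfuscated id of token t."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    return perm

def invert(perm):
    """Inverse mapping: inv[perm[t]] == t."""
    inv = [0] * len(perm)
    for t, p in enumerate(perm):
        inv[p] = t
    return inv

def toy_model(embed, head, token_ids):
    """Toy stand-in for an LLM: sum the prompt embeddings, score every
    vocabulary entry against the result, and return the argmax token id."""
    dim = len(embed[0])
    hidden = [sum(embed[t][j] for t in token_ids) for j in range(dim)]
    logits = [sum(hidden[j] * row[j] for j in range(dim)) for row in head]
    return max(range(len(logits)), key=logits.__getitem__)

def obfuscate_model(embed, head, perm):
    """Reorder rows so that obfuscated id perm[t] carries token t's parameters."""
    inv = invert(perm)
    n = len(perm)
    return [embed[inv[p]] for p in range(n)], [head[inv[p]] for p in range(n)]

# Hypothetical toy parameters (assumption: 16-token vocabulary, hidden size 4).
rng = random.Random(0)
n, dim = 16, 4
embed = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
head = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]

perm = make_permutation(n, seed=1)       # kept secret on the client
prompt = [3, 7, 7, 12]
plain = toy_model(embed, head, prompt)   # reference plaintext result

# Offline: the client obfuscates the model and ships it to the server.
embed_obf, head_obf = obfuscate_model(embed, head, perm)
# Online: the client obfuscates the prompt; the server runs unchanged code.
obf_prompt = [perm[t] for t in prompt]
obf_out = toy_model(embed_obf, head_obf, obf_prompt)
# The client de-obfuscates the server's answer with the inverse permutation.
recovered = invert(perm)[obf_out]
```

In this sketch the server only ever sees permuted token ids and row-permuted parameters, yet the de-obfuscated output matches the plaintext result because the data and the parameters were transformed consistently. A permutation alone is a weak protection; it is used here only to make the covariance between data and parameter transformations concrete.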

Paper Structure

This paper contains 58 sections, 22 theorems, 81 equations, 6 figures, 9 tables, 2 algorithms.

Key Result

Theorem 1

Let $f: X \times \Theta \rightarrow Y$ denote an LLM inference function (where $X=\mathbb{Z}^l_n$ and $Y = \mathbb{Z}_n$) and $D := \phi_{X}(x)$ denote a data-only obfuscation mechanism. There always exists a covariant obfuscation mechanism $C:= (\phi_X, \phi_\Theta, \phi_Y, \psi_Y, \tilde{f})$ sati

Figures (6)

  • Figure 1: Workflow of AloePri. The client locally obfuscates the model and deploys it to the server, and handles prompt obfuscation and response de-obfuscation during online phase.
  • Figure 2: Overview of AloePri. In the offline model obfuscation process, token-level permutation and linear transformations are employed to construct the obfuscations $\phi^{\text{embed}}, \phi^{\text{head}}, \phi^{\text{attn}},$ and $\phi^{\text{ffn}}$. In the online inference process, a secret vocabulary mapping associated with the permutation is used to construct the data obfuscation $\phi_X$ and de-obfuscation $\psi_Y$.
  • Figure 3: Privacy (TTRSR) and accuracy under various noise parameters on Qwen2.5-14B-Instruct and C-Eval.
  • Figure 4: Impact of $\lambda$ on Qwen2.5-14B-Instruct.
  • Figure 5: TPOT and TTFT vs. $h$ on R1-Distill-14B.
  • ...and 1 more figure
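
The Figure 2 caption mentions linear transformations alongside token-level permutation. As a rough illustration of why a linear transformation in hidden space can leave outputs unchanged, the sketch below applies an invertible matrix $A$ to the embedding rows and the inverse transpose of $A$ to the output-head rows of a toy model. The choice of a scaled permutation matrix for $A$ is an assumption made here purely so the inverse is trivial to write down; the actual constructions of $\phi^{\text{embed}}$, $\phi^{\text{head}}$, $\phi^{\text{attn}}$, and $\phi^{\text{ffn}}$ are not shown on this page and are not reproduced here.

```python
import random

def transform_rows(rows, order, scale, inverse=False):
    """Right-multiply each row by A, where A[order[j]][j] = scale[j] and all
    other entries are zero. With inverse=True, right-multiply by the inverse
    transpose of A instead, which divides by scale[j]."""
    out = []
    for row in rows:
        if inverse:
            out.append([row[order[j]] / scale[j] for j in range(len(order))])
        else:
            out.append([row[order[j]] * scale[j] for j in range(len(order))])
    return out

def logits(hidden, head):
    """Score a hidden vector against every row of the output head."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in head]

# Hypothetical toy parameters (assumption: 8-token vocabulary, hidden size 4).
rng = random.Random(0)
n, dim = 8, 4
embed = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
head = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]

# Secret hidden-space transformation: shuffle and rescale the hidden dimensions.
order = list(range(dim))
rng.shuffle(order)
scale = [rng.uniform(0.5, 2.0) for _ in range(dim)]

embed_obf = transform_rows(embed, order, scale)               # e -> e A
head_obf = transform_rows(head, order, scale, inverse=True)   # w -> w A^{-T}

token = 5
plain_logits = logits(embed[token], head)
obf_logits = logits(embed_obf[token], head_obf)
max_gap = max(abs(a - b) for a, b in zip(plain_logits, obf_logits))
```

Because $(eA)(wA^{-\top})^{\top} = e A A^{-1} w^{\top} = e w^{\top}$, the two sets of logits agree up to floating-point rounding, so `max_gap` is tiny even though neither obfuscated matrix equals its plaintext counterpart. A real transformer has nonlinearities, attention, and normalization between these two matrices, which is why each component needs its own treatment.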

Theorems & Definitions (44)

  • Theorem 1
  • proof
  • Theorem 2: Sequential Composition Theorem $f \circ g$
  • Theorem 3: Parallel Composition Theorem $f || g$
  • Theorem 4: Summation Composition Theorem $f + g$
  • Theorem 5: Information Leakage Bound
  • proof
  • Theorem 6: Static Attack Bound
  • proof
  • proof
  • ...and 34 more