PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration

Ziqian Zeng, Jianwei Wang, Junyao Yang, Zhengdong Lu, Haoran Li, Huiping Zhuang, Cen Chen

TL;DR

PrivacyRestore tackles privacy leakage in online LLM inference by removing privacy spans from user inputs and restoring the removed content during inference via activation steering. The method trains restoration vectors for a core set of privacy-span types on the server, constructs a meta vector through Attention-aware Weighted Aggregation, and protects the meta vector with $d_\chi$-privacy to keep the privacy budget at $2\epsilon$, independent of input length. Empirical results on three privacy-preserving datasets in the medical and legal domains show strong privacy protection with only modest inference overhead, outperforming DP-based and paraphrase baselines in both input/output privacy and utility metrics. The approach offers a practical, plug-and-play privacy solution for large-scale LLM inference services with favorable privacy-utility trade-offs and scalable privacy budgeting.

Abstract

The widespread usage of online Large Language Model (LLM) inference services has raised significant privacy concerns about the potential exposure of private information in user inputs to malicious eavesdroppers. Existing privacy protection methods for LLMs suffer from insufficient privacy protection, performance degradation, or large inference time overhead. To address these limitations, we propose PrivacyRestore, a plug-and-play method to protect the privacy of user inputs during LLM inference. The server first trains restoration vectors for each privacy span and then releases them to clients. A privacy span is defined as a contiguous sequence of tokens within a text that contains private information. The client then aggregates the restoration vectors of all privacy spans in the input into a single meta restoration vector, which is later sent to the server along with the input stripped of its privacy spans. The private information is restored via activation steering during inference. Furthermore, we prove that PrivacyRestore inherently prevents the linear growth of the privacy budget. We create three datasets, covering medical and legal domains, to evaluate the effectiveness of privacy-preserving methods. The experimental results show that PrivacyRestore effectively protects private information and maintains acceptable levels of performance and inference overhead.
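
The client-side flow described in the abstract (strip the privacy spans, aggregate their restoration vectors into one meta vector, then let the server steer activations with it) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy three-dimensional vectors, the fixed weights (a stand-in for the attention-derived weights of Attention-aware Weighted Aggregation), the unit-norm rescaling, and the single-activation steering step are all assumptions made for the example.

```python
import math

def remove_spans(text, spans):
    """Delete each privacy span from the text and collapse leftover whitespace."""
    for span in spans:
        text = text.replace(span, "")
    return " ".join(text.split())

def aggregate(vectors, weights):
    """Weighted sum of per-span restoration vectors, rescaled to unit L2 norm.

    The weights here are hypothetical placeholders for attention-based weights.
    """
    dim = len(vectors[0])
    total = [0.0] * dim
    for vec, w in zip(vectors, weights):
        for i, v in enumerate(vec):
            total[i] += w * v
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm > 0 else total

def steer(activation, meta_vector, strength=1.0):
    """Activation steering on one hidden activation: add the scaled meta vector."""
    return [a + strength * m for a, m in zip(activation, meta_vector)]

# Hypothetical toy data: two privacy spans with made-up restoration vectors.
restoration_vectors = {"chest pain": [1.0, 0.0, 0.0], "asthma": [0.0, 2.0, 0.0]}
query = "I have chest pain and a history of asthma. What could it be?"
spans = ["chest pain", "asthma"]
weights = [0.75, 0.25]

redacted = remove_spans(query, spans)          # sent to the server in place of the query
meta = aggregate([restoration_vectors[s] for s in spans], weights)
steered = steer([0.5, 0.5, 0.5], meta)         # server-side, on a toy activation
```

In this sketch only `redacted` and `meta` would leave the client; the per-span vectors and the list of spans stay local, which is the property the abstract relies on.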

Paper Structure

This paper contains 106 sections, 1 theorem, 15 equations, 17 figures, 17 tables, and 1 algorithm.

Key Result

Theorem 5.1

PrivacyRestore fulfills $d_{\chi}$-privacy and provides a privacy budget of $2\epsilon$, where $\epsilon$ denotes the privacy hyperparameter. The privacy budget of PrivacyRestore is independent of the length of the protected text.
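
For intuition, recall the standard definition: a mechanism $M$ satisfies $\epsilon d_\chi$-privacy if, for any two inputs $x, x'$ and any output $y$, $\Pr[M(x)=y] \le e^{\epsilon\, d(x,x')} \Pr[M(x')=y]$. A common way to achieve this under the Euclidean metric is to add noise with density proportional to $e^{-\epsilon \lVert z \rVert}$, sampled as a uniform direction times a Gamma-distributed radius. The sketch below shows that generic mechanism applied to a single vector; whether the paper uses exactly this sampler is not stated in the text above. The $2\epsilon$ figure is illustrated under the assumption that meta vectors have unit norm, so any two of them are at distance at most 2. All function names are hypothetical.

```python
import math
import random

def sample_dchi_noise(dim, epsilon, rng):
    """Draw noise with density proportional to exp(-epsilon * ||z||_2)."""
    # Uniform direction: normalise a standard Gaussian sample.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    # Radius follows Gamma(shape=dim, scale=1/epsilon).
    radius = rng.gammavariate(dim, 1.0 / epsilon)
    return [radius * x / norm for x in direction]

def privatize(vector, epsilon, rng):
    """Add d_chi-privacy noise to one vector (here: a single meta vector)."""
    noise = sample_dchi_noise(len(vector), epsilon, rng)
    return [v + z for v, z in zip(vector, noise)]

def worst_case_budget(epsilon, max_distance=2.0):
    """Budget epsilon * d(x, x'), bounded by the largest possible distance.

    max_distance=2.0 assumes unit-norm vectors (an assumption of this sketch).
    """
    return epsilon * max_distance

rng = random.Random(0)
meta = [0.6, 0.8, 0.0]                 # toy unit-norm meta vector
noisy_meta = privatize(meta, epsilon=5.0, rng=rng)
budget = worst_case_budget(5.0)
```

Because noise is added once to one fixed-size vector rather than once per token, the bound in this sketch does not depend on how long the protected text is, which matches the length-independence stated in the theorem.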

Figures (17)

  • Figure 1: PrivacyRestore consists of two stages. (1) Preparation Stage. This stage aims to identify the edited heads and train the restoration vectors. We provide a more detailed training set example in Figure 4. (2) Inference Stage. In this stage, the client constructs a meta vector. The server uses the meta vector to restore information during inference on the incomplete query.
  • Figure 2: Results of embedding inverse attack and attribute inference attack for all baselines under different privacy hyperparameters $\epsilon$ on Pri-DDXPlus.
  • Figure 3: (a) and (b) present the results of the $d_\chi$-privacy method under the prompt injection attack and the attribute inference attack for varying $d_\chi$-privacy percentages across three privacy-preserving datasets. (c) and (d) show the results of PrivacyRestore under the embedding inverse attack and the attribute inference attack for different privacy span ratios $\alpha$ on the same three datasets.
  • Figure 4: A training sample in our framework. Text highlighted with a yellow background represents the privacy spans in user inputs. Text highlighted with a green background indicates the correct diagnosis. Text highlighted with a red background denotes the incorrect diagnosis.
  • Figure 5: A rewrite example illustrating the diversity enhancement in medical queries. Text highlighted with a green background indicates medical history, while text with a yellow background denotes symptoms.
  • ...and 12 more figures

Theorems & Definitions (2)

  • Theorem 5.1
  • Definition C.1