Safeguarding LLM Embeddings in End-Cloud Collaboration via Entropy-Driven Perturbation

Shuaifan Jin, Xiaoyi Pang, Zhibo Wang, He Wang, Jiacheng Du, Jiahui Hu, Kui Ren

TL;DR

This work tackles privacy risks in Retrieval-Augmented Generation by protecting end-device embeddings from Embedding Inversion Attacks without cloud-side changes. It introduces EntroGuard, a plug-in that combines Entropy-based Perturbation Generation and Bound-aware Perturbation Adaptation to disrupt learning-based EIAs by steering recovery toward meaningless content while preserving retrieval accuracy within a bounded perturbation. A new semantic-privacy metric BiNLI complements traditional text-based measures to capture semantic leakage. Extensive experiments across multiple datasets and embedding/recovery models show EntroGuard achieves up to about 8x reductions in privacy leakage with negligible retrieval degradation and feasible on-device overhead, proving practical for real-world end-cloud collaboration. The approach is model-agnostic to embedding architectures and robust against various attack models, making it a scalable solution for safeguarding user privacy in end-user LLM reasoning.

Abstract

Recent studies improve on-device language model (LM) inference through end-cloud collaboration, where the end device retrieves useful information from cloud databases to enhance local processing, known as Retrieval-Augmented Generation (RAG). Typically, to retrieve information from the cloud while safeguarding privacy, the end device transforms original data into embeddings with a local embedding model. However, the recently emerging Embedding Inversion Attacks (EIAs) can still recover the original data from text embeddings (e.g., by training a recovery model to map embeddings back to original texts), posing a significant threat to user privacy. To address this risk, we propose EntroGuard, an entropy-driven, perturbation-based embedding privacy protection method that protects the privacy of text embeddings while maintaining retrieval accuracy during end-cloud collaboration. Specifically, to defeat various EIAs, we perturb the embeddings to increase the entropy of the recovered text in the common structure of recovery models, thus steering the embeddings toward meaningless texts rather than the original sensitive texts during the recovery process. To maintain retrieval performance in the cloud, we constrain the perturbations within a bound, applying the strategy of reducing them where redundant and increasing them where sparse. Moreover, EntroGuard can be directly integrated into end devices without requiring any modifications to the embedding model. Extensive experimental results demonstrate that, compared to existing privacy-preserving methods, EntroGuard can reduce the risk of privacy leakage by up to 8 times with negligible loss of retrieval performance.
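The two quantities the abstract relies on, the entropy of a recovered-token distribution and a bound on the perturbation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`shannon_entropy`, `clip_to_bound`, `protect`), the choice of an L2-norm bound, and the value of `epsilon` are all assumptions made for illustration, and the random perturbation stands in for the learned perturbation generator described in the paper.

```python
import math
import random


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def l2_norm(vec):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def clip_to_bound(delta, epsilon):
    """Rescale delta so its L2 norm is at most epsilon (assumed bound type)."""
    norm = l2_norm(delta)
    if norm == 0.0 or norm <= epsilon:
        return list(delta)
    scale = epsilon / norm
    return [x * scale for x in delta]


def cosine_similarity(a, b):
    """Cosine similarity, a common retrieval score for text embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (l2_norm(a) * l2_norm(b))


def protect(embedding, delta, epsilon):
    """Add a bounded perturbation to an embedding (hypothetical helper)."""
    bounded = clip_to_bound(delta, epsilon)
    return [e + d for e, d in zip(embedding, bounded)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 16

    # Hypothetical unit-norm embedding produced by a local embedding model.
    raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    embedding = [x / l2_norm(raw) for x in raw]

    # Placeholder perturbation; the paper learns this instead of sampling it.
    delta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    protected = protect(embedding, delta, epsilon=0.1)

    print("cosine(original, protected):", cosine_similarity(embedding, protected))

    # A confident token prediction has low entropy; a uniform one has the maximum.
    print("entropy (confident):", shannon_entropy([0.97, 0.01, 0.01, 0.01]))
    print("entropy (uniform):  ", shannon_entropy([0.25, 0.25, 0.25, 0.25]))
```

Under these assumptions, a unit-norm embedding perturbed by a vector of norm at most `epsilon` keeps a cosine similarity of at least `sqrt(1 - epsilon**2)` with the original, which is why a small bound leaves retrieval rankings largely intact. The entropy comparison shows the direction the defense pushes in: a recovery model that spreads probability evenly over tokens conveys less about the original text than one that is confident.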

Paper Structure

This paper contains 30 sections, 8 equations, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: In end-cloud collaboration, end users query the vector database in the cloud for external knowledge to obtain more convincing responses.
  • Figure 2: Pipeline of EntroGuard, where the dashed arrows indicate the training process and the solid arrows indicate the inference process. During the training phase, a surrogate attacker model is built to optimize the perturbation generator in Entropy-based Perturbation Generation. In the inference phase, the original text is converted into an embedding by the embedding model and then processed by EntroGuard (Entropy-based Perturbation Generation followed by Bound-aware Perturbation Adaptation), resulting in a protected embedding.
  • Figure 3: The internal recovery process of an EIA. The top is the input, the bottom is the output, and the vertical axis represents the Transformer blocks. Darker colors indicate higher confidence in the generated results, and words marked with * are predicted correctly.
  • Figure 4: The process of generating intermediate results of each layer of transformer blocks, where the intermediate results converge towards meaningless words via Entropy-based Perturbation Generation.
  • Figure 5: The retrieval performance on the Fever dataset.
  • ...and 3 more figures