Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences
Guillem Ramírez, Alexandra Birch, Ivan Titov
TL;DR
The paper tackles privacy risks in API-driven LLM usage by introducing privacy profiles that let users specify what can be shared with external models. In a two-tiered system, a local model paraphrases user queries, which are then optionally answered by a more capable external LLM, with an aggregator reconciling the outputs. The authors present PEEP, a multilingual dataset of 15,282 real queries with synthetic privacy profiles, and show that fine-tuned lightweight LLMs can match or surpass larger zero-shot models in both privacy protection and task performance, though some leakage persists. The work highlights practical pathways for privacy-preserving LLM use and emphasizes the need for models that better understand user-defined privacy preferences, and for further reductions in leakage in real-world deployments.
Abstract
Large language models (LLMs) are primarily accessed via commercial APIs, but this often requires users to expose their data to service providers. In this paper, we explore how users can stay in control of their data by using privacy profiles: simple natural language instructions that say what should and should not be revealed. We build a framework where a local model uses these instructions to rewrite queries, only hiding details deemed sensitive by the user, before sending them to an external model, thus balancing privacy with performance. To support this research, we introduce PEEP, a multilingual dataset of real user queries annotated to mark private content and paired with synthetic privacy profiles. Experiments with lightweight local LLMs show that, after fine-tuning, they not only achieve markedly better privacy preservation but also match or exceed the performance of much larger zero-shot models. At the same time, the system still faces challenges in fully adhering to user instructions, underscoring the need for models with a better understanding of user-defined privacy preferences.
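To make the data flow of the framework concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the pipeline can be expressed as three pluggable callables: a local rewriter that applies the privacy profile, an external model that only ever sees the rewritten query, and a local aggregator that produces the final answer. All names (`PrivacyProfile`, `answer_with_privacy`, the `toy_*` stand-ins, the `[NAME]` placeholder) are hypothetical, and the string-replacement stubs merely stand in for real models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PrivacyProfile:
    """Natural-language instruction stating what may and may not be revealed."""
    instruction: str


def answer_with_privacy(
    query: str,
    profile: PrivacyProfile,
    rewrite: Callable[[str, PrivacyProfile], str],
    external: Callable[[str], str],
    aggregate: Callable[[str, Optional[str]], str],
    use_external: bool = True,
) -> str:
    """Route a query through the (assumed) rewrite -> external -> aggregate flow."""
    external_reply = None
    if use_external:
        # Only the rewritten query leaves the local machine.
        safe_query = rewrite(query, profile)
        external_reply = external(safe_query)
    # The aggregator runs locally, so it may see the original query.
    return aggregate(query, external_reply)


# Toy stand-ins for the three models (assumptions for illustration only).
seen_by_external: List[str] = []


def toy_rewrite(query: str, profile: PrivacyProfile) -> str:
    # A real local model would follow profile.instruction; here we mask one name.
    return query.replace("Alice", "[NAME]")


def toy_external(safe_query: str) -> str:
    seen_by_external.append(safe_query)  # record what the external side receives
    return "Dear [NAME], thank you for your message."


def toy_aggregate(query: str, external_reply: Optional[str]) -> str:
    if external_reply is None:
        return "(local answer only)"
    # Restore the hidden detail locally.
    return external_reply.replace("[NAME]", "Alice")


profile = PrivacyProfile("Do not reveal personal names.")
result = answer_with_privacy(
    "Write a short reply to Alice.", profile, toy_rewrite, toy_external, toy_aggregate
)
print(result)
print(seen_by_external)
```

In this sketch the aggregator receives the original query, which reflects one plausible reading of "reconciling outputs": details hidden from the external model can be restored locally. Passing `use_external=False` skips the external call entirely, mirroring the optional nature of the second tier.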
