PAPILLON: Privacy Preservation from Internet-based and Local Language Model Ensembles
Li Siyan, Vethavikashini Chithrra Raghuram, Omar Khattab, Julia Hirschberg, Zhou Yu
TL;DR
The paper tackles inference-time privacy risks when using API-based LLMs by proposing Privacy-Conscious Delegation, which leverages a small, local model as a privacy-preserving proxy to query a stronger remote model. It introduces PAPILLON, a multi-stage, prompt-optimized pipeline, and the PUPA benchmark to study real-world PII leakage in user prompts. Experiments show that the optimized PAPILLON setup can achieve high final-output quality while substantially reducing privacy leakage, though a gap remains relative to exclusive use of proprietary models. The work provides a practical framework for privacy-aware deployments, highlights the trade-offs between local model capability and remote-model access, and outlines avenues for future improvements including privacy guarantees and specialized local models.
Abstract
Users can divulge sensitive information to proprietary LLM providers, raising significant privacy concerns. Open-source models hosted locally on the user's machine alleviate some of these concerns, but models that users can host locally are often less capable than proprietary frontier models. Toward preserving user privacy while retaining the best quality, we propose Privacy-Conscious Delegation, a novel task for chaining API-based and local models. We use recent public collections of user-LLM interactions to construct a natural benchmark called PUPA, which contains personally identifiable information (PII). To study potential approaches, we devise PAPILLON, a multi-stage LLM pipeline that uses prompt optimization to address a simpler version of our task. Our best pipeline maintains high response quality for 85.5% of user queries while restricting privacy leakage to only 7.5%. A large gap to the generation quality of proprietary LLMs remains, which we leave for future work. Our data and code are available at https://github.com/siyan-sylvia-li/PAPILLON.
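The delegation idea described above, a local model acting as a privacy-preserving proxy in front of a stronger remote model, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three-stage split, the names `delegate` and `leakage_rate`, the prompt wording, the toy stand-in models, and the substring-based leakage measure. It is not the authors' implementation and not the metric used on PUPA.

```python
from typing import Callable, Dict, List

# Hypothetical model interface: a function from prompt text to generated text.
Model = Callable[[str], str]


def delegate(user_query: str, local_model: Model, remote_model: Model) -> Dict[str, str]:
    """Answer a query so that only the local model sees the raw user text."""
    # Stage 1 (assumed): the local model drafts a prompt with private details removed.
    remote_prompt = local_model(
        "Rewrite this request without personal details:\n" + user_query
    )
    # Stage 2: only the rewritten prompt leaves the user's machine.
    remote_response = remote_model(remote_prompt)
    # Stage 3 (assumed): the local model combines the original query
    # with the remote draft to produce the final answer.
    final_answer = local_model(
        "Original request:\n" + user_query
        + "\n\nDraft answer:\n" + remote_response
        + "\n\nWrite the final answer."
    )
    return {
        "remote_prompt": remote_prompt,
        "remote_response": remote_response,
        "final_answer": final_answer,
    }


def leakage_rate(pii_strings: List[str], remote_prompt: str) -> float:
    """Fraction of known PII strings that appear verbatim in the remote prompt.

    A simplified, case-insensitive substring check chosen for this sketch.
    """
    if not pii_strings:
        return 0.0
    prompt = remote_prompt.lower()
    leaked = sum(1 for p in pii_strings if p.lower() in prompt)
    return leaked / len(pii_strings)


# Toy stand-ins for real models, so the sketch runs without any API access.
def toy_local(prompt: str) -> str:
    return prompt.replace("Alice Smith", "[NAME]")


def toy_remote(prompt: str) -> str:
    return "DRAFT: " + prompt


result = delegate("Write a cover letter for Alice Smith.", toy_local, toy_remote)
print(leakage_rate(["Alice Smith"], result["remote_prompt"]))
```

In a real deployment `toy_local` and `toy_remote` would be replaced by calls to a locally hosted model and an API-based model. The trade-off the abstract reports, response quality against privacy leakage, depends on how well the local model performs stages 1 and 3.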
