Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models
Haonan Duan, Adam Dziedzic, Nicolas Papernot, Franziska Boenisch
TL;DR
The paper shows, by mounting a membership inference attack, that the private data used to prompt LLMs can leak, and that prompting can be made private without fine-tuning the full model. It introduces two methods: PromptDPSGD, which privately learns soft prompts via DP-SGD while keeping the LLM fixed, and PromptPATE, which privately learns discrete prompts under black-box access through noisy knowledge transfer (PATE) from an ensemble of prompted LLMs, the "flock of stochastic parrots". Empirical results show that these private prompts achieve utility close to non-private prompting across multiple datasets and models (including GPT-3 and Claude) with strong privacy guarantees (e.g., $\varepsilon$ around $0.1$–$0.3$ for $\delta=10^{-6}$). The work offers a practical path to private prompt learning that preserves the efficiency and flexibility of prompting, which matters for deploying LLMs on sensitive downstream data.
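To make the soft-prompt idea concrete, here is a minimal sketch of a single DP-SGD-style update applied only to the prompt parameters. It is an illustration under stated assumptions, not the paper's implementation: the function name `dp_sgd_step`, its parameters, and the use of NumPy arrays in place of real LLM gradients are all hypothetical, and privacy accounting (which is what produces an actual $\varepsilon$) is omitted.

```python
import numpy as np

def dp_sgd_step(prompt, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD-style update on soft-prompt parameters (illustrative sketch).

    prompt:            shape (d,), flattened soft-prompt embeddings (the only
                       trainable parameters; the LLM itself stays frozen).
    per_example_grads: shape (batch, d), one gradient per training example.
    """
    # Clip each per-example gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Sum the clipped gradients, add Gaussian noise scaled to the clipping
    # norm, then average over the batch.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=prompt.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]

    # Plain gradient step on the prompt only.
    return prompt - lr * noisy_mean


rng = np.random.default_rng(0)
prompt = np.zeros(4)
grads = rng.normal(size=(8, 4))  # stand-in for real per-example gradients
prompt = dp_sgd_step(prompt, grads, clip_norm=1.0,
                     noise_multiplier=1.0, lr=0.1, rng=rng)
```

Clipping bounds how much any single example can move the prompt, and the noise hides what remains of that influence; because only the small prompt vector is updated, the cost of the private step does not grow with the size of the LLM.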
Abstract
Large language models (LLMs) are excellent in-context learners. However, the sensitivity of data contained in prompts raises privacy concerns. Our work first shows that these concerns are valid: we instantiate a simple but highly effective membership inference attack against the data used to prompt LLMs. To address this vulnerability, one could forego prompting and resort to fine-tuning LLMs with known algorithms for private gradient descent. However, this comes at the expense of the practicality and efficiency offered by prompting. Therefore, we propose to privately learn to prompt. We first show that soft prompts can be obtained privately through gradient descent on downstream data. However, this is not the case for discrete prompts. Thus, we orchestrate a noisy vote among an ensemble of LLMs presented with different prompts, i.e., a flock of stochastic parrots. The vote privately transfers the flock's knowledge into a single public prompt. We show that LLMs prompted with our private algorithms closely match the non-private baselines. For example, using GPT-3 as the base model, we achieve a downstream accuracy of 92.7% on the SST-2 dataset with $(\varepsilon=0.147, \delta=10^{-6})$-differential privacy vs. 95.2% for the non-private baseline. Through our experiments, we also show that our prompt-based approach is easily deployed with existing commercial APIs.
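The noisy vote described in the abstract can be sketched as a PATE-style noisy argmax: each prompted LLM in the flock predicts a label for a public input, the per-class vote counts are perturbed with noise, and the winning class becomes the label used to build the public prompt. The code below is a simplified illustration, not the paper's aggregator; the name `noisy_vote`, the choice of Gaussian noise, and the `sigma` parameter are assumptions made for this example, and the privacy accounting that yields a concrete $\varepsilon$ is not shown.

```python
import numpy as np

def noisy_vote(teacher_labels, num_classes, sigma, rng):
    """Aggregate teacher predictions with a noisy argmax (illustrative sketch).

    teacher_labels: one predicted class index per teacher, where each teacher
                    is an LLM prompted with a disjoint slice of private data.
    sigma:          standard deviation of the Gaussian noise added per class.
    """
    # Count how many teachers voted for each class.
    counts = np.bincount(teacher_labels, minlength=num_classes).astype(float)
    # Perturb the histogram so that no single teacher's vote is decisive.
    noisy_counts = counts + rng.normal(0.0, sigma, size=num_classes)
    # Release only the winning class, never the raw counts.
    return int(np.argmax(noisy_counts))


rng = np.random.default_rng(0)
votes = np.array([1, 1, 0, 1, 2, 1, 1, 0])  # hypothetical predictions from 8 teachers
label = noisy_vote(votes, num_classes=3, sigma=1.0, rng=rng)
```

Because each teacher only emits a predicted label, this kind of aggregation needs nothing beyond black-box access to the underlying model, which is consistent with the abstract's point about deployment through existing commercial APIs.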
