Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models

Haonan Duan, Adam Dziedzic, Nicolas Papernot, Franziska Boenisch

TL;DR

The paper demonstrates that prompts for LLMs leak private information through membership inference attacks and that prompting can be made private without full model fine-tuning. It introduces two methods: PromptDPSGD, which privately learns soft prompts via DP-SGD while keeping the LLM frozen, and PromptPATE, which privately learns discrete prompts using a flock of stochastic parrots (an ensemble of LLMs given different prompts) and noisy knowledge transfer (PATE) under black-box access. Empirical results show that these private prompts achieve utility close to non-private prompting across multiple datasets and models (including GPT-3 and Claude) with strong privacy guarantees (e.g., $\varepsilon$ around $0.1$–$0.3$ for $\delta=10^{-6}$). The work provides a practical, scalable path to private prompt learning that preserves the efficiency and flexibility of prompting, with significant implications for deploying LLMs on sensitive downstream data.
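To make the PromptDPSGD idea concrete, the following is a minimal sketch of a single DP-SGD update applied to a soft prompt. It assumes the per-example gradients with respect to the prompt have already been obtained from the frozen LLM; the function name `dp_sgd_prompt_step` and all of its parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def dp_sgd_prompt_step(prompt, per_example_grads, lr, clip_norm,
                       noise_multiplier, rng):
    """One DP-SGD update on a soft prompt; the LLM weights are never updated.

    prompt:            (d,) flattened soft-prompt embedding
    per_example_grads: (B, d) gradient of each example's loss w.r.t. the prompt
    """
    # Clip each example's gradient to an L2 norm of at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Sum, add Gaussian noise calibrated to the clipping bound, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=prompt.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]
    return prompt - lr * noisy_mean

# Toy usage with noise switched off so the result is deterministic.
rng = np.random.default_rng(0)
prompt = np.zeros(4)
grads = np.array([[3.0, 4.0, 0.0, 0.0],   # norm 5.0, clipped down to norm 1.0
                  [0.3, 0.4, 0.0, 0.0]])  # norm 0.5, left unchanged
new_prompt = dp_sgd_prompt_step(prompt, grads, lr=1.0, clip_norm=1.0,
                                noise_multiplier=0.0, rng=rng)
# The update is minus the mean of the clipped gradients: [-0.45, -0.6, 0, 0].
```

In a private run `noise_multiplier` would be positive and chosen by a privacy accountant; it is set to zero here only to make the clipping behaviour visible.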

Abstract

Large language models (LLMs) are excellent in-context learners. However, the sensitivity of data contained in prompts raises privacy concerns. Our work first shows that these concerns are valid: we instantiate a simple but highly effective membership inference attack against the data used to prompt LLMs. To address this vulnerability, one could forego prompting and resort to fine-tuning LLMs with known algorithms for private gradient descent. However, this comes at the expense of the practicality and efficiency offered by prompting. Therefore, we propose to privately learn to prompt. We first show that soft prompts can be obtained privately through gradient descent on downstream data. However, this is not the case for discrete prompts. Thus, we orchestrate a noisy vote among an ensemble of LLMs presented with different prompts, i.e., a flock of stochastic parrots. The vote privately transfers the flock's knowledge into a single public prompt. We show that LLMs prompted with our private algorithms closely match the non-private baselines. For example, using GPT3 as the base model, we achieve a downstream accuracy of 92.7% on the sst2 dataset with $(\varepsilon=0.147, \delta=10^{-6})$-differential privacy vs. 95.2% for the non-private baseline. Through our experiments, we also show that our prompt-based approach is easily deployed with existing commercial APIs.
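The noisy vote among the ensemble can be sketched as a noisy argmax over per-class vote counts. This is a minimal illustration under stated assumptions: the function name `noisy_vote` is hypothetical, and the choice of Gaussian noise is an assumption, since this summary does not specify the noise distribution or any additional filtering of votes.

```python
import numpy as np

def noisy_vote(teacher_labels, num_classes, sigma, rng):
    """Aggregate the teachers' predicted labels for one public input.

    teacher_labels: (n_teachers,) integer class predicted by each prompted LLM
    sigma:          standard deviation of the noise added to each vote count
    """
    # Histogram of votes per class.
    counts = np.bincount(teacher_labels, minlength=num_classes).astype(float)
    # Perturb every count independently, then release only the winning class.
    noisy_counts = counts + rng.normal(0.0, sigma, size=num_classes)
    return int(np.argmax(noisy_counts))

# Toy usage: five teachers, three classes, noise disabled for determinism.
rng = np.random.default_rng(0)
votes = np.array([1, 1, 0, 2, 1])
label = noisy_vote(votes, num_classes=3, sigma=0.0, rng=rng)
```

Only the released label, not the individual votes, would be used to label public data for the student prompt, which is why the privacy cost depends on how many public inputs are queried.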

Paper Structure

This paper contains 48 sections, 1 theorem, 5 figures, 7 tables, 2 algorithms.

Key Result

Theorem 1

Let $T$ be the total number of repetitions (training iterations) of our PromptDPSGD and the sampling rate be denoted by $q$. Then, there exist two constants $c_1$ and $c_2$, such that for any $\varepsilon < c_1 q^2 T$ our PromptDPSGD guarantees $(\varepsilon, \delta)$-DP, if for any $\delta >0$, we
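The statement above breaks off in this copy. For orientation only, the DP-SGD privacy theorem of Abadi et al. (2016), which has the same structure (constants $c_1$, $c_2$, sampling rate $q$, $T$ steps, and the condition $\varepsilon < c_1 q^2 T$), concludes by requiring the noise scale $\sigma$ to satisfy the bound below. That the PromptDPSGD theorem ends with exactly this condition is an assumption, not something this summary confirms.

$$
\sigma \;\geq\; c_2 \, \frac{q \sqrt{T \log(1/\delta)}}{\varepsilon}
$$

Read this way, the required noise grows with the sampling rate and with the square root of the number of training iterations, and shrinks as the privacy budget $\varepsilon$ is relaxed.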

Figures (5)

  • Figure 1: Our methods for private prompt learning. Left: PromptDPSGD obtains the input gradients from the LLM, and performs DPSGD to update the soft prompt embedding while keeping the LLM frozen. Right: PromptPATE creates a noisy ensemble of private discrete prompts, and then transfers knowledge by selecting a student prompt that can be publicly released. PromptPATE only needs black-box access to the LLM and, thus, can be easily deployed with commercial APIs.
  • Figure 2: MIA Risk. We study GPT3 prompted with $100$ different one-shot examples (dbpedia). Left: We present the prediction probabilities at the correct class for members (the one-shot example) and non-members ($50$ randomly sampled private points). The output probability for members is significantly higher than for non-member data points. Right: We present the AUC-ROC curves of our MIA against the $100$ prompts (gray lines) and the blue line as an average over all attacks. Given that each prompt has only one member, the resulting TPRs can only be 0% or 100%, which leads to the step shape of the gray curves. The result indicates that our attack is significantly more successful than random guessing (the red dashed line).
  • Figure 3: Additional Insights of PromptPATE. We perform ablation studies on GPT3-Babbage and use dbpedia as private and agnews as public data. Left: Teacher consensus as the fraction of teachers who vote for the correct class over 500 public input sequences. PromptPATE achieves overall high consensus. Right: Student accuracy as a function of the public query set's size. Already with as few as 100 queries, we observe a plateau in accuracy which highlights PromptPATE's data efficiency.
  • Figure 4: MIA Risk over Multiple Datasets on GPT3. We study GPT3-babbage prompted with $100$ different one-shot examples on four datasets. Top: We present the prediction probabilities at the correct class for members (the one-shot example) and non-members ($50$ randomly sampled private points). The output probability for members is significantly higher than for non-member data points. Bottom: We present the AUC-ROC curves of our MIA against the $100$ prompts (gray lines) and the blue line as an average over all attacks. Given that each prompt has only one member, the resulting TPRs can only be 0% or 100%, which leads to the step shape of the gray curves. The result indicates that our attack is significantly more successful than random guessing (the red dashed line).
  • Figure 5: MIA Risk over Multiple Datasets on GPT2-xl (4 shot). We study GPT2-xl prompted with $100$ different four-shot examples on four datasets. Top: We present the prediction probabilities at the correct class for members (the four-shot examples) and non-members ($50$ randomly sampled private points). The output probability for members is significantly higher than for non-member data points. Bottom: We present the AUC-ROC curves of our MIA against the $100$ prompts (gray lines) and the blue line as an average over all attacks. Given that each prompt has only four members, the resulting TPRs can only be 0%, 25%, 50%, 75% or 100%, which leads to the step shape of the gray curves. The result indicates that our attack is significantly more successful than random guessing (the red dashed line).
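
The membership inference captions above compare the model's output probability at the correct class for members and non-members and summarize the attack with AUC-ROC. A minimal sketch of that summary statistic follows, assuming the attack score is simply the correct-class probability; the function name `mia_auc` and the toy scores are hypothetical and not data from the paper.

```python
import numpy as np

def mia_auc(member_scores, nonmember_scores):
    """AUC of a threshold attack on correct-class probabilities.

    Equals the probability that a randomly chosen member receives a higher
    score than a randomly chosen non-member, with ties counted as one half.
    """
    m = np.asarray(member_scores, dtype=float)[:, None]
    n = np.asarray(nonmember_scores, dtype=float)[None, :]
    return float(np.mean((m > n) + 0.5 * (m == n)))

# Toy usage: one member (the in-prompt example) against four non-members.
auc = mia_auc([0.9], [0.2, 0.95, 0.5, 0.1])
# The member outscores three of the four non-members, so the AUC is 0.75.
```

An AUC of 0.5 corresponds to random guessing, which is the baseline the captions refer to as the red dashed line.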

Theorems & Definitions (2)

  • Theorem 1: Privacy of PromptDPSGD
  • proof