On the Privacy Risk of In-context Learning

Haonan Duan, Adam Dziedzic, Mohammad Yaghini, Nicolas Papernot, Franziska Boenisch

TL;DR

This work shows that deploying prompted models presents a significant privacy risk for the data used within the prompt by instantiating a highly effective membership inference attack, and proposes ensembling as a mitigation strategy by aggregating over multiple different versions of a prompted model.

Abstract

Large language models (LLMs) are excellent few-shot learners. They can perform a wide variety of tasks purely based on natural language prompts provided to them. These prompts contain data of a specific downstream task -- often the private dataset of a party, e.g., a company that wants to leverage the LLM for its purposes. We show that deploying prompted models presents a significant privacy risk for the data used within the prompt by instantiating a highly effective membership inference attack. We also observe that the privacy risk of prompted models exceeds that of fine-tuned models at the same utility levels. After identifying the models' sensitivity to their prompts -- in the form of a significantly higher prediction confidence on the prompted data -- as a cause for the increased risk, we propose ensembling as a mitigation strategy. By aggregating over multiple different versions of a prompted model, membership inference risk can be decreased.
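
The confidence gap described in the abstract can be illustrated with a minimal sketch of a confidence-based membership inference score. This is not the authors' attack implementation: the function name `membership_auc` and all confidence values are hypothetical, chosen only to show how higher confidence on prompt members translates into an AUC above 0.5.

```python
from typing import Sequence


def membership_auc(member_conf: Sequence[float],
                   nonmember_conf: Sequence[float]) -> float:
    """AUC of the attack "higher confidence means member".

    Computed as the fraction of (member, non-member) pairs in which the
    member receives the higher confidence; ties count as one half.
    A value of 0.5 corresponds to random guessing.
    """
    wins = 0.0
    for m in member_conf:
        for n in nonmember_conf:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_conf) * len(nonmember_conf))


# Hypothetical target-class confidences (made up for illustration only).
members = [0.99, 0.97, 0.95, 0.90]            # data points inside the prompt
nonmembers = [0.92, 0.80, 0.70, 0.60, 0.55]   # data points outside the prompt

auc = membership_auc(members, nonmembers)
print(f"attack AUC: {auc:.2f}")
```

The pairwise formulation is used here only because it needs no external library; any standard ROC-AUC routine applied to the same scores would serve the same purpose.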

Paper Structure

This paper contains 26 sections, 2 equations, 18 figures, 4 tables.

Figures (18)

  • Figure 1: Setup for Prompting and MIA. We prompt the LLM with different prompts (same template) for a downstream task. The LLM returns per-token probabilities for the next token in the sequence. The adversary has query access to the prompted LLM and obtains prediction probabilities for each possible target class of the downstream task.
  • Figure 2: Ensemble of Prompted Models. We ensemble multiple prompted models with disjoint data and the same template. The final prediction is an aggregate of outputs from each prompted model.
  • Figure 3: Prediction Probability at Target Class (sst2). We plot the output prediction probability at the target class for member and non-member data points of the prompt in the prompted LLM. We find that the LLM's output probability for the prompt's member data points is significantly higher than for non-member data points.
  • Figure 4: MIA Risk over all Datasets. We depict the AUC-ROC curves over all datasets. The red dashed line represents the MIA success of random guessing. Each gray line corresponds to a prompted model with its four member data points. Due to the small number of member data points (4), our resulting TPRs can only be 0%, 25%, 50%, 75%, or 100%, which leads to the step shape of the gray curves. The reported average AUC score is calculated as the average of the AUC scores of the individual prompted models (gray lines). Additionally, for visualization purposes, we average the gray lines over all prompted models and depict the average as the blue line. We use 50 prompted models in this experiment.
  • Figure 5: Impact of Model Size on Membership Risk. We report the TPR at an FPR of $10^{-3}$ for GPT2-base and GPT2-xl (117M vs. 1.5B parameters). For a fair comparison, we tune 1000 prompts for both architectures, keep the best 50 for GPT2-base, and, for GPT2-xl, keep the 50 prompts whose validation accuracy is closest to that of GPT2-base. We observe that larger models leak less private information about their prompts. All results are obtained on the sst2 dataset.
  • ...and 13 more figures
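
The ensembling setup of Figure 2 and the step shape noted in Figure 4 can be sketched in a few lines. The caption of Figure 2 only says that the final prediction is an aggregate of the prompted models' outputs; the sketch below assumes a plain mean of the class-probability vectors, which may differ from the aggregation the paper actually uses. The names `ensemble_average` and `possible_tprs` and all numbers are hypothetical.

```python
from typing import List, Sequence


def ensemble_average(prob_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average class-probability vectors from several prompted models.

    Assumption: aggregation is an unweighted mean; each inner sequence is
    one prompted model's probabilities over the same ordered classes.
    """
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    return [sum(p[c] for p in prob_vectors) / n_models
            for c in range(n_classes)]


def possible_tprs(num_members: int) -> List[float]:
    """All TPR values reachable when only `num_members` members exist."""
    return [k / num_members for k in range(num_members + 1)]


# Hypothetical outputs for one query. The first model holds the query in
# its prompt and is overconfident; the other two were prompted with
# disjoint data and are less certain.
outputs = [
    [0.98, 0.02],
    [0.60, 0.40],
    [0.70, 0.30],
]
avg = ensemble_average(outputs)
print("ensemble prediction:", [round(p, 2) for p in avg])

# With four member data points per prompt, the TPR moves in steps of 1/4.
print("reachable TPRs:", possible_tprs(4))
```

Averaging pulls the overconfident output of the model that saw the query toward the less certain outputs of the other models, which is the intuition behind ensembling as a mitigation.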