Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation

Xinyu Tang; Richard Shin; Huseyin A. Inan; Andre Manoel; Fatemehsadat Mireshghallah; Zinan Lin; Sivakanth Gopi; Janardhan Kulkarni; Robert Sim

TL;DR

The paper tackles privacy risks in in-context learning by introducing a differential-privacy (DP) framework that privately generates synthetic few-shot demonstrations from a private dataset. It presents a PATE-like algorithm that aggregates generation signals from disjoint private subsets to produce DP-compliant prompts, enabling unlimited inference without additional privacy cost. Empirical results across AGNews, TREC, DBPedia, and MIT datasets show that 4-shot DP ICL can approach non-private performance at modest privacy budgets (e.g., $\epsilon=1$ on TREC yields 50.7% accuracy, near the non-private 50.6%), and even zero-shot generation by the model itself can yield strong baselines in some cases. The work demonstrates the practicality of privacy-preserving ICL for diverse NLP tasks and discusses future improvements in sampling and offline-online LM setups to further close the privacy-utility gap.

Abstract

We study the problem of in-context learning (ICL) with large language models (LLMs) on private datasets. This scenario poses privacy risks, as LLMs may leak or regurgitate the private examples demonstrated in the prompt. We propose a novel algorithm that generates synthetic few-shot demonstrations from the private dataset with formal differential privacy (DP) guarantees, and show empirically that it can achieve effective ICL. We conduct extensive experiments on standard benchmarks and compare our algorithm with non-private ICL and zero-shot solutions. Our results demonstrate that our algorithm can achieve competitive performance with strong privacy levels. These results open up new possibilities for ICL with privacy protection for a broad range of applications.

Paper Structure (27 sections, 5 theorems, 5 equations, 3 figures, 17 tables, 1 algorithm)

Introduction
Our Contributions
Preliminaries
Prompting and In-context Learning
Differential Privacy
Problem Definition, Threat Model, and Notations
Proposed Method
Algorithm for DP Few-shot Generation
Privacy Analysis
Experiments
Implementation Settings
Main Results
Ablation Studies
Related Work
Conclusion and Future Work
...and 12 more sections

Key Result

Theorem 4.2

Algorithm 1 is $(\epsilon, \delta)$-differentially private.
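
For reference, the $(\epsilon, \delta)$ guarantee in Theorem 4.2 refers to the standard notion of differential privacy (Definition 2.1 in the paper). The block below restates that textbook definition and the post-processing property, which is what allows the generated demonstrations to be reused for any number of queries at no extra privacy cost. The symbols $M$, $D$, $D'$, $S$, and $f$ are notation chosen here for illustration and may differ from the paper's.

```latex
% Standard (epsilon, delta)-differential privacy: a randomized mechanism M
% satisfies it if, for all neighboring datasets D, D' (differing in one
% record) and every measurable set of outputs S,
\[
  \Pr[M(D) \in S] \;\le\; e^{\epsilon}\,\Pr[M(D') \in S] + \delta .
\]
% Post-processing: for any (possibly randomized) map f that does not touch
% the private data, the composition f \circ M satisfies the same guarantee:
\[
  \Pr[f(M(D)) \in S] \;\le\; e^{\epsilon}\,\Pr[f(M(D')) \in S] + \delta .
\]
```

Here $M$ plays the role of the few-shot generation step and $f$ the later ICL inference that uses the synthetic demonstrations as a prompt.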

Figures (3)

Figure 1: A potential privacy violation when few-shot demonstrations are drawn from a private dataset in an ICL framework for a healthcare application. A malicious adversary mounts a basic prompt injection attack and gains direct access to the demonstrations. Basic heuristics such as removal of personally identifiable information (PII) may still leave information that is linkable to an individual (GDPR) if the adversary has auxiliary information (e.g., a unique patient with a particular disease or treatment), and so do not protect against privacy violations.
Figure 2: Our proposed framework for privacy-preserving ICL. Given a private dataset, we first generate synthetic few-shot samples with DP. The generated samples can then be used as ICL demonstrations to answer an unlimited number of queries without incurring any additional privacy cost.
Figure 3: Illustration of step 1 (DP few-shot generation) in our framework (Fig. 2). The example shows a synthetic demonstration for the topic "school" generated token by token with DP. The operations in Alg. 1 for one step of generation (the token "College") are depicted step by step.
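
The token-by-token generation described for Figure 3 can be illustrated with a minimal sketch, assuming a PATE-like aggregation as mentioned in the TL;DR: each disjoint private subset contributes a next-token distribution, the distributions are summed, Gaussian noise is added, and the highest-scoring token is emitted. This is not the paper's Algorithm 1. The function names, the toy vocabulary, and the stand-in `toy_subset_lm` (which replaces a real LLM call) are hypothetical, and the sketch does not compute the resulting $(\epsilon, \delta)$.

```python
import numpy as np

VOCAB = ["College", "students", "attend", "class"]  # hypothetical toy vocabulary


def noisy_next_token(subset_probs, sigma, rng):
    """Pick a token index from the noisy sum of per-subset distributions.

    subset_probs has shape (n_subsets, vocab_size); each row is a probability
    vector. Replacing one row changes the sum by at most sqrt(2) in L2 norm,
    which is the quantity the Gaussian noise scale must be calibrated against.
    """
    total = subset_probs.sum(axis=0)
    noisy = total + rng.normal(0.0, sigma, size=total.shape)
    return int(np.argmax(noisy))


def toy_subset_lm(subset_id, prefix):
    """Hypothetical stand-in for an LLM prompted with one private subset."""
    logits = np.zeros(len(VOCAB))
    logits[len(prefix) % len(VOCAB)] = 2.0   # every subset favors the same token
    logits[subset_id % len(VOCAB)] += 0.5    # small subset-specific variation
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()


def generate(subset_lm, n_subsets, max_tokens, sigma, seed=0):
    """Generate one synthetic demonstration token by token."""
    rng = np.random.default_rng(seed)
    tokens = []
    for _ in range(max_tokens):
        # One distribution per disjoint subset, conditioned on the shared prefix.
        probs = np.stack([subset_lm(i, tokens) for i in range(n_subsets)])
        tokens.append(VOCAB[noisy_next_token(probs, sigma, rng)])
    return tokens


if __name__ == "__main__":
    print(generate(toy_subset_lm, n_subsets=10, max_tokens=3, sigma=1.0))
```

Every generated token consumes part of the privacy budget, so the noise scale, the number of subsets, and the number of tokens together determine the overall guarantee; the paper's privacy analysis covers that accounting.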

Theorems & Definitions (11)

Definition 2.1: Differential Privacy (DP) (Dwork et al., 2006)
Remark 4.1
Theorem 4.2
Proof: Proof Overview
Definition A.1
Theorem A.2
Theorem A.3 (Gopi et al., 2021)
Theorem A.4
Theorem A.5
Proof
...and 1 more
