Tight and Practical Privacy Auditing for Differentially Private In-Context Learning

Yuyang Xia, Ruixuan Liu, Li Xiong

TL;DR

This paper tackles privacy leakage in differentially private in-context learning (DP-ICL) by introducing a tight, efficient auditing framework that treats leakage as a membership inference problem and maps attack success to empirical Gaussian DP (GDP) guarantees. It supports both black-box and white-box threat models and unifies classification and generation auditing by reducing both to a binary decision task, using carefully designed audit queries to maximize the auditing signal under DP noise. The authors provide a theoretical analysis of how the partition and vote configuration can be chosen to maximize empirical leakage, and they evaluate the framework on standard NLP tasks. Classification leakage closely matches the theoretical GDP budgets, while generation leakage is typically lower because the embedding-sensitivity bounds are conservative. The work yields a practical tool for verifying and tightening DP-ICL deployments, helping developers detect implementation flaws and calibrate privacy budgets in real-world systems.
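The step "maps attack success to empirical Gaussian DP guarantees" can be illustrated with the standard GDP trade-off bound: under mu-GDP, every membership test satisfies TPR <= Phi(Phi^{-1}(FPR) + mu), so one observed (TPR, FPR) point certifies mu >= Phi^{-1}(TPR) - Phi^{-1}(FPR). The sketch below is a minimal illustration under that assumption, not the authors' implementation; the function names `empirical_mu`, `empirical_epsilon`, and `gdp_delta` are hypothetical. It also computes the log(TPR/FPR) loss used in Lemma 3.1 and the standard conversion from mu-GDP to (epsilon, delta)-DP.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def empirical_mu(tpr: float, fpr: float) -> float:
    """Lower bound on the GDP parameter certified by one (TPR, FPR) point.

    Uses TPR <= Phi(Phi^{-1}(FPR) + mu), which every mu-GDP mechanism satisfies.
    """
    if not (0.0 < tpr < 1.0 and 0.0 < fpr < 1.0):
        raise ValueError("TPR and FPR must lie strictly between 0 and 1")
    return _STD.inv_cdf(tpr) - _STD.inv_cdf(fpr)


def empirical_epsilon(tpr: float, fpr: float) -> float:
    """Pure-DP style empirical loss eps_emp = log(TPR / FPR)."""
    if tpr <= 0.0 or fpr <= 0.0:
        raise ValueError("TPR and FPR must be positive")
    return math.log(tpr / fpr)


def gdp_delta(mu: float, eps: float) -> float:
    """delta(eps) of a mu-GDP mechanism viewed as (eps, delta)-DP (mu > 0)."""
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    return _STD.cdf(-eps / mu + mu / 2) - math.exp(eps) * _STD.cdf(-eps / mu - mu / 2)


if __name__ == "__main__":
    # Hypothetical attack outcome, for illustration only.
    tpr, fpr = 0.6, 0.1
    mu = empirical_mu(tpr, fpr)
    print(f"mu_emp={mu:.4f} eps_emp={empirical_epsilon(tpr, fpr):.4f}")
    print(f"delta at eps=1: {gdp_delta(mu, 1.0):.4f}")
```

These are point estimates; a statistically rigorous audit would replace TPR and FPR with confidence bounds (for example, Clopper-Pearson intervals over the audit trials) before converting them.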

Abstract

Large language models (LLMs) perform in-context learning (ICL) by adapting to tasks from prompt demonstrations, which in practice often contain private or proprietary data. Although differential privacy (DP) with private voting is a pragmatic mitigation, DP-ICL implementations are error-prone, and worst-case DP bounds may substantially overestimate actual leakage, calling for practical auditing tools. We present a tight and efficient privacy auditing framework for DP-ICL systems that runs membership inference attacks and translates their success rates into empirical privacy guarantees using Gaussian DP. Our analysis of the private voting mechanism identifies vote configurations that maximize the auditing signal, guiding the design of audit queries that reliably reveal whether a canary demonstration is present in the context. The framework supports both black-box (API-only) and white-box (internal vote) threat models, and unifies auditing for classification and generation by reducing both to a binary decision problem. Experiments on standard text classification and generation benchmarks show that our empirical leakage estimates closely match theoretical DP budgets on classification tasks and are consistently lower on generation tasks due to conservative embedding-sensitivity bounds, making our framework a practical privacy auditor and verifier for real-world DP-ICL deployments.
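The black-box audit of a private voting mechanism described above can be sketched as a Monte Carlo experiment. This is a simplified model under stated assumptions, not the paper's algorithm: demonstrations are split into T disjoint partitions, each partition casts one vote, Gaussian noise with a hypothetical standard deviation `sigma` is added to each class count, and only the noisy argmax is released. We further assume that the canary flips its own partition's vote from 0 to 1 while the other T-1 partitions cast `k_bg` positive votes; `private_vote` and `black_box_audit` are illustrative names.

```python
import random
from collections import Counter
from statistics import NormalDist


def private_vote(votes, num_classes, sigma, rng):
    """Noisy argmax over the per-class vote histogram (Gaussian noise per class)."""
    counts = Counter(votes)
    noisy = [counts.get(c, 0) + rng.gauss(0.0, sigma) for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: noisy[c])


def black_box_audit(T, k_bg, sigma, trials, seed=0):
    """Estimate (TPR, FPR) of an attacker that guesses 'canary present' iff the
    released label is the positive class (label 1).

    Hypothetical setup: the T-1 background partitions cast k_bg positive and
    T-1-k_bg negative votes; the remaining partition votes 1 when it holds the
    canary and 0 otherwise.
    """
    if not 0 <= k_bg <= T - 1:
        raise ValueError("k_bg must lie in [0, T-1]")
    rng = random.Random(seed)
    background = [1] * k_bg + [0] * (T - 1 - k_bg)
    with_canary = background + [1]
    without_canary = background + [0]
    tp = sum(private_vote(with_canary, 2, sigma, rng) == 1 for _ in range(trials))
    fp = sum(private_vote(without_canary, 2, sigma, rng) == 1 for _ in range(trials))
    return tp / trials, fp / trials


if __name__ == "__main__":
    tpr, fpr = black_box_audit(T=5, k_bg=2, sigma=2.0, trials=20000)
    z = NormalDist()
    mu_hat = z.inv_cdf(tpr) - z.inv_cdf(fpr)  # requires 0 < TPR, FPR < 1
    print(f"TPR={tpr:.3f} FPR={fpr:.3f} mu_hat={mu_hat:.3f}")
```

Varying `T` and `k_bg` in this toy model is one way to explore how the vote configuration affects the observable attack signal; a white-box auditor would instead observe the noisy vote counts directly rather than only the argmax.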

Paper Structure

This paper contains 20 sections, 2 theorems, 19 equations, 21 figures, 3 tables, 3 algorithms.

Key Result

Lemma 3.1

In the Gaussian-noise-based private voting mechanism, the Gaussian-DP parameter $\mu_{\text{Gauss}}$ is maximized independently of the number of partitions $T$ and the number of positive votes $k$, whereas the DP-based empirical loss $\epsilon_{\text{emp}}=\log(\mathrm{TPR}/\mathrm{FPR})$ attains it
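A closed-form toy model helps illustrate the contrast in Lemma 3.1 between the GDP parameter and eps_emp = log(TPR/FPR). Assume (our assumption, not the paper's exact setup) a binary vote over T partitions with independent N(0, sigma^2) noise on each class count, where k is the number of positive votes when the canary is present and the canary's partition votes negative when it is absent. An attacker who predicts "member" iff the released label is positive then has TPR = Phi((2k - T)/(sigma*sqrt(2))) and FPR = Phi((2k - T - 2)/(sigma*sqrt(2))), so Phi^{-1}(TPR) - Phi^{-1}(FPR) = sqrt(2)/sigma for every (T, k), while log(TPR/FPR) changes with the configuration. The sketch does not reproduce the paper's characterization of where eps_emp peaks; `binary_vote_rates` and `audit_metrics` are hypothetical names.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def binary_vote_rates(T, k, sigma):
    """Closed-form (TPR, FPR) for the toy binary-vote model described above."""
    scale = sigma * math.sqrt(2.0)            # std of the difference of the two noises
    tpr = _STD.cdf((2 * k - T) / scale)       # canary present: k positive, T-k negative
    fpr = _STD.cdf((2 * k - T - 2) / scale)   # canary absent: k-1 positive, T-k+1 negative
    return tpr, fpr


def audit_metrics(T, k, sigma):
    """Return (mu_emp, eps_emp) for one configuration of the toy model."""
    tpr, fpr = binary_vote_rates(T, k, sigma)
    if not (0.0 < fpr < 1.0 and 0.0 < tpr < 1.0):
        raise ValueError("rates saturated in floating point; pick a milder configuration")
    mu_emp = _STD.inv_cdf(tpr) - _STD.inv_cdf(fpr)
    eps_emp = math.log(tpr / fpr)
    return mu_emp, eps_emp


if __name__ == "__main__":
    sigma = 1.0
    for T, k in [(5, 3), (11, 6), (11, 9), (21, 11)]:
        mu, eps = audit_metrics(T, k, sigma)
        print(f"T={T:2d} k={k:2d}  mu_emp={mu:.4f}  eps_emp={eps:.4f}")
```

In this model mu_emp stays at sqrt(2)/sigma for every configuration, which mirrors the independence from T and k stated in the lemma, whereas eps_emp differs across the printed rows.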

Figures (21)

  • Figure 1: An example of ICL privacy leakage.
  • Figure 2: General Auditing Framework for DP-ICL.
  • Figure 3: A classification auditing example.
  • Figure 4: A generation auditing example.
  • Figure 5: Overview of the text classification auditing result.
  • ...and 16 more figures

Theorems & Definitions (6)

  • Example 3.1
  • Lemma 3.1
  • Example 3.2
  • Lemma 1
  • Proof
  • Proof