Privacy Preserving In-Context-Learning Framework for Large Language Models

Bishnu Bhusal, Manoj Acharya, Ramneet Kaur, Colin Samplawski, Anirban Roy, Adam D. Cobb, Rohit Chadha, Susmit Jha

TL;DR

This work tackles privacy risks in prompt-based large language model usage by introducing a differential-privacy-oriented private prediction framework that generates synthetic text without fine-tuning. It privately aggregates per-token logits from disjoint private subsets, blends them with the logits of a public prompt, and employs a DP mechanism to maintain privacy across token generation. Empirical results demonstrate improved in-context learning performance across five tasks and strong resistance to privacy attacks, even at modest privacy budgets. The approach provides a scalable post-processing DP solution that preserves utility in few-shot ICL for sensitive domains.

Abstract

Large language models (LLMs) have significantly transformed natural language understanding and generation, but they raise privacy concerns due to potential exposure of sensitive information. Studies have highlighted the risk of information leakage, where adversaries can extract sensitive information embedded in the prompts. In this work, we introduce a novel private prediction framework for generating high-quality synthetic text with strong privacy guarantees. Our approach leverages the Differential Privacy (DP) framework to ensure worst-case theoretical bounds on information leakage without requiring any fine-tuning of the underlying models. The proposed method performs inference on private records and aggregates the resulting per-token output distributions. This enables the generation of longer and coherent synthetic text while maintaining privacy guarantees. Additionally, we propose a simple blending operation that combines private and public inference to further enhance utility. Empirical evaluations demonstrate that our approach outperforms previous state-of-the-art methods on in-context-learning (ICL) tasks, making it a promising direction for privacy-preserving text generation while maintaining high utility. Our code is available at https://github.com/bhusalb/privacy-preserving-icl.

Paper Structure

This paper contains 27 sections, 4 theorems, 9 equations, 14 figures, 7 tables, 1 algorithm.

Key Result

Lemma 1

Let $\mathcal{R}$ be a set of possible outputs and let $q: \mathcal{D} \times \mathcal{R} \to \mathbb{R}$ be a utility function such that for any two adjacent databases $D$ and $D'$ (i.e., differing in one record), the sensitivity of $q$ satisfies $|q(D, r) - q(D', r)| \le \Delta$ for all $r \in \mathcal{R}$. The Exponential Mechanism $\mathcal{M}_E$ selects an output $r \in \mathcal{R}$ with probability proportional to $\exp\left(q(D, r) / \tau\right)$, where $\tau = \frac{2\Delta}{\epsilon}$ ...
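
The sampling step of the exponential mechanism can be illustrated with a minimal sketch. It uses the standard form of the mechanism (probability proportional to $\exp(q(D, r)/\tau)$ with $\tau = 2\Delta/\epsilon$); the function name `exponential_mechanism`, its parameters, and the toy utilities are illustrative assumptions and are not taken from the paper's code.

```python
import numpy as np

def exponential_mechanism(utilities, epsilon, sensitivity, rng):
    """Sample an index with probability proportional to exp(utility / tau).

    Assumption: tau = 2 * sensitivity / epsilon, the standard choice for the
    exponential mechanism. All names here are hypothetical.
    """
    utilities = np.asarray(utilities, dtype=float)
    tau = 2.0 * sensitivity / epsilon
    scaled = utilities / tau
    scaled = scaled - scaled.max()   # subtract the max for numerical stability
    probs = np.exp(scaled)
    probs = probs / probs.sum()      # normalise to a probability vector
    index = int(rng.choice(len(utilities), p=probs))
    return index, probs

rng = np.random.default_rng(0)
# Toy utilities for three candidate outputs (illustrative values only).
idx, probs = exponential_mechanism([1.0, 2.0, 3.0], epsilon=1.0,
                                   sensitivity=1.0, rng=rng)
```

Higher-utility outputs are more likely, but every output keeps non-zero probability; a smaller `epsilon` gives a larger `tau` and flattens the distribution toward uniform.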

Figures (14)

  • Figure 1: Overview of the proposed privacy-preserving synthetic text generation framework. A set of demonstrations is first sampled from the private dataset to construct prompts for next-token generation. These prompts are passed to an LLM to produce token-wise logits $(z_1, z_2, \ldots, z_s)$, while a parallel public prompt yields a public logit vector $z_\text{pub}$. All logits are clipped to bound sensitivity. Then, only the private logits are aggregated to compute $\bar{z} = \text{clip\_aggregate}(z_1, z_2, \ldots, z_s)$. This aggregated private logit is then blended with the clipped public logit $u$, and a token $x_t$ is sampled from the resulting temperature-scaled softmax distribution. The sampled token is appended to the synthetic sequence $X$, and this process is repeated until the end-of-sequence $\texttt{<eos>}$ token is emitted.
  • Figure 2: Example of our $k$-shot in-context-learning evaluation setup.
  • Figure 3: A synthesis sample of Dbpedia for Building category.
  • Figure 4: A synthesis sample of Agnews for Technology category.
  • Figure 5: A synthesis sample of Trec for Description question type.
  • ...and 9 more figures
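
One generation step of the pipeline described in the Figure 1 caption can be sketched as follows. The caption does not give the exact clipping rule, aggregation, or blending formula, so this sketch assumes mean-centring followed by clamping to $[-c, c]$, a simple mean over the private logit vectors, and a convex combination with a hypothetical weight `lam`. The names `clip_logits`, `next_token_distribution`, `c`, `lam`, and `tau` are illustrative, the random logits stand in for LLM outputs, and the sketch does not by itself establish any privacy guarantee.

```python
import numpy as np

def clip_logits(z, c):
    # Assumed clipping rule: centre the logits, then clamp each entry to [-c, c].
    z = np.asarray(z, dtype=float)
    return np.clip(z - z.mean(), -c, c)

def next_token_distribution(private_logits, public_logits, c, lam, tau):
    # Clip each private logit vector, then aggregate (assumed: plain mean).
    clipped = np.stack([clip_logits(z, c) for z in private_logits])
    z_bar = clipped.mean(axis=0)
    # Clip the public logits and blend (assumed: convex combination).
    u = clip_logits(public_logits, c)
    blended = lam * z_bar + (1.0 - lam) * u
    # Temperature-scaled softmax, computed stably.
    scaled = blended / tau
    scaled = scaled - scaled.max()
    probs = np.exp(scaled)
    return probs / probs.sum()

rng = np.random.default_rng(0)
vocab_size, s = 8, 4                                  # toy sizes
private_logits = rng.normal(size=(s, vocab_size))     # stand-ins for z_1..z_s
public_logits = rng.normal(size=vocab_size)           # stand-in for z_pub
probs = next_token_distribution(private_logits, public_logits,
                                c=1.0, lam=0.7, tau=1.0)
token = int(rng.choice(vocab_size, p=probs))          # sampled token id x_t
```

In a full generation loop the sampled token would be appended to every prompt and the step repeated until the end-of-sequence token appears. Because every blended logit lies in `[-c, c]` under these assumptions, the ratio between the largest and smallest token probability is bounded by `exp(2 * c / tau)`.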

Theorems & Definitions (6)

  • Definition 1: Differential Privacy (DP) (Dwork et al., 2006)
  • Lemma 1: Exponential Mechanism (McSherry and Talwar, 2007)
  • Theorem 1: Privacy of Algorithm 1
  • Lemma 2
  • proof
  • Lemma 3: Advanced Composition for Exponential Mechanism (Dwork and Roth, 2014)