DP-Fusion: Token-Level Differentially Private Inference for Large Language Models

Rushil Thareja, Preslav Nakov, Praneeth Vepakomma, Nils Lukas

TL;DR

DP-Fusion tackles the privacy risk of LLM inferences leaking sensitive contextual data by introducing token-level differential privacy applied during inference. It partitions input into private groups and a public portion, computes per-group private distributions, and mollifies outputs to bound the divergence from the sanitized baseline, achieving $(\alpha,\beta)$-Rényi DP guarantees. The method demonstrates superior utility-to-privacy trade-offs compared with existing DPI approaches across document privatization tasks and provides empirical defense against jailbreaks and prompt-injection in RAG settings. This work enables safer deployment of LLMs with sensitive data by offering formal privacy guarantees, practical utility, and resilience against inference-time attacks.

Abstract

Large language models (LLMs) do not preserve privacy at inference time. The LLM's outputs can inadvertently reveal information about the model's context, which presents a privacy challenge when the LLM is augmented via tools or databases containing sensitive information. Existing inference-time privacy-preserving methods have significant limitations: they either (i) lack provable guarantees or (ii) have a poor utility/privacy trade-off. We propose DP-Fusion, a Differentially Private Inference (DPI) mechanism for LLMs that provably bounds the influence a set of tokens in the context can have on the LLM's output. DP-Fusion works as follows: (1) label a subset of sensitive tokens, (2) run the LLM without any sensitive tokens to obtain a baseline, (3) run the LLM with the sensitive tokens, and (4) blend the two output distributions so that the final output remains within a bounded distance of the baseline distribution. While this per-token influence bound also mitigates jailbreak-style prompt injection, we focus on document privatization, where the goal is to paraphrase a document containing sensitive tokens, e.g., personally identifiable information, so that no attacker can reliably infer them from the paraphrased document while preserving high text quality. The privacy/utility trade-off is controlled by $ε$, where $ε=0$ hides sensitive tokens entirely, while higher values trade off privacy for improved text quality. We show that our method creates token-level provably privatized documents with substantially improved theoretical and empirical privacy, achieving $6\times$ lower perplexity than related DPI methods.
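Step (4) of the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes the blend is a convex mixture of a private and a public next-token distribution, that the "bounded distance" is a Rényi divergence of order $\alpha$ from the mixture to the public baseline, and that the mixing weight is found by bisection. The names `renyi_divergence`, `mix`, `largest_mixing_weight`, and `sample_token`, as well as the bound `beta`, are hypothetical.

```python
import math
import random


def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(p || q) for discrete distributions, alpha > 1."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += (pi ** alpha) * (qi ** (1.0 - alpha))
    return math.log(total) / (alpha - 1.0)


def mix(p_priv, p_pub, lam):
    """Convex mixture lam * p_priv + (1 - lam) * p_pub."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_priv, p_pub)]


def largest_mixing_weight(p_priv, p_pub, alpha, beta, iters=50):
    """Bisection for the largest lam in [0, 1] whose mixture stays within
    Renyi divergence beta of the public baseline (assumed criterion)."""
    if beta <= 0.0:
        return 0.0  # no private influence allowed: fall back to the baseline
    if renyi_divergence(p_priv, p_pub, alpha) <= beta:
        return 1.0  # the private distribution already satisfies the bound
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if renyi_divergence(mix(p_priv, p_pub, mid), p_pub, alpha) <= beta:
            lo = mid
        else:
            hi = mid
    return lo  # lo always satisfies the bound


def sample_token(p):
    """Sample a token index from a discrete distribution."""
    return random.choices(range(len(p)), weights=p)[0]


# Toy three-token vocabulary (illustrative numbers only).
p_pub = [0.5, 0.3, 0.2]     # next-token distribution without sensitive tokens
p_priv = [0.9, 0.05, 0.05]  # next-token distribution with sensitive tokens
lam = largest_mixing_weight(p_priv, p_pub, alpha=2.0, beta=0.05)
token = sample_token(mix(p_priv, p_pub, lam))
```

Bisection is valid here because, for $\alpha>1$, the divergence from the mixture to the baseline is zero at weight 0 and does not decrease as the weight grows. Setting `beta` to zero returns the public baseline unchanged, which mirrors the abstract's statement that $ε=0$ hides sensitive tokens entirely.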

Paper Structure

This paper contains 42 sections, 5 theorems, 7 equations, 15 figures, 13 tables, 2 algorithms.

Key Result

Theorem 1

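For any order $\alpha>1$, a randomized algorithm $\mathsf{M}$ is said to satisfy $(\alpha,\epsilon)$-RDP if, for every pair of adjacent datasets $D\sim D'$, $D_\alpha\big(\mathsf{M}(D)\,\|\,\mathsf{M}(D')\big)\le\epsilon$. Here $D_\alpha(P\,\|\,Q)=\frac{1}{\alpha-1}\log\mathbb{E}_{x\sim Q}\big[(P(x)/Q(x))^{\alpha}\big]$ is the Rényi divergence of order $\alpha$, where $P$ and $Q$ are probability distributions on the same sample space and $x$ is drawn from $Q$.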

Figures (15)

  • Figure 1: An overview of DP-Fusion for differentially private LLM inference whose output is revealed to a potentially untrusted user. Sensitive tokens are redacted PII.
  • Figure 2: Our DPI method DP-Fusion for document privatization: (1) The user specifies per-group privacy parameters and submits a private document. (2) Private token groups are marked using the local tagger, and (3a) a public document version is created without any private tokens and (3b) multiple group-wise private versions are also created that only reveal one privacy group at a time. (4) During inference, tokens are sampled from a mixture of public and private next-token distributions. (5) The paraphrased document.
  • Figure 3: Win-Rate (row beats column) of the generated paraphrases, GPT-4o-mini judge.
  • Figure 4: Average perplexity versus the average theoretical privacy parameter $\varepsilon$ (via max divergence bound $\alpha \beta_i$) for our method, DP-Fusion.
  • Figure 5: Perplexity vs $\varepsilon$ for DP-Prompt and DP-Decoding across their respective parameter settings.
  • ...and 10 more figures

Theorems & Definitions (11)

  • Definition 1: Approximate Differential Privacy dwork2014algorithmic
  • Definition 1: Approximate Differential Privacy dwork2014algorithmic
  • Theorem 1: Rényi Differential Privacy (RDP) renyi
  • Theorem 2: RDP $\Rightarrow$ DP conversion renyi
  • Definition 2: Differentially Private Inference (DPI)
  • Theorem 3: Monotonicity of the Rényi divergence
  • Definition 3: DP neighborhood
  • Definition 4: Per‑group $(\alpha,\beta_i)$-Rényi DP
  • Theorem 4: Per‑group $(\varepsilon_i,\delta)$-DP for $T$ tokens
  • Theorem 5: Monotonicity of the Rényi divergence
  • ...and 1 more
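
The listing names an RDP $\Rightarrow$ DP conversion (Theorem 2) and a per-group bound over $T$ tokens (Theorem 4), but their statements are not reproduced here. As a rough illustration only, the sketch below uses two standard facts about Rényi DP rather than the paper's exact theorems: RDP budgets of the same order add up under composition, and an $(\alpha,\epsilon)$-RDP guarantee implies $(\epsilon+\log(1/\delta)/(\alpha-1),\,\delta)$-DP. The function names and the example numbers are hypothetical.

```python
import math


def compose_rdp(eps_per_token, num_tokens):
    """Additive composition: T releases, each (alpha, eps)-RDP at the same
    order alpha, are jointly (alpha, T * eps)-RDP."""
    return eps_per_token * num_tokens


def rdp_to_dp(alpha, eps_rdp, delta):
    """Standard conversion from (alpha, eps_rdp)-RDP to (eps, delta)-DP."""
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return eps_rdp + math.log(1.0 / delta) / (alpha - 1.0)


# Illustrative numbers only: 100 generated tokens, each with RDP budget 0.01.
total_rdp = compose_rdp(0.01, 100)
eps_dp = rdp_to_dp(alpha=2.0, eps_rdp=total_rdp, delta=1e-5)
```

The conversion term $\log(1/\delta)/(\alpha-1)$ shrinks as $\alpha$ grows, while the per-token RDP budget typically grows with $\alpha$, so the order is usually chosen to balance the two.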