InvisibleInk: High-Utility and Low-Cost Text Generation with Differential Privacy

Vishnu Vinod, Krishna Pillutla, Abhradeep Guha Thakurta

TL;DR

InvisibleInk is a scalable framework for differentially private long-form text generation that recasts next-token sampling as an exponential mechanism. Its two key innovations are DClip, which isolates and clips only the difference between the private logits and the public logits, and Top-$k+$ sampling, which samples from a tight superset of the top-$k$ private tokens. Together they achieve strong privacy guarantees with far less compute than prior methods. Empirical results across medical, legal, and commercial domains show 8–16x compute savings at comparable privacy-utility levels, and an open-source package (invink) supports practical adoption. The work advances private inference for open-ended generation, enabling realistic deployment in privacy-sensitive settings while outlining practical limitations and guidance for practitioners.

Abstract

As major progress in LLM-based long-form text generation enables paradigms such as retrieval-augmented generation (RAG) and inference-time scaling, safely incorporating private information into the generation remains a critical open question. We present InvisibleInk, a highly scalable long-form text generation framework satisfying rigorous differential privacy guarantees with respect to the sensitive reference texts. It interprets sampling from the LLM's next-token distribution as the exponential mechanism over the LLM logits with two innovations. First, we reduce the privacy cost by isolating and clipping only the sensitive information in the model logits (relative to the public logits). Second, we improve text quality by sampling without any privacy cost from a small superset of the top-$k$ private tokens. Empirical evaluations demonstrate a consistent $8\times$ (or more) reduction in computation cost over state-of-the-art baselines to generate long-form private text of the same utility across privacy levels. InvisibleInk is able to generate, for the first time, high-quality private long-form text at less than $4$-$8\times$ the computation cost of non-private generation, paving the way for its practical use. We open-source a pip-installable Python package (invink) for InvisibleInk at https://github.com/cerai-iitm/invisibleink.
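
The clipping and sampling steps described in the abstract can be illustrated with a minimal sketch. It is not the invink API: the function names `dclip` and `sample_next_token` are hypothetical, and the sketch assumes that DClip clips the private-minus-public difference to $[-C, C]$ and that the clipped logits are averaged over the references before a temperature-scaled softmax draw, details the abstract does not spell out.

```python
import math
import random


def dclip(private_logits, public_logits, clip_c):
    """Clip only the private-minus-public difference to [-clip_c, clip_c].

    Assumed form of DClip: tokens whose private logit is already within
    clip_c of the public logit are returned unchanged.
    """
    return [
        pub + max(-clip_c, min(clip_c, priv - pub))
        for priv, pub in zip(private_logits, public_logits)
    ]


def sample_next_token(private_logits_per_ref, public_logits, clip_c, tau, rng):
    """Draw one token index from softmax(mean clipped logits / tau).

    Averaging over references is an assumption of this sketch.
    """
    clipped = [dclip(phi, public_logits, clip_c) for phi in private_logits_per_ref]
    batch = len(clipped)
    vocab = len(public_logits)
    scores = [sum(c[y] for c in clipped) / (batch * tau) for y in range(vocab)]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(range(vocab), weights=weights, k=1)[0]


public = [1.0, 0.25, 0.0]
references = [[2.0, 0.5, -3.0], [5.0, 0.25, 0.5]]
token = sample_next_token(references, public, clip_c=1.0, tau=1.0, rng=random.Random(0))
```

Because every clipped logit stays within `clip_c` of the corresponding public logit, swapping one reference can move the averaged score of any token only by a bounded amount, which is what makes the softmax draw usable as an exponential mechanism.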

Paper Structure

This paper contains 38 sections, 3 theorems, 30 equations, 11 figures, 15 tables, 1 algorithm.

Key Result

Theorem 2

Algorithm 1 with a maximum token budget $T$, a clipping threshold $C$, a set ${\bm{R}}$ of $B = |{\bm{R}}|$ references, and temperature $\tau$ satisfies $\rho_{\mathsf{seq}}$-zCDP with $\rho_{\mathsf{seq}} = {T C^2} / {(2 B^2 \tau^2)}$.
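
To make the bound concrete, the following sketch evaluates $\rho_{\mathsf{seq}}$ from the theorem and converts it to an $(\varepsilon, \delta)$ guarantee. The conversion used is the standard zCDP result that $\rho$-zCDP implies $(\rho + 2\sqrt{\rho \ln(1/\delta)}, \delta)$-DP; the paper may use a tighter conversion. The function names and the example parameter values are illustrative assumptions, not taken from the paper or from invink.

```python
import math


def rho_seq(max_tokens, clip_c, batch_size, tau):
    """rho_seq = T * C^2 / (2 * B^2 * tau^2), as stated in Theorem 2."""
    return max_tokens * clip_c**2 / (2 * batch_size**2 * tau**2)


def zcdp_to_approx_dp(rho, delta):
    """Standard conversion: rho-zCDP implies (eps, delta)-DP with
    eps = rho + 2 * sqrt(rho * ln(1 / delta))."""
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))


def temperature_for_rho(target_rho, max_tokens, clip_c, batch_size):
    """Invert the theorem's formula for tau at a target rho_seq."""
    return (clip_c / batch_size) * math.sqrt(max_tokens / (2 * target_rho))


# Illustrative values only: T = 128 tokens, C = 1, B = 8 references, tau = 1.
rho = rho_seq(max_tokens=128, clip_c=1.0, batch_size=8, tau=1.0)
eps = zcdp_to_approx_dp(rho, delta=1e-6)
tau = temperature_for_rho(target_rho=rho, max_tokens=128, clip_c=1.0, batch_size=8)
```

With these values $\rho_{\mathsf{seq}} = 1$ and $\varepsilon$ is roughly $8.4$ at $\delta = 10^{-6}$. The formula also shows the compute lever: doubling $B$ at fixed $T$, $C$, and $\tau$ cuts $\rho_{\mathsf{seq}}$ by a factor of four, and halving $C$ has the same effect without any extra references.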

Figures (11)

  • Figure 1: InvisibleInk interprets differentially private text generation as an iterative application of the exponential mechanism over a subset of the LLM's clipped logits. Our key innovations are: (a) DClip, an improved clipping function to reduce the sensitivity, and hence, the privacy cost; and (b) Top-$k+$ sampling, a truncated decoding algorithm to improve utility by selecting a subset of logits to sample each token from.
  • Figure 2: Left & Center: Illustration of how two common decoding algorithms (temperature rescaling and top-$k$ sampling) reshape the next-token probabilities for (non-private) LLM-based text generation. Right: Heatmap of MAUVE scores (Pillutla et al., 2021, 2023) of synthetic text generated for the MIMIC-IV-Notes dataset (without using any sensitive references). The best generations (highest MAUVE scores) are obtained at $\tau \approx 1.1$ and $k\approx 100$; InvisibleInk shows similar sensitivity to these decoding hyperparameters for private text generation.
  • Figure 3: Left two: Histograms of private logits ${\bm{\phi}}_i$ and differences from public logits ${\bm{\phi}}_i-{\bm{\phi}}_\mathsf{pub}$ for a synthetic data sample generated from the MIMIC dataset, with $5$th and $95$th percentiles shown by the dotted lines. The spread of values for ${\bm{\phi}}_i-{\bm{\phi}}_\mathsf{pub}$ is significantly smaller (around $10\times$) than that of ${\bm{\phi}}_i$. Thus, $\mathsf{DClip}_C({\bm{\phi}}_i, {\bm{\phi}}_\mathsf{pub})(y) = {\bm{\phi}}_i(y)$ for over $95\%$ of all $y \in V$ with $C \approx 1$, while the naive clipping of Amin et al. (2024) requires $C \approx 8$. This translates into an $8\times$ gain in computational efficiency. Right two: A small clip norm of $C=1$ (using the method of Amin et al., 2024) introduces significant bias, resulting in a near-uniform distribution over the vocabulary. In contrast, DClip (see the method section) preserves probabilities with minimal distortion even at $C = 1$.
  • Figure 4: Utility-compute tradeoffs at $(\varepsilon=10,\delta=10^{-6})$ DP on each dataset across compute budgets $B \in \{1, 3, \ldots, 127\}$. Results are reported over $3$ runs with $95\%$ confidence intervals (see the section on confidence intervals) for MIMIC and over $1$ run for the Yelp and TAB datasets. InvisibleInk can produce text that matches or exceeds the baselines at a fraction of the compute. The baselines do not even work for the low-resource TAB dataset at a small batch size $B=7$.
  • Figure 5: InvisibleInk outperforms the API-access method of AugPE (Xie et al., 2024) across all settings: utility vs. compute plots (averaged over $3$ runs with $95\%$ confidence intervals) for InvisibleInk and AugPE at $\varepsilon=10$ for 1000 synthetic texts generated for the MIMIC dataset. Wall-clock run time is used as a proxy for computational cost. We report results for $B+1=4,8,16,32$ for InvisibleInk and $T_{\mathsf{AugPE}}=1,3,5,10$ for AugPE.
  • ...and 6 more figures
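
The Top-$k+$ idea from Figure 1 can be sketched as follows. This is one plausible construction under stated assumptions, not necessarily the paper's exact rule: if every clipped private logit lies within $C$ of the public logit (the assumed DClip form), then any token in the top-$k$ of a clipped logit vector must have a public logit of at least the $k$-th largest public logit minus $2C$. The candidate set below depends only on the public logits, so computing it touches no sensitive data. The function names are hypothetical.

```python
def top_k_plus_candidates(public_logits, k, clip_c):
    """Token indices whose public logit is within 2 * clip_c of the
    k-th largest public logit (assumed superset rule for this sketch)."""
    kth_largest = sorted(public_logits, reverse=True)[k - 1]
    threshold = kth_largest - 2 * clip_c
    return {y for y, logit in enumerate(public_logits) if logit >= threshold}


def top_k(logits, k):
    """Indices of the k largest logits."""
    order = sorted(range(len(logits)), key=lambda y: logits[y], reverse=True)
    return set(order[:k])


public = [3.0, 2.0, 1.0, 0.0, -5.0]
# A clipped private logit vector; each entry is within C = 1 of the public one.
clipped_private = [2.5, 1.0, 2.0, 0.5, -4.0]

candidates = top_k_plus_candidates(public, k=2, clip_c=1.0)
private_top = top_k(clipped_private, k=2)
```

Here the private top-2 set differs from the public top-2 set, yet it is still contained in the candidate set, while the clearly implausible last token is excluded. Sampling is then restricted to the candidates, which is how truncation can improve text quality without spending privacy budget on selecting the subset.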

Theorems & Definitions (15)

  • Theorem 2
  • Definition 3: Zero-concentrated DP
  • Definition 4: Add-or-Remove Adjacency
  • Definition 5: Replace-by-Null Adjacency
  • Definition 6: Zero-Out Adjacency
  • Remark 7
  • Definition 8: Sensitivity
  • proof
  • proof
  • proof: Proof of the proposition on DClip sensitivity (prop:ourclip-sensitivity)
  • ...and 5 more