
Beyond Theoretical Bounds: Empirical Privacy Loss Calibration for Text Rewriting Under Local Differential Privacy

Weijun Li, Arnaud Grivet Sébert, Qiongkai Xu, Annabelle McIver, Mark Dras

Abstract

The growing use of large language models has increased interest in sharing textual data in a privacy-preserving manner. One prominent line of work addresses this challenge through text rewriting under Local Differential Privacy (LDP), where input texts are locally obfuscated before release with formal privacy guarantees. These guarantees are typically expressed by a parameter $\varepsilon$ that upper bounds the worst-case privacy loss. However, nominal $\varepsilon$ values are often difficult to interpret and compare across mechanisms. In this work, we investigate how to empirically calibrate privacy loss across text rewriting mechanisms under LDP. We propose TeDA, which formulates calibration as a hypothesis-testing framework that instantiates text distinguishability audits in both surface and embedding spaces, enabling empirical assessment of how distinguishable privatized texts are. Applying this calibration to several representative mechanisms, we demonstrate that similar nominal $\varepsilon$ bounds can imply very different levels of distinguishability. Empirical calibration thus provides a more comparable footing for evaluating privacy-utility trade-offs, as well as a practical tool for comparing and analyzing mechanisms in real-world LDP text rewriting deployments.
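The hypothesis-testing view of calibration can be made concrete with the standard pure-DP inequality: if a mechanism is $\varepsilon$-DP, any test that tries to tell its output on input $x$ from its output on $x'$ has false-positive rate $\alpha$ and false-negative rate $\beta$ satisfying $\alpha + e^{\varepsilon}\beta \ge 1$ and $\beta + e^{\varepsilon}\alpha \ge 1$. The sketch below inverts these inequalities to get the smallest $\varepsilon$ consistent with observed error rates. It assumes $\delta = 0$ and uses point estimates of the error rates (a real audit would substitute confidence bounds); it illustrates the general technique and is not claimed to be TeDA's exact estimator. The function names are hypothetical.

```python
import math

def _one_side(a: float, b: float) -> float:
    """Bound implied by a + e^eps * b >= 1, i.e. eps >= ln((1 - a) / b)."""
    if a >= 1.0:
        return 0.0            # the inequality holds for every eps >= 0
    if b == 0.0:
        return math.inf       # no finite eps allows zero error on this side
    return math.log((1.0 - a) / b)

def eps_lower_bound(fpr: float, fnr: float) -> float:
    """Smallest eps (pure DP, delta = 0) consistent with observed FPR/FNR."""
    return max(0.0, _one_side(fpr, fnr), _one_side(fnr, fpr))

print(eps_lower_bound(0.05, 0.20))  # ln(0.80 / 0.05) = ln(16), about 2.77
print(eps_lower_bound(0.50, 0.50))  # a coin-flip test certifies nothing: 0.0
```

A stronger distinguisher (lower error on both sides) pushes the bound up; a test no better than chance leaves it at zero, which is why the audit's empirical $\varepsilon$ depends on how good the adversary is.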

Paper Structure

This paper contains 36 sections, 25 equations, 11 figures, 5 tables, and 1 algorithm.

Figures (11)

  • Figure 1: Text Distinguishability Audit for empirical privacy assessment of text rewriting mechanisms. (1) A mechanism $\mathcal{M}$ produces privatized text at a given privacy budget $\varepsilon_{\text{theoretical}}$; (2) an adversary $\mathcal{A}$ attempts to identify the true source $v_i$ from a candidate set $S$; (3) a correct attribution indicates empirical distinguishability, while an incorrect attribution indicates indistinguishability.
  • Figure 2: Empirical calibration results under the LLM distinguishability attack across datasets. The x-axis shows nominal $\varepsilon$ and the y-axis shows estimated empirical privacy loss $\varepsilon_{\mathrm{emp}}$. The dashed horizontal line marks the finite-sample ceiling (${\approx}7.54$, $k=2$, $T=10^4$).
  • Figure 3: Empirical calibration results under the external (top) and internal (bottom) embedding distinguishability attacks across datasets. Axes and ceiling as in Figure 2.
  • Figure 4: Downstream utility and private attribute protection across LDP text rewriting methods, evaluated at $\varepsilon \in \{250, 1000, 2500\}$ with error bars over 3 random seeds. Panels (a) SNIPS intent classification and (b) Trustpilot sentiment classification report utility F1 (higher is better). Panel (c) reports Trustpilot gender attribute inference F1, which measures resistance to private attribute inference (lower is better).
  • Figure 5: Empirical calibration results under $T=10^4$ and $10^6$ Monte Carlo trials with DP-MLM on ATIS, showing consistent trends across $\varepsilon$ values.
  • ...and 6 more figures
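Figures 1, 2 and 5 describe an attribution game (a mechanism privatizes a source drawn from a candidate set $S$ of size $k$, and an adversary guesses the source) together with a finite-sample ceiling on $\varepsilon_{\mathrm{emp}}$ that depends on the number of trials $T$. The sketch below simulates such a game under stated assumptions and is not TeDA's implementation. It assumes: a toy $k$-ary randomized response over integers (domain size $d$) stands in for a text rewriting mechanism; the adversary's success rate $p$ is converted to $\varepsilon$ through the bound $p \le e^{\varepsilon}/(e^{\varepsilon}+k-1)$, which any $\varepsilon$-LDP mechanism imposes when the source is drawn uniformly from $S$; and $p$ is replaced by a one-sided Clopper-Pearson lower bound at confidence level $1-\alpha$ with $\alpha = 0.05$. All function names are hypothetical.

```python
import math
import random

def k_rr(value: int, domain_size: int, eps: float, rng: random.Random) -> int:
    """Toy eps-LDP mechanism (k-ary randomized response), a stand-in for M."""
    keep = math.exp(eps) / (math.exp(eps) + domain_size - 1)
    if rng.random() < keep:
        return value
    other = rng.randrange(domain_size - 1)      # uniform over the other d - 1 values
    return other if other < value else other + 1

def run_audit(eps: float, k: int, domain_size: int, trials: int, seed: int = 0) -> int:
    """Play `trials` attribution games and return how many the adversary wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        candidates = rng.sample(range(domain_size), k)   # candidate set S, |S| = k
        source = rng.choice(candidates)                  # true source v_i
        out = k_rr(source, domain_size, eps, rng)
        # Bayes-optimal guess for k-RR: the output itself if it is a candidate
        guess = out if out in candidates else rng.choice(candidates)
        wins += int(guess == source)
    return wins

def cp_lower(successes: int, trials: int, alpha: float = 0.05) -> float:
    """One-sided Clopper-Pearson lower bound: solve P[Bin(n, p) >= s] = alpha."""
    if successes == 0:
        return 0.0
    n, s = trials, successes
    js = range(s, n + 1)
    log_comb = [math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1) for j in js]

    def upper_tail(p: float) -> float:
        lp, lq = math.log(p), math.log1p(-p)
        return sum(math.exp(c + j * lp + (n - j) * lq) for c, j in zip(log_comb, js))

    lo, hi = 0.0, 1.0
    for _ in range(60):                  # bisection: the upper tail increases with p
        mid = 0.5 * (lo + hi)
        if not 0.0 < mid < 1.0:
            break
        if upper_tail(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo

def eps_from_success(p: float, k: int) -> float:
    """Invert p <= e^eps / (e^eps + k - 1) into a lower bound on eps."""
    if p <= 1.0 / k:
        return 0.0
    if p >= 1.0:
        return math.inf
    return math.log((k - 1) * p / (1.0 - p))

def empirical_eps(wins: int, trials: int, k: int, alpha: float = 0.05) -> float:
    """Empirical eps from a confidence lower bound on the adversary's win rate."""
    return eps_from_success(cp_lower(wins, trials, alpha), k)

if __name__ == "__main__":
    T, k = 10_000, 2
    for d in (2, 50):
        wins = run_audit(eps=2.0, k=k, domain_size=d, trials=T)
        print(f"d={d}: win rate {wins / T:.3f}, eps_emp {empirical_eps(wins, T, k):.2f}")
    print(f"ceiling at T={T}: {empirical_eps(T, T, k):.2f}")
```

With $d = 2$ the Bayes-optimal adversary meets the bound, so $\varepsilon_{\mathrm{emp}}$ lands just below the nominal 2.0; with $d = 50$ the same nominal $\varepsilon$ yields a much smaller $\varepsilon_{\mathrm{emp}}$, echoing the observation that similar nominal bounds can correspond to very different levels of distinguishability. Even when every game is won, the confidence bound keeps $p$ below 1, so this sketch's ceiling is about 8.1 at $k=2$, $T=10^4$ and about 12.7 at $T=10^6$. These values differ from the reported ${\approx}7.54$, which suggests the paper uses a different estimator or confidence level.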