Beyond Theoretical Bounds: Empirical Privacy Loss Calibration for Text Rewriting Under Local Differential Privacy

Weijun Li; Arnaud Grivet Sébert; Qiongkai Xu; Annabelle McIver; Mark Dras

Abstract

The growing use of large language models has increased interest in sharing textual data in a privacy-preserving manner. One prominent line of work addresses this challenge through text rewriting under Local Differential Privacy (LDP), where input texts are locally obfuscated before release with formal privacy guarantees. These guarantees are typically expressed by a parameter $\varepsilon$ that upper-bounds the worst-case privacy loss. However, nominal $\varepsilon$ values are often difficult to interpret and to compare across mechanisms. In this work, we investigate how to empirically calibrate privacy loss across text rewriting mechanisms under LDP. We propose TeDA, which formulates calibration via a hypothesis-testing framework that instantiates text distinguishability audits in both surface and embedding spaces, enabling empirical assessment of indistinguishability from privatized texts. Applying this calibration to several representative mechanisms, we demonstrate that similar nominal $\varepsilon$ bounds can imply very different levels of distinguishability. Empirical calibration thus provides a more comparable footing for evaluating privacy-utility trade-offs, as well as a practical tool for mechanism comparison and analysis in real-world LDP text rewriting deployments.
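
For readers less familiar with the notation, the role of $\varepsilon$ as a worst-case bound can be written out using the standard textbook definition of $\varepsilon$-LDP. The symbols $x$, $x'$, $y$, $\mathcal{X}$ and $\mathcal{Y}$ below are notational assumptions introduced here for illustration, not taken from the paper; $\mathcal{M}$ is the rewriting mechanism.

```latex
% Standard definition of \varepsilon-LDP (textbook form; notation assumed here).
% For every pair of inputs x, x' \in \mathcal{X} and every output y \in \mathcal{Y}:
\Pr[\mathcal{M}(x) = y] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(x') = y]
% Equivalently, the worst-case privacy loss is bounded by \varepsilon:
\sup_{x,\, x',\, y} \; \ln \frac{\Pr[\mathcal{M}(x) = y]}{\Pr[\mathcal{M}(x') = y]} \;\le\; \varepsilon
```

A nominal $\varepsilon$ is therefore only an upper bound on this log-ratio; an empirical audit instead estimates how distinguishable the inputs are in practice.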

Paper Structure (36 sections, 25 equations, 11 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Methodology
Distinguishability-Based Calibration
Candidate Selection for Calibration
Why Candidate Selection Matters in Text LDP
Candidate Selection Strategies
Efficient Estimation
Experimental Setup and Results
Experimental Setup
Datasets.
Results
Ablation Studies
Conclusion
Proof of the Reduced-Form Privacy Loss Estimator
...and 21 more sections

Figures (11)

Figure 1: Text Distinguishability Audit for empirical privacy assessment of text rewriting mechanisms. (1) A mechanism $\mathcal{M}$ produces privatized text at a given privacy budget $\varepsilon_{\text{theoretical}}$; (2) an adversary $\mathcal{A}$ attempts to identify the true source $v_i$ from a candidate set $S$; (3) a correct attribution indicates empirical distinguishability, while an incorrect attribution indicates indistinguishability.
Figure 2: Empirical calibration results under the LLM distinguishability attack across datasets. The x-axis shows nominal $\varepsilon$ and the y-axis shows estimated empirical privacy loss $\varepsilon_{\mathrm{emp}}$. The dashed horizontal line marks the finite-sample ceiling (${\approx}7.54$, $k=2$, $T=10^4$).
Figure 3: Empirical calibration results under the external (top) and internal (bottom) embedding distinguishability attacks across datasets. Axes and ceiling as in Figure 2.
Figure 4: Downstream utility and privacy attribute protection across LDP text rewriting methods, evaluated at $\varepsilon \in \{250, 1000, 2500\}$ with error bars over 3 random seeds. (a) SNIPS intent classification and (b) Trustpilot sentiment report utility F1 (higher is better). (c) Trustpilot gender attribute inference F1 measures resistance to private attribute inference (lower is better).
Figure 5: Empirical calibration results under $T=10^4$ and $10^6$ Monte Carlo trials with DP-MLM on ATIS, showing consistent trends across $\varepsilon$ values.
...and 6 more figures
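
The three-step audit described in the Figure 1 caption can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions, not the paper's TeDA procedure or its estimator: the mechanism is replaced by $k$-ary randomized response over a candidate set of size $k$, the adversary simply guesses that the observed output is the source, and the accuracy-to-$\varepsilon$ conversion is the closed form that holds for randomized response only. The function names `k_rr` and `audit` and the clipping rule are hypothetical choices made for this example.

```python
import math
import random


def k_rr(true_idx, k, eps, rng):
    """Toy stand-in mechanism: k-ary randomized response.

    Keeps the true candidate with probability e^eps / (e^eps + k - 1),
    otherwise returns one of the other k - 1 candidates uniformly.
    """
    p_keep = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p_keep:
        return true_idx
    other = rng.randrange(k - 1)          # index among the k - 1 others
    return other if other < true_idx else other + 1


def audit(k, eps, trials, seed=0):
    """Monte Carlo distinguishability audit against the toy mechanism.

    Returns (attribution accuracy, empirical epsilon). The conversion
    eps_emp = ln(acc * (k - 1) / (1 - acc)) inverts the randomized-response
    keep probability; it is an assumption of this sketch.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        src = rng.randrange(k)            # (1) pick a source, privatize it
        out = k_rr(src, k, eps, rng)
        guess = out                       # (2) adversary attributes the output
        correct += (guess == src)         # (3) correct means distinguishable
    acc = correct / trials
    # Clip so the logarithm stays finite with finitely many trials
    # (an illustrative choice, not the paper's finite-sample ceiling).
    acc = min(max(acc, 1.0 / trials), 1.0 - 1.0 / trials)
    eps_emp = math.log(acc * (k - 1) / (1.0 - acc))
    return acc, eps_emp


if __name__ == "__main__":
    for nominal in (0.5, 1.0, 2.0):
        acc, eps_emp = audit(k=2, eps=nominal, trials=20000)
        print(f"nominal={nominal:.1f}  accuracy={acc:.3f}  eps_emp={eps_emp:.3f}")
```

For this toy mechanism the empirical estimate should track the nominal value closely, since randomized response attains its bound. The clipping step also shows why any estimate from a finite number of trials saturates at some maximum value when the adversary is never wrong.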
