A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking The Privacy-Utility Trade-off

Stephen Meisenbacher; Nihildev Nandakumar; Alexandra Klymenko; Florian Matthes

TL;DR

This paper benchmarks seven word-level differential privacy (DP) mechanisms applied to static word embeddings in NLP. It assesses the privacy-utility trade-off on two tasks (IMDb sentiment classification and AG News topic classification), three embedding dimensions, and multiple privacy budgets ($\varepsilon$). It introduces a comprehensive experimental framework with a diverse set of utility and privacy metrics, including a novel Privacy-Utility Composite (PUC) score that jointly quantifies utility and privacy. The study reveals nuanced interactions between privacy guarantees and utility: some mechanisms maintain or even improve utility under DP, while others degrade performance. It also highlights the need for standardized evaluation metrics and for perturbed text that better preserves semantic coherence. The authors provide open-source replication code to facilitate future benchmarking, and discuss implications for advancing research on word-level metric local differential privacy (MLDP) and for metric development in NLP.

Abstract

The application of Differential Privacy to Natural Language Processing techniques has grown in relevance in recent years, with an increasing number of studies published in established NLP outlets. In particular, the adaptation of Differential Privacy for use in NLP tasks first focused on the word level, where calibrated noise is added to word embedding vectors to obtain "noisy" representations. To this end, several implementations have appeared in the literature, each presenting an alternative method of achieving word-level Differential Privacy. Although each of these includes its own evaluation, no comparative analysis has been performed to investigate how these methods perform relative to each other. In this work, we conduct such an analysis, comparing seven different algorithms on two NLP tasks with varying hyperparameters, including the epsilon ($\varepsilon$) parameter, or privacy budget. In addition, we provide an in-depth analysis of the results with a focus on the privacy-utility trade-off, and we open-source our implementation code for further reproduction. As a result of our analysis, we give insight into the benefits and challenges of word-level Differential Privacy, and accordingly, we suggest concrete steps forward for the research field.
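To illustrate what "adding calibrated noise to word embedding vectors" can look like, here is a minimal sketch of one well-known word-level metric DP recipe: perturb a word's embedding with noise whose density is proportional to $\exp(-\varepsilon \lVert z \rVert_2)$, then map the noisy vector back to the nearest vocabulary word. It is not claimed to be any of the seven benchmarked implementations; the function names `sample_multivariate_laplace` and `perturb_word`, the toy vocabulary and embeddings, and the use of NumPy are assumptions for illustration only.

```python
import numpy as np


def sample_multivariate_laplace(dim, epsilon, rng):
    """Draw noise z in R^dim with density proportional to exp(-epsilon * ||z||_2)."""
    # Uniform direction on the unit sphere: normalize a standard Gaussian vector.
    direction = rng.standard_normal(dim)
    direction /= np.linalg.norm(direction)
    # Under this density the radius ||z|| follows Gamma(shape=dim, scale=1/epsilon).
    radius = rng.gamma(shape=dim, scale=1.0 / epsilon)
    return radius * direction


def perturb_word(word, vocab, embeddings, epsilon, rng):
    """Add noise to the word's embedding and return the nearest vocabulary word."""
    index = vocab.index(word)
    noisy = embeddings[index] + sample_multivariate_laplace(embeddings.shape[1], epsilon, rng)
    # Nearest-neighbor decoding by Euclidean distance over the whole (toy) vocabulary.
    distances = np.linalg.norm(embeddings - noisy, axis=1)
    return vocab[int(np.argmin(distances))]


# Hypothetical toy vocabulary with 2-dimensional embeddings.
rng = np.random.default_rng(42)
vocab = ["good", "great", "bad"]
embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
for epsilon in (0.5, 5.0, 50.0):
    print(epsilon, [perturb_word("good", vocab, embeddings, epsilon, rng) for _ in range(5)])
```

Sampling a uniform direction and a Gamma-distributed radius is a standard way to draw from this density in $d$ dimensions. Smaller $\varepsilon$ produces larger noise and therefore more frequent word replacements.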

Paper Structure (43 sections, 5 equations, 3 figures, 11 tables, 7 algorithms)

Introduction
Related Work
Foundations
Methodology
Experimental Design
Tasks and Datasets
Choice of Datasets
Evaluation Model
Embedding Model
Privacy Budget
Metrics
Plausible Deniability (PD) [$N_w \downarrow, S_w \uparrow$]
Perturbation Percentage (PP) $\uparrow$
Cosine Similarity (CS) $\uparrow$
Least-Occurring Words (LOW) $\downarrow$
...and 28 more sections
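The metric names above can be made concrete with a small sketch. The formulas are assumptions, since the paper's exact definitions are not reproduced in this section: PP is taken as the percentage of word positions changed by perturbation, CS as the cosine similarity between two vectors (for example, averaged embeddings of the original and perturbed text), and $N_w$ and $S_w$ follow the plausible-deniability statistics commonly used in prior word-level DP work, estimated from repeated perturbations of a single word $w$ ($N_w$ as the number of times $w$ is returned unchanged, $S_w$ as the number of distinct outputs). All function names are hypothetical.

```python
from collections import Counter

import numpy as np


def perturbation_percentage(original, perturbed):
    """Percentage of word positions changed by perturbation (assumed definition of PP)."""
    if len(original) != len(perturbed) or not original:
        raise ValueError("expected two non-empty token lists of equal length")
    changed = sum(o != p for o, p in zip(original, perturbed))
    return 100.0 * changed / len(original)


def cosine_similarity(u, v):
    """Cosine similarity between two vectors, e.g. averaged text embeddings (assumed CS input)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def plausible_deniability(word, samples):
    """Return (N_w, S_w) from repeated perturbations of `word`.

    N_w: number of samples in which `word` was returned unchanged.
    S_w: number of distinct words observed among the samples.
    """
    if not samples:
        raise ValueError("need at least one perturbed sample")
    counts = Counter(samples)
    return counts[word], len(counts)


def deniability_ratio(word, samples):
    """N_w / S_w; under these definitions, lower values mean higher plausible deniability."""
    n_w, s_w = plausible_deniability(word, samples)
    return n_w / s_w


samples = ["good", "great", "good", "fine", "good"]
print(plausible_deniability("good", samples))  # (3, 3)
```

Under these assumed definitions, a mechanism that rarely returns the input word (small $N_w$) while spreading outputs over many candidates (large $S_w$) yields a low ratio.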

Figures (3)

Figure 1: Accuracy scores per task and embedding dimension (d). Baseline scores are marked with a dotted line, and the baseline value is indicated in the light blue box. The scale of the y-axis is uniform across sub-figures for comparability.
Figure 2: Ratio of $N_w$ to $S_w$, averaged over three embedding dimensions. Lower ratios correspond to higher plausible deniability. Ratios are shown on a logarithmic scale due to outlier values (i.e., for TEM).
Figure 3: Privacy-Utility Composite (PUC) scores per task, with varying $\alpha$. The PUC scores were averaged across embedding dimensions, and these averages are shown for each epsilon ($\varepsilon$) value. The left column, with $\alpha=0.75$, favors utility; the middle, with $\alpha=0.5$, is balanced; and the right, with $\alpha=0.25$, favors privacy.
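The role of $\alpha$ in the Figure 3 caption can be illustrated with a hypothetical weighted combination. This is not Definition 5.1, whose formula is not reproduced here. The sketch only assumes that utility and privacy scores have been normalized to $[0, 1]$ and that $\alpha$ weights utility against privacy, which is consistent with $\alpha = 0.75$ favoring utility and $\alpha = 0.25$ favoring privacy. The functions `composite_score` and `average_over_dimensions` are hypothetical names.

```python
def composite_score(utility, privacy, alpha):
    """Hypothetical alpha-weighted composite of normalized utility and privacy scores.

    Illustrative stand-in only; not the paper's Definition 5.1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    for name, value in (("utility", utility), ("privacy", privacy)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} is assumed to be normalized to [0, 1]")
    # alpha > 0.5 favors utility, alpha < 0.5 favors privacy, alpha = 0.5 is balanced.
    return alpha * utility + (1.0 - alpha) * privacy


def average_over_dimensions(scores_by_dimension):
    """Mean score across embedding dimensions, mirroring the per-epsilon averaging in Figure 3."""
    if not scores_by_dimension:
        raise ValueError("no scores to average")
    return sum(scores_by_dimension.values()) / len(scores_by_dimension)


# A hypothetical mechanism with high utility but weak privacy scores higher as alpha grows.
for alpha in (0.75, 0.5, 0.25):
    print(alpha, composite_score(utility=0.9, privacy=0.3, alpha=alpha))
```

A linear weighting is the simplest choice that makes the utility-versus-privacy emphasis explicit; the paper's actual score may combine its metrics differently.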

Theorems & Definitions (4)

Definition 3.1: $\varepsilon$-Differential Privacy
Definition 3.2: Metric Differential Privacy, or $d_\mathcal{X}$-privacy
Definition 3.3: Metric Local Differential Privacy
Definition 5.1: Privacy-Utility Composite (PUC $\uparrow$) Score
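For reference, the standard forms of Definitions 3.1 and 3.2, as commonly stated in the differential privacy literature, are sketched below; the paper's exact notation and quantifiers may differ. Here $\mathcal{M}$ is a randomized mechanism, $S$ any set of outputs, $D$ and $D'$ neighboring datasets, and $d$ a distance on the input space $\mathcal{X}$. In the local, word-level setting, the inputs $x, x'$ are individual words, and $d$ is typically the Euclidean distance between their embeddings.

```latex
% epsilon-DP: for all neighboring datasets D, D' and all output sets S
\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]

% d_X-privacy (metric DP): for all inputs x, x' \in \mathcal{X} and all output sets S
\Pr[\mathcal{M}(x) \in S] \le e^{\varepsilon \, d(x, x')} \, \Pr[\mathcal{M}(x') \in S]
```

The metric form scales the privacy loss with the distance between inputs, so semantically close words are hard to distinguish while distant words may remain distinguishable.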
