1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy

Stephen Meisenbacher; Maulik Chevli; Florian Matthes

TL;DR

This work tackles privacy-preserving NLP by introducing 1-Diffractor, a fast word-level MLDP mechanism that operates in a one-dimensional embedding space. By converting embeddings to index-based lists and applying geometric noise, it achieves strong utility with substantially improved efficiency compared to prior methods, while providing formal $\varepsilon d_{\mathcal{V}}$-privacy guarantees. The authors validate utility on GLUE, privacy through plausible deniability and adversarial tests, and efficiency via speed and memory benchmarks, highlighting trade-offs between privacy strength and performance. Overall, 1-Diffractor offers a scalable, lightweight solution for private text obfuscation with practical applicability in real-world NLP pipelines.

Abstract

The study of privacy-preserving Natural Language Processing (NLP) has gained increasing attention in recent years. One promising avenue is the integration of Differential Privacy into NLP, which has produced innovative methods in a variety of application settings. Of particular note are $\textit{word-level Metric Local Differential Privacy (MLDP)}$ mechanisms, which obfuscate potentially sensitive input text by performing word-by-word $\textit{perturbations}$. Although these methods have shown promising results in empirical tests, they have two major drawbacks: (1) the inevitable loss of utility due to the addition of noise, and (2) the computational cost of running these mechanisms on high-dimensional word embeddings. In this work, we aim to address these challenges by proposing $\texttt{1-Diffractor}$, a new mechanism that achieves large speedups over previous mechanisms while still demonstrating strong utility- and privacy-preserving capabilities. We evaluate $\texttt{1-Diffractor}$ for utility on several NLP tasks, for theoretical and task-based privacy, and for efficiency in terms of speed and memory. $\texttt{1-Diffractor}$ shows significant improvements in efficiency while maintaining competitive utility and privacy scores across all comparative tests against previous MLDP mechanisms. Our code is available at: https://github.com/sjmeis/Diffractor.
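
As a rough illustration of the idea summarized above (a word list treated as a one-dimensional index space, with geometric noise added to a word's index), the following minimal Python sketch may help. It is not the authors' implementation: the function names, the toy word list, the sampling of two-sided geometric noise as the difference of two geometric draws, and the clamping of out-of-range indices are all assumptions made for illustration.

```python
import math
import random


def two_sided_geometric(epsilon: float, rng: random.Random) -> int:
    """Sample integer noise k with probability proportional to exp(-epsilon * |k|).

    Assumption: the noise is drawn as the difference of two i.i.d. geometric
    variables (number of failures) with ratio alpha = exp(-epsilon).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")

    def geometric() -> int:
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
        # Inverse-CDF sampling: P(G >= k) = exp(-epsilon * k)
        return math.floor(-math.log(u) / epsilon)

    return geometric() - geometric()


def perturb_word(word: str, word_list: list[str], epsilon: float,
                 rng: random.Random) -> str:
    """Replace `word` by the word at a noisy position in a 1-D word list.

    Hypothetical helper: out-of-vocabulary words are returned unchanged and
    noisy indices are clamped to the valid range (both are assumptions).
    """
    positions = {w: i for i, w in enumerate(word_list)}
    if word not in positions:
        return word
    noisy = positions[word] + two_sided_geometric(epsilon, rng)
    noisy = max(0, min(len(word_list) - 1, noisy))
    return word_list[noisy]


if __name__ == "__main__":
    # Toy list in which neighboring positions hold related words (assumed).
    toy_list = ["terrible", "bad", "poor", "okay", "good", "great", "excellent"]
    rng = random.Random(0)
    for eps in (0.5, 1.0, 5.0):
        print(eps, [perturb_word("good", toy_list, eps, rng) for _ in range(5)])
```

Because the noise is a single integer per word, no high-dimensional vector sampling or nearest-neighbor search is needed in this sketch, which is the intuition behind the efficiency claim. Smaller values of `epsilon` move the output further from the input position on average.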

Paper Structure (34 sections, 2 theorems, 9 equations, 7 figures, 7 tables, 1 algorithm)

Introduction
Foundations
Differential Privacy
Local Differential Privacy and NLP
Metric (Local) Differential Privacy
1-Diffractor
Intuition behind converting a word embedding model from $\mathbb{R}^d$ to $\mathbb{Z}$
Word-level $d_\mathcal{X}$-privacy mechanism
Using a single word embedding list
Using multiple word embedding lists
Extending Word-level $d_\mathcal{X}$-privacy to sentences
Utility Experiments
Design
Dataset Preparation
Baseline Model and Scoring
...and 19 more sections

Key Result

Theorem 1

The proposed mechanism $\mathcal{M}$ (defined in the paper's equation labeled proposed_mech) satisfies $\varepsilon d_\mathcal{V}$-privacy.
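
For readers unfamiliar with the notation, the standard definition of metric differential privacy that a statement of this form refers to can be written as follows. The symbols $\mathcal{V}$ (the input domain, here assumed to be a vocabulary), $x$, $x'$, and $y$ are chosen here for illustration and may differ from the paper's own Definition 1.

```latex
% Standard metric differential privacy, stated under assumed notation:
% a randomized mechanism M satisfies \varepsilon d_{\mathcal{V}}-privacy if,
% for all inputs x, x' \in \mathcal{V} and every possible output y,
\[
  \Pr[\mathcal{M}(x) = y] \;\le\; e^{\varepsilon\, d_{\mathcal{V}}(x, x')} \,
  \Pr[\mathcal{M}(x') = y].
\]
```

Under this definition, inputs that are close under $d_\mathcal{V}$ produce nearly indistinguishable output distributions, while the bound loosens as the distance between inputs grows.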

Figures (7)

Figure 1: An Overview of 1-Diffractor. Input text is perturbed word-by-word. In this example, we employ the setting in which five word embedding models are used, with one list per model. An input word is diffracted through these lists, producing a list of candidate perturbations, from which a final selection is made randomly.
Figure 2: Average utility drop (loss) across all GLUE tasks of $\textsc{1-D}_G$ and $\textsc{1-D}_T$ with different list configurations and $\varepsilon$ values. Lower scores imply higher preserved utility.
Figure 3: Average utility drop across the SST2, MRPC, and RTE tasks compared to the five selected MLDP mechanisms.
Figure 4: Empirical $N_w$ and $S_w$ statistics for 1-Diffractor and five selected MLDP mechanisms.
Figure 5: Empirical Privacy Results. FI = Friends identification task, TG = Trustpilot gender task.
...and 2 more figures
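
The flow described in the Figure 1 caption (an input word passed through several word lists, each yielding one candidate, followed by a random final selection) might look roughly like the Python sketch below. This is an assumption-laden illustration rather than the authors' code: `noisy_index`, `diffract`, the toy lists, the clamped two-sided geometric noise, and the uniform choice among candidates are all hypothetical, and the sketch makes no claim about the resulting privacy guarantee.

```python
import math
import random
from typing import Sequence


def noisy_index(i: int, n: int, epsilon: float, rng: random.Random) -> int:
    """Shift index i by two-sided geometric noise and clamp to [0, n - 1].

    Assumption: noise is the difference of two geometric draws with
    ratio exp(-epsilon); epsilon must be positive.
    """
    g1 = math.floor(-math.log(1.0 - rng.random()) / epsilon)
    g2 = math.floor(-math.log(1.0 - rng.random()) / epsilon)
    return max(0, min(n - 1, i + g1 - g2))


def diffract(word: str, lists: Sequence[Sequence[str]], epsilon: float,
             rng: random.Random) -> str:
    """Collect one perturbed candidate per list, then pick one at random."""
    candidates = []
    for words in lists:
        if word in words:
            i = words.index(word)
            candidates.append(words[noisy_index(i, len(words), epsilon, rng)])
    # Words missing from every list are passed through unchanged (assumed).
    return rng.choice(candidates) if candidates else word


if __name__ == "__main__":
    # Two toy lists standing in for lists derived from different embedding models.
    list_a = ["cat", "dog", "wolf", "fox", "bear"]
    list_b = ["fox", "cat", "lion", "dog", "horse"]
    rng = random.Random(1)
    print([diffract("dog", [list_a, list_b], 1.0, rng) for _ in range(5)])
```

Using several lists gives the same input word different neighbors in each list, so the pool of plausible replacements is larger than with a single ordering.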

Theorems & Definitions (3)

Definition 1
Theorem 1
Theorem 2
