1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy

Stephen Meisenbacher, Maulik Chevli, Florian Matthes

TL;DR

This work tackles privacy-preserving NLP by introducing 1-Diffractor, a fast word-level MLDP mechanism that operates in a one-dimensional embedding space. By converting embeddings to index-based lists and applying geometric noise, it achieves strong utility with substantially improved efficiency compared to prior methods, while providing formal $\varepsilon d_{\mathcal{V}}$-privacy guarantees. The authors validate utility on GLUE, privacy through plausible deniability and adversarial tests, and efficiency via speed and memory benchmarks, highlighting trade-offs between privacy strength and performance. Overall, 1-Diffractor offers a scalable, lightweight solution for private text obfuscation with practical applicability in real-world NLP pipelines.

Abstract

The study of privacy-preserving Natural Language Processing (NLP) has gained rising attention in recent years. One promising avenue studies the integration of Differential Privacy in NLP, which has brought about innovative methods in a variety of application settings. Of particular note are $\textit{word-level Metric Local Differential Privacy (MLDP)}$ mechanisms, which work to obfuscate potentially sensitive input text by performing word-by-word $\textit{perturbations}$. Although these methods have shown promising results in empirical tests, there are two major drawbacks: (1) the inevitable loss of utility due to addition of noise, and (2) the computational expensiveness of running these mechanisms on high-dimensional word embeddings. In this work, we aim to address these challenges by proposing $\texttt{1-Diffractor}$, a new mechanism that boasts high speedups in comparison to previous mechanisms, while still demonstrating strong utility- and privacy-preserving capabilities. We evaluate $\texttt{1-Diffractor}$ for utility on several NLP tasks, for theoretical and task-based privacy, and for efficiency in terms of speed and memory. $\texttt{1-Diffractor}$ shows significant improvements in efficiency, while still maintaining competitive utility and privacy scores across all conducted comparative tests against previous MLDP mechanisms. Our code is made available at: https://github.com/sjmeis/Diffractor.
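The mechanism summarized in the TL;DR (words arranged in one-dimensional lists, an index perturbed with geometric noise, and a random pick among the candidates from several lists) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy word lists, the function names, and the choice to clamp out-of-range indices to the list ends are all assumptions made for illustration. In the paper the lists are derived from word embedding models.

```python
import math
import random

# Hypothetical toy lists; each stands in for one 1-D ordering of the vocabulary.
TOY_LISTS = [
    ["dog", "cat", "horse", "car", "truck", "train"],
    ["train", "truck", "car", "horse", "dog", "cat"],
]

def two_sided_geometric(epsilon, rng):
    """Sample integer noise k with Pr[k] proportional to exp(-epsilon * |k|)."""
    def geometric():
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
        return math.floor(math.log(u) / -epsilon)
    # The difference of two i.i.d. geometric variables is two-sided geometric.
    return geometric() - geometric()

def perturb_word(word, word_list, epsilon, rng):
    """Shift the word's list index by geometric noise and return the word there."""
    index = word_list.index(word)
    noisy = index + two_sided_geometric(epsilon, rng)
    # Assumption: out-of-range indices are clamped to the ends of the list.
    noisy = min(max(noisy, 0), len(word_list) - 1)
    return word_list[noisy]

def diffract(word, lists, epsilon, rng):
    """Collect one candidate per list, then select one candidate at random."""
    candidates = [perturb_word(word, wl, epsilon, rng) for wl in lists]
    return rng.choice(candidates)

rng = random.Random(0)
print(diffract("cat", TOY_LISTS, epsilon=1.0, rng=rng))
```

Smaller values of `epsilon` spread the noise over more list positions (stronger privacy, lower utility), while large values leave the input word unchanged with high probability.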

Paper Structure (34 sections, 2 theorems, 9 equations, 7 figures, 7 tables, 1 algorithm)

Key Result

Theorem 1

The proposed mechanism $\mathcal{M}$ (the paper's equation labeled eq:proposed_mech) satisfies $\varepsilon d_\mathcal{V}$-privacy.
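For orientation, the standard metric differential privacy condition behind this statement can be written out as follows. The reading of $\mathcal{V}$ as the vocabulary and $d_\mathcal{V}$ as a distance metric on it is an assumption based on the notation; the paper's own Definition 1 should be treated as authoritative.

```latex
% A mechanism M satisfies \varepsilon d_V-privacy if, for every pair of
% inputs x, x' in V and every possible output y:
\[
  \Pr[\mathcal{M}(x) = y] \;\le\; e^{\varepsilon\, d_\mathcal{V}(x, x')} \, \Pr[\mathcal{M}(x') = y]
  \qquad \forall\, x, x' \in \mathcal{V},\ \forall\, y .
\]
```

Under this condition, inputs that are close under $d_\mathcal{V}$ yield nearly indistinguishable output distributions, and the bound loosens as the distance between inputs grows.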

Figures (7)

  • Figure 1: An Overview of 1-Diffractor. Input text is perturbed word-by-word. In this example, we employ the setting in which five word embedding models are used, with one list per model. An input word is diffracted through these lists, producing a list of candidate perturbations, from which a final selection is made randomly.
  • Figure 2: Average utility drop (loss) across all GLUE tasks of $\textsc{1-D}_G$ and $\textsc{1-D}_T$ with different list configurations and $\varepsilon$ values. Lower scores imply higher preserved utility.
  • Figure 3: Average utility drop across the SST2, MRPC, and RTE tasks compared to the five selected MLDP mechanisms.
  • Figure 4: Empirical $N_w$ and $S_w$ statistics for 1-Diffractor and five selected MLDP mechanisms.
  • Figure 5: Empirical Privacy Results. FI = Friends identification task, TG = Trustpilot gender task.
  • ...and 2 more figures
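The $N_w$ and $S_w$ statistics named in the Figure 4 caption are commonly used plausible deniability measures in the MLDP literature: for an input word $w$ perturbed many times, $N_w$ counts how often the output equals $w$, and $S_w$ counts the number of distinct outputs. The sketch below estimates both under that reading; the function names, the number of runs, and the toy mechanism are hypothetical and not taken from the paper.

```python
import random
from collections import Counter

def plausible_deniability_stats(mechanism, word, runs=1000):
    """Estimate (N_w, S_w) for one word by querying the mechanism repeatedly."""
    outputs = Counter(mechanism(word) for _ in range(runs))
    n_w = outputs[word]   # number of runs in which the word came back unchanged
    s_w = len(outputs)    # number of distinct output words observed
    return n_w, s_w

def make_toy_mechanism(vocabulary, keep_prob, rng):
    """Hypothetical stand-in mechanism: keep the word or pick one uniformly."""
    def mechanism(word):
        if rng.random() < keep_prob:
            return word
        return rng.choice(vocabulary)
    return mechanism

rng = random.Random(0)
toy = make_toy_mechanism(["dog", "cat", "horse", "car"], keep_prob=0.5, rng=rng)
n_w, s_w = plausible_deniability_stats(toy, "cat", runs=1000)
print(n_w, s_w)
```

A lower $N_w$ and a higher $S_w$ indicate that the mechanism changes the word more often and spreads its outputs over more words, which corresponds to stronger plausible deniability.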

Theorems & Definitions (3)

  • Definition 1
  • Theorem 1
  • Theorem 2