Investigating User Perspectives on Differentially Private Text Privatization

Stephen Meisenbacher, Alexandra Klymenko, Alexander Karpp, Florian Matthes

TL;DR

This paper investigates how lay users perceive differentially private text privatization by presenting tangible, locally privatized outputs through vignette-based scenarios. Using a factorial survey with 721 participants and four DP mechanisms, it shows that output coherence and linguistic quality strongly influence privacy-budget choices, sometimes outweighing stated privacy attitudes. The results reveal that mechanism type is the most salient factor, with generative approaches prompting lower privacy budgets when outputs are coherent, and non-generative methods prompting a preference for utility. The work calls for a human-centered roadmap in DP NLP, stressing usability, contextual understanding, and continued user studies to ensure practical adoption of privacy-preserving text technologies.

Abstract

Recent literature has seen a considerable uptick in $\textit{Differentially Private Natural Language Processing}$ (DP NLP). This includes DP text privatization, where potentially sensitive input texts are transformed under DP to achieve privatized output texts that ideally mask sensitive information $\textit{and}$ maintain original semantics. Despite continued work to address the open challenges in DP text privatization, there remains a scarcity of work addressing user perceptions of this technology, a crucial aspect which serves as the final barrier to practical adoption. In this work, we conduct a survey study with 721 laypersons around the globe, investigating how the factors of $\textit{scenario}$, $\textit{data sensitivity}$, $\textit{mechanism type}$, and $\textit{reason for data collection}$ impact user preferences for text privatization. We learn that while all these factors play a role in influencing privacy decisions, users are highly sensitive to the utility and coherence of the private output texts. Our findings highlight the socio-technical factors that must be considered in the study of DP NLP, opening the door to further user-based investigations going forward.

Paper Structure

This paper contains 33 sections, 5 figures, and 10 tables.

Figures (5)

  • Figure 1: An example of a vignette on our survey platform. The annotations in the figure indicate the different factors of our FSM model, where underlined treatments are those depicted in the example. Participants were presented first with the original ($\varepsilon = \infty$) text, and then could proceed to use the slider to consider privatized counterparts.
  • Figure 2: Our research model for the FSM study.
  • Figure 3: Raw frequency of privacy level responses (1-5) for each tested factor.
  • Figure 4: Q-Q Plot of the observed slider values.
  • Figure 5: An architecture diagram of our custom-built survey web application.