How do we measure privacy in text? A survey of text anonymization metrics

Yaxuan Ren, Krithika Ramesh, Yaxing Yao, Anjalie Field

TL;DR

This survey dissects how privacy is measured in text anonymization, identifying six core privacy objectives and cataloging the metrics used across 47 recent papers. It reveals fragmentation between technical metrics and legal/social privacy notions, and argues for more adversarial, context-aware, and human-centered evaluations. The authors map metrics to HIPAA and GDPR concerns, discuss gaps in current practices, and provide practical recommendations to harmonize evaluation with legal standards and real-world user expectations. The work aims to standardize and broaden privacy evaluations in text, bridging gaps between text-level protections and broader model privacy implications.

Abstract

In this work, we aim to clarify and reconcile metrics for evaluating privacy protection in text through a systematic survey. Although text anonymization is essential for enabling NLP research and model development in domains with sensitive data, evaluating whether anonymization methods sufficiently protect privacy remains an open challenge. In manually reviewing 47 papers that report privacy metrics, we identify and compare six distinct privacy notions, and analyze how the associated metrics capture different aspects of privacy risk. We then assess how well these notions align with legal privacy standards (HIPAA and GDPR), as well as user-centered expectations grounded in HCI studies. Our analysis offers practical guidance on navigating the landscape of privacy evaluation approaches and highlights gaps in current practices. Ultimately, we aim to facilitate more robust, comparable, and legally aware privacy evaluations in text anonymization.

Paper Structure

This paper contains 22 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Overview of six privacy objectives used in text privacy evaluation. Each panel summarizes the privacy notion and provides an illustrative example.