
Large Language Models are Advanced Anonymizers

Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev

TL;DR

The paper addresses the growing privacy risks posed by large language models inferring sensitive attributes from online text. It introduces an adversarial anonymization framework where a surrogate adversary LLM performs attribute inferences and an anonymizer LLM iteratively rewrites text to thwart these inferences while preserving readability and meaning. Through extensive experiments on 13 LLMs across real-world PersonalReddit and synthetic SynthPAI data, the authors demonstrate that feedback-guided adversarial anonymization achieves a superior privacy-utility tradeoff compared to industry baselines like Azure Language Services, with a human study validating user-perceived improvements. The work also discusses scalability with model size, local deployment feasibility, and ethical considerations, highlighting practical avenues for deploying robust, readable anonymized text in real-world online settings.
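The feedback loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The `Adversary`/`Anonymizer` interfaces, the `Trace` record, and the keyword-based `toy_adversary`/`toy_anonymizer` stand-ins are hypothetical and exist only so the example runs without a model. In the paper's framework both roles are played by LLMs, and the adversary's feedback would be free-text reasoning rather than a single evidence phrase.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical interfaces (not the paper's code). An adversary maps a text to
# {attribute: evidence it relied on}; an anonymizer rewrites the text using
# that feedback. In the actual framework, both roles are played by LLMs.
Adversary = Callable[[str], Dict[str, str]]
Anonymizer = Callable[[str, Dict[str, str]], str]


@dataclass
class Trace:
    texts: List[str] = field(default_factory=list)                # t_0, t_1, ...
    feedback: List[Dict[str, str]] = field(default_factory=list)  # one dict per round


def adversarial_anonymize(text: str, adversary: Adversary,
                          anonymizer: Anonymizer, rounds: int = 2) -> Trace:
    """Alternate adversarial inference and feedback-guided rewriting."""
    trace = Trace(texts=[text])
    for _ in range(rounds):
        inferences = adversary(trace.texts[-1])  # what can still be inferred?
        trace.feedback.append(inferences)
        # Rewrite the current text so that these inferences no longer hold.
        trace.texts.append(anonymizer(trace.texts[-1], inferences))
    return trace


# Toy keyword-based stand-ins (hypothetical) so the loop runs without a model.
CUES = {"Zurich": "location", "my 30th birthday": "age"}
GENERALIZE = {"Zurich": "my city", "my 30th birthday": "my birthday"}


def toy_adversary(text: str) -> Dict[str, str]:
    return {attr: cue for cue, attr in CUES.items() if cue in text}


def toy_anonymizer(text: str, inferences: Dict[str, str]) -> str:
    for evidence in inferences.values():
        text = text.replace(evidence, GENERALIZE[evidence])
    return text


if __name__ == "__main__":
    trace = adversarial_anonymize("Celebrated my 30th birthday in Zurich.",
                                  toy_adversary, toy_anonymizer)
    print(trace.texts[-1])     # Celebrated my birthday in my city.
    print(trace.feedback[-1])  # {}
```

In this toy run the second round finds nothing left to infer, so $t_2$ equals $t_1$. With a real LLM adversary, later rounds can surface subtler cues, as in the example of Figure 2.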

Abstract

Recent privacy research on large language models (LLMs) has shown that they achieve near-human-level performance at inferring personal data from online texts. With ever-increasing model capabilities, existing text anonymization methods are currently lagging behind regulatory requirements and adversarial threats. In this work, we take two steps to bridge this gap: First, we present a new setting for evaluating anonymization in the face of adversarial LLM inferences, allowing for a natural measurement of anonymization performance while remedying some of the shortcomings of previous metrics. Then, within this setting, we develop a novel LLM-based adversarial anonymization framework leveraging the strong inferential capabilities of LLMs to inform our anonymization procedure. We conduct a comprehensive experimental evaluation of adversarial anonymization across 13 LLMs on real-world and synthetic online texts, comparing it against multiple baselines and industry-grade anonymizers. Our evaluation shows that adversarial anonymization outperforms current commercial anonymizers in terms of both the resulting utility and privacy. We support our findings with a human study (n=50) highlighting a strong and consistent human preference for LLM-anonymized texts.
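One way to make the adversarial-inference evaluation setting concrete is to score an anonymized text by how often a strong adversary still recovers each ground-truth attribute. The sketch below assumes exact matching after lowercasing and whitespace normalization, and a self-reported certainty on an assumed 1 to 5 scale. `Prediction`, `adversarial_accuracy`, and `mean_certainty_on_correct` are hypothetical names, and the paper's actual matching rules may be more lenient for free-text attributes. The second function mirrors the quantity plotted in Figure 5 (certainty on correctly classified examples).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prediction:
    attribute: str         # e.g. "age" or "location"
    truth: str             # ground-truth value for the author
    guess: Optional[str]   # adversary's guess on the anonymized text (None = no guess)
    certainty: int         # adversary's self-reported certainty (assumed 1-5 scale)


def _normalize(s: str) -> str:
    return " ".join(s.lower().split())


def is_correct(p: Prediction) -> bool:
    # Assumption: exact match after normalization. Free-text attributes
    # may call for a more lenient comparison in practice.
    return p.guess is not None and _normalize(p.guess) == _normalize(p.truth)


def adversarial_accuracy(preds: List[Prediction]) -> float:
    """Share of attribute inferences the adversary gets right (lower = more private)."""
    return sum(is_correct(p) for p in preds) / len(preds) if preds else 0.0


def mean_certainty_on_correct(preds: List[Prediction]) -> float:
    """Average certainty over correct inferences only (cf. Figure 5)."""
    scores = [p.certainty for p in preds if is_correct(p)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Made-up toy predictions for illustration, not results from the paper.
    preds = [
        Prediction("age", "30", "30", 5),
        Prediction("location", "Zurich", "zurich", 3),
        Prediction("sex", "female", None, 1),
        Prediction("income", "low", "high", 2),
    ]
    print(adversarial_accuracy(preds))       # 0.5
    print(mean_certainty_on_correct(preds))  # 4.0
```

Comparing these two numbers before and after anonymization gives a per-attribute view of how much an adversary still learns, independent of which spans were edited.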

Paper Structure (76 sections, 1 equation, 20 figures, 5 tables)

Figures (20)

  • Figure 1: Feedback-guided adversarial anonymization and the adversarial inference setting. Given user-written texts, we depict two approaches. (1) Bottom: classical NLP-based anonymization with industry tools such as Presidio and Azure Language Studio. Using entity recognition, these tools produce a set of spans that are then masked completely, yielding the anonymized text shown at the bottom right. (2) Top: our feedback-guided adversarial anonymization procedure. In each round, an adversarial LLM tries to predict personal attributes from the current version of the text. Based on this inference, an anonymizer LLM then removes and adapts the relevant sections of the text to prevent such inferences. After multiple rounds, the anonymizer outputs the resulting text. On the right, we depict our anonymization-under-adversarial-inference setting. Unlike previous metrics, we evaluate the anonymized texts directly against a strong adversarial LLM that tries to infer personal attributes. The text produced by adversarial anonymization has both higher utility and higher privacy than the one obtained through traditional methods.
  • Figure 2: Intermediate steps of our adversarial anonymization framework on a (perturbed) real-world example. Adversarial inferences (by GPT-4) are shortened for brevity. In the first round, anonymization targets direct references to the author's sex, which are removed in $t_1$. In round 2, the adversary relies on subtler cues in the author's use of language. On $t_2$, the adversary can no longer infer the author's sex with certainty.
  • Figure 3: The main experiments comparing the performance of our approach with the baselines. (a) Adversarial methods (suffix -AA, shown over 5 iterations) achieve higher utility and lower adversarial accuracy on the PersonalReddit dataset than classical methods. The baseline on non-anonymized text is shown in the top-right corner as a filled star (★). For each open-source (OSS) model family, we only show the strongest model. (b) The number of correct predictions for four exemplary attributes on PersonalReddit. Across all attributes, adversarial anonymizers outperform both baselines. While Llama3.1-70B and GPT-4 continuously improve with each round, the more limited Yi-34B struggles to improve on two of the four attributes.
  • Figure 4: Comparison of GPT-4 with feedback-guided adversarial anonymization (AA) and without it (denoted Base). Even after 5 rounds, Base does not reach the anonymization performance that AA achieves after 2 rounds. After all 5 iterations, this yields an adversarial accuracy gap of $\sim 10\%$.
  • Figure 5: Adversarial accuracy and certainty on correctly classified examples by the final GPT-4 adversary. GPT-4-AA not only leads to fewer correct inferences but also reduces the certainty of the adversary in its correct predictions, forcing it to rely on inherent biases.
  • ...and 15 more figures
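For contrast, the classical baseline in Figure 1 (bottom) masks whatever spans an entity recognizer flags. The sketch below assumes the spans are already given as `(start, end, entity_type)` tuples with exclusive ends, as an NER tool might return them, and replaces each with `*` characters of the same length. The mask format and the `mask_spans` helper are illustrative assumptions, not the exact output of Presidio or Azure.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type); end is exclusive


def mask_spans(text: str, spans: List[Span], mask_char: str = "*") -> str:
    """Replace every detected span with mask characters of the same length."""
    # Merge overlapping (or touching) spans so each character is masked once.
    merged: List[Tuple[int, int]] = []
    for start, end, _ in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Copy the unflagged text verbatim and mask the flagged ranges.
    out, prev = [], 0
    for start, end in merged:
        out.append(text[prev:start])
        out.append(mask_char * (end - start))
        prev = end
    out.append(text[prev:])
    return "".join(out)


if __name__ == "__main__":
    text = "I moved to Zurich when I was 25."
    city, age = text.index("Zurich"), text.index("25")
    spans = [(city, city + len("Zurich"), "LOCATION"), (age, age + 2, "AGE")]
    print(mask_spans(text, spans))  # I moved to ****** when I was **.
```

Everything outside the flagged spans passes through unchanged, including any contextual cues the recognizer did not flag; the adversarial procedure instead decides what to rewrite based on what an LLM can still infer.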