IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization

Ahmed Frikha, Nassim Walha, Krishna Kanth Nakka, Ricardo Mendes, Xue Jiang, Xuebing Zhou

TL;DR

This paper addresses the leakage of authors' private attributes from free-form text. It introduces IncogniText, a two-stage, adversarially guided text rewriting framework that conditions on a wrong target attribute value to mislead attackers while preserving meaning. In experiments on synthetic and Reddit datasets, IncogniText reduces private attribute leakage by more than 90% across 8 private attributes. A version distilled into LoRA parameters for an on-device model still reduces leakage by more than half, with limited impact on utility. The work offers a practical privacy-preserving solution for real-world text sharing and points toward future directions in data minimization and differential privacy guarantees.

Abstract

In this work, we address the problem of text anonymization where the goal is to prevent adversaries from correctly inferring private attributes of the author, while keeping the text utility, i.e., meaning and semantics. We propose IncogniText, a technique that anonymizes the text to mislead a potential adversary into predicting a wrong private attribute value. Our empirical evaluation shows a reduction of private attribute leakage by more than 90% across 8 different private attributes. Finally, we demonstrate the maturity of IncogniText for real-world applications by distilling its anonymization capability into a set of LoRA parameters associated with an on-device model. Our results show the possibility of reducing privacy leakage by more than half with limited impact on utility.

Paper Structure (7 sections, 3 figures, 6 tables)

This paper contains 7 sections, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Overview of the IncogniText anonymization approach. In this example, the true user attribute value $a_{true}$ (middle income) is obfuscated via the IncogniText anonymization conditioned on a wrong target value $a_{target}$ (low income) with minimal text changes.
  • Figure 2: Private attribute inference accuracy (%) by attribute for unprotected text and different anonymization methods. The first four bars for each attribute represent accuracy values reported in Staab et al. (2024) and are evaluated using GPT-4 as the privacy evaluation model. The remaining five bars are evaluations from our experiments, performed using Phi-3-small as the privacy evaluation model.
  • Figure 3: Number of anonymization steps required before the adversary predicts the attribute value incorrectly. Average number of steps is 1.3 for IncogniText and 1.9 for FgAA.
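
The step count in Figure 3 suggests an iterative loop: the text is rewritten toward a wrong target value $a_{target}$ until the adversary no longer predicts $a_{true}$. The following is a minimal sketch of such a loop, not the authors' implementation. The function `anonymize_until_misled`, its parameters, the `max_steps` cap, and the toy `toy_infer` and `toy_rewrite` stand-ins are all hypothetical; in the paper both roles are played by LLMs.

```python
from typing import Callable, Tuple


def anonymize_until_misled(
    text: str,
    a_true: str,
    a_target: str,
    rewrite: Callable[[str, str, str], str],
    infer: Callable[[str], str],
    max_steps: int = 5,  # assumed cap, not taken from the paper
) -> Tuple[str, int]:
    """Rewrite `text` until the adversary stops predicting `a_true`.

    Returns the final text and the number of rewriting steps used.
    """
    steps = 0
    # Stop as soon as the adversary's prediction differs from the true value.
    while infer(text) == a_true and steps < max_steps:
        text = rewrite(text, a_true, a_target)
        steps += 1
    return text, steps


# Toy stand-ins for illustration only (hypothetical, not LLMs).
def toy_infer(text: str) -> str:
    # Keyword-based "adversary" guessing an income level.
    return "middle income" if "new car" in text else "low income"


def toy_rewrite(text: str, a_true: str, a_target: str) -> str:
    # Swaps the one revealing phrase; a real rewriter would condition on a_target.
    return text.replace("new car", "used bike")


final_text, steps = anonymize_until_misled(
    "I just bought a new car for my commute.",
    a_true="middle income",
    a_target="low income",
    rewrite=toy_rewrite,
    infer=toy_infer,
)
print(final_text, steps)
```

Passing the rewriter and the adversary as callables keeps the loop independent of any particular model, so the same skeleton could wrap a large hosted model or a LoRA-adapted on-device model.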