On the Vulnerability of Text Sanitization

Meng Tong; Kejiang Chen; Xiaojian Yuan; Jiayang Liu; Weiming Zhang; Nenghai Yu; Jie Zhang

TL;DR

The paper addresses privacy leakage in text sanitization based on differential privacy (DP) and argues that existing empirical reconstruction attacks are inadequate for evaluating the protection it offers. It introduces theoretically optimal reconstruction attacks, in context-free and contextual variants, and derives bounds on their attack success rate (ASR) as benchmarks for sanitization. It also proposes two practical Bayesian attacks that approximate these bounds using shadow data. Experiments on SST-2, AGNEWS, QNLI, and Yelp show that these attacks outperform baselines, with the Contextual Bayesian Attack achieving up to a 46.4% ASR improvement at ϵ=4.0 on SST-2. The results provide a rigorous framework for evaluating privacy in text sanitization and indicate that DP-based sanitization can be more vulnerable than previously thought. Noted limitations include a tight bound for the contextual attack that remains incomplete, and runtime considerations.

Abstract

Text sanitization, which employs differential privacy to replace sensitive tokens with new ones, represents a significant technique for privacy protection. Typically, its performance in preserving privacy is evaluated by measuring the attack success rate (ASR) of reconstruction attacks, where attackers attempt to recover the original tokens from the sanitized ones. However, current reconstruction attacks on text sanitization are developed empirically, making it challenging to accurately assess the effectiveness of sanitization. In this paper, we aim to provide a more accurate evaluation of sanitization effectiveness. Inspired by the work of Palamidessi et al., we implement theoretically optimal reconstruction attacks targeting text sanitization. We derive bounds on their ASR as benchmarks for evaluating sanitization performance. For real-world applications, we propose two practical reconstruction attacks based on these theoretical findings. Our experimental results underscore the necessity of reassessing these overlooked risks. Notably, one of our attacks achieves a 46.4% improvement in ASR over the state-of-the-art baseline, with a privacy budget of ϵ=4.0 on the SST-2 dataset. Our code is available at: https://github.com/mengtong0110/On-the-Vulnerability-of-Text-Sanitization.
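
To make the idea of an optimal reconstruction attack and its ASR concrete, the following is a minimal sketch of a context-free Bayes-optimal guess on a toy vocabulary. It is not taken from the paper's code: the function names, the 4-token vocabulary, the distance matrix, the prior, and the exponential-style sanitization mechanism are all assumptions chosen for illustration. The attacker is assumed to know the token prior (in practice this might be estimated from shadow data) and the mechanism's replacement probabilities, and guesses the original token that maximizes the posterior.

```python
import numpy as np


def exponential_mechanism(distance, epsilon):
    """Hypothetical sanitizer: P(y | x) proportional to exp(-epsilon * d(x, y) / 2).

    distance: (n, n) array of token distances; row x, column y.
    Returns an (n, n) row-stochastic matrix of replacement probabilities.
    """
    scores = np.exp(-0.5 * epsilon * distance)
    return scores / scores.sum(axis=1, keepdims=True)


def bayes_optimal_attack(prior, mechanism):
    """Context-free Bayes-optimal reconstruction (illustrative sketch).

    prior: (n,) array, assumed P(x) over original tokens.
    mechanism: (n, m) array, P(y | x).
    Returns (guess, asr): guess[y] is the most probable original token for
    sanitized token y, and asr is the expected success rate of that rule.
    """
    joint = prior[:, None] * mechanism   # P(x, y)
    guess = joint.argmax(axis=0)         # argmax_x P(x) P(y | x) for each y
    asr = joint.max(axis=0).sum()        # sum_y max_x P(x) P(y | x)
    return guess, asr


# Toy setup (assumed values, not from the paper).
distance = np.array(
    [[0, 1, 2, 3],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [3, 2, 1, 0]],
    dtype=float,
)
prior = np.array([0.4, 0.3, 0.2, 0.1])

for eps in (0.0, 4.0, 50.0):
    guess, asr = bayes_optimal_attack(prior, exponential_mechanism(distance, eps))
    print(f"epsilon={eps}: guess={guess.tolist()}, ASR={asr:.3f}")
```

In this toy setting, a privacy budget of zero makes the sanitized token uninformative, so the best the attacker can do is always guess the most frequent token, while a very large budget leaves tokens nearly unchanged and pushes the ASR toward 1. A contextual attack would additionally condition on neighboring tokens, which this sketch does not attempt.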
