
Privacy-preserving Neural Representations of Text

Maximin Coavoux, Shashi Narayan, Shay B. Cohen

TL;DR

This work defines a privacy framework for NLP representations by analyzing an attacker's ability to recover private input attributes from hidden representations. It introduces a practical privacy-utility metric and proposes three defense strategies to reduce leakage: multidetasking adversarial training, adversarial generation, and declustering. Empirical results on sentiment and topic classification show that these defenses generally improve privacy with modest or favorable effects on accuracy, highlighting a viable path toward privacy-preserving NLP in edge-cloud setups. Overall, the paper demonstrates that neural representations can leak sensitive information and offers concrete, transferable methods to mitigate such leakage while maintaining task performance.
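All three defenses change the training objective of the model whose representation is exposed. As a hedged illustration, here is a generic adversarial-training objective of that family, not the paper's exact formulation: an encoder $r_\theta$ maps a text $x$ to a representation $r_\theta(x)$, a main classifier with parameters $\phi$ predicts the label $y$, and a simulated attacker with parameters $\psi$ predicts the private attribute $\mathbf z$. The symbols $\theta$, $\phi$, $\psi$ and the trade-off weight $\lambda \ge 0$ are introduced here for illustration only.

```latex
\begin{aligned}
\mathcal{L}_{\text{main}}(\theta,\phi) &= -\log P\big(y \mid r_\theta(x); \phi\big)\\
\mathcal{L}_{\text{adv}}(\theta,\psi) &= -\log P\big(\mathbf z \mid r_\theta(x); \psi\big)\\
\theta^\ast,\phi^\ast &= \arg\min_{\theta,\phi}\ \max_{\psi}\ \Big[\mathcal{L}_{\text{main}}(\theta,\phi) - \lambda\,\mathcal{L}_{\text{adv}}(\theta,\psi)\Big]
\end{aligned}
```

The inner maximization over $\psi$ amounts to training the attacker to minimize its own loss $\mathcal{L}_{\text{adv}}$; the outer minimization then pushes the encoder to keep $y$ predictable while making $\mathbf z$ hard to recover, with $\lambda$ setting the privacy-utility trade-off. In practice the two steps are typically alternated with gradient updates.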

Abstract

This article deals with adversarial attacks against deep learning systems for Natural Language Processing (NLP), in the context of privacy protection. We study a specific type of attack: an attacker eavesdrops on the hidden representations of a neural text classifier and tries to recover information about the input text. Such a scenario may arise when the computation of a neural network is shared across multiple devices, e.g. when a hidden representation is computed on a user's device and sent to a cloud-based model. We measure the privacy of a hidden representation by the ability of an attacker to accurately predict specific private information from it, and we characterize the tradeoff between the privacy and the utility of neural representations. Finally, we propose several defense methods based on modified training objectives and show that they improve the privacy of neural representations.
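The abstract measures privacy by how accurately an attacker can predict private information from a hidden representation. The sketch below illustrates that idea under stated assumptions: the attacker is a logistic-regression classifier trained on intercepted representations, privacy is reported as one minus its held-out accuracy, and the data are synthetic 2-d vectors from a hypothetical `make_data` helper. None of these names or choices (`train_attacker`, `privacy_score`, the 1 − accuracy convention) come from the paper.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_attacker(data, epochs=200, lr=0.1):
    """Fit a logistic-regression attacker that predicts a binary private
    attribute z from an intercepted representation h (batch gradient descent)."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for h, z in data:
            logit = sum(wi * hi for wi, hi in zip(w, h)) + b
            err = sigmoid(logit) - z  # derivative of the log-loss w.r.t. the logit
            for i in range(dim):
                grad_w[i] += err * h[i]
            grad_b += err
        n = len(data)
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def attacker_accuracy(model, data):
    """Fraction of held-out examples whose private attribute the attacker recovers."""
    w, b = model
    correct = 0
    for h, z in data:
        logit = sum(wi * hi for wi, hi in zip(w, h)) + b
        correct += int((logit > 0) == (z == 1))
    return correct / len(data)


def privacy_score(train, test):
    """Illustrative privacy measure (an assumption, not the paper's metric):
    1 - accuracy of an attacker trained on `train` and evaluated on `test`."""
    return 1.0 - attacker_accuracy(train_attacker(train), test)


def make_data(n, leak, seed):
    """Hypothetical synthetic representations: 2-d vectors whose first
    coordinate is shifted by the private attribute z only when leak=True."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.randint(0, 1)
        shift = (2 * z - 1) * 2.0 if leak else 0.0
        data.append(([rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)], z))
    return data


leaky = privacy_score(make_data(400, True, 0), make_data(400, True, 1))
private = privacy_score(make_data(400, False, 0), make_data(400, False, 1))
print(f"leaky: {leaky:.2f}, private: {private:.2f}")
```

On the leaky data the attacker should recover z for nearly every test example, so the score is close to 0; on the non-leaky data it can do little better than chance, so the score stays near 0.5. A linear attacker is the simplest choice; a stronger attacker (e.g. a small MLP) would typically give a more conservative privacy estimate.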

Paper Structure

This paper contains 28 sections, 8 equations, 1 figure, and 5 tables.

Figures (1)

  • Figure 1: General setting illustration. The main classifier predicts a label $y$ from a text $x$, the attacker tries to recover some private information $\mathbf z$ contained in $x$ from the latent representation used by the main classifier.
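The setting in Figure 1 can be written compactly as follows. This is a hedged formalization introduced here for illustration: $r$ denotes the encoder whose output $\mathbf h$ is the latent representation, and $P_{\text{main}}$ and $P_{\text{att}}$ are the main classifier's and the attacker's predictive distributions; none of these symbols come from the figure itself.

```latex
\begin{aligned}
\mathbf h &= r(x) && \text{latent representation (e.g. computed on the user's device)}\\
\hat y &= \arg\max_{y}\, P_{\text{main}}(y \mid \mathbf h) && \text{main classifier}\\
\hat{\mathbf z} &= \arg\max_{\mathbf z}\, P_{\text{att}}(\mathbf z \mid \mathbf h) && \text{attacker, who observes only } \mathbf h
\end{aligned}
```

Because the attacker never sees $x$ directly, any private information it recovers must be encoded in $\mathbf h$.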