Privacy Regularization: Joint Privacy-Utility Optimization in Language Models

Fatemehsadat Mireshghallah, Huseyin A. Inan, Marcello Hasegawa, Victor Rühle, Taylor Berg-Kirkpatrick, Robert Sim

TL;DR

Neural language models memorize training data, creating privacy risks when trained on user-generated content. The authors propose two privacy-regularization strategies (adversarial training with a discriminator, and a discriminator-free triplet-based loss) to jointly optimize privacy and utility during training. Evaluated with exposure metrics and tab attacks on the Avocado and Reddit datasets, the methods achieve competitive privacy with substantially lower training overhead and less disparate impact than differential privacy (DP). These results suggest practical, scalable pathways for privacy-preserving language modeling without the harsh utility penalties or subgroup biases associated with DP.

Abstract

Neural language models are known to have a high capacity for memorization of training samples. This may have serious privacy implications when training models on user content such as email correspondence. Differential privacy (DP), a popular choice to train models with privacy guarantees, comes with significant costs in terms of utility degradation and disparate impact on subgroups of users. In this work, we introduce two privacy-preserving regularization methods for training language models that enable joint optimization of utility and privacy through (1) the use of a discriminator and (2) the inclusion of a triplet-loss term. We compare our methods with DP through extensive evaluation. We show the advantages of our regularizers: a favorable utility-privacy trade-off, faster training with the ability to tap into existing optimization approaches, and uniform treatment of under-represented subgroups.
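
The discriminator-based regularizer in item (1) can be illustrated with a small sketch. This page does not give the exact form of the privacy loss, so everything below is an assumption made for illustration: a linear discriminator over the last hidden state, a penalty equal to the KL divergence from the uniform distribution to the discriminator's output, and a weight `lam` on that penalty. The function names are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator(h_x, weights):
    """Hypothetical linear discriminator: one weight row per author.

    Returns p_d, a distribution over authors given the hidden state h_x.
    """
    logits = [sum(w * h for w, h in zip(row, h_x)) for row in weights]
    return softmax(logits)


def privacy_penalty(p_d):
    """Assumed penalty: KL(uniform || p_d).

    It is zero when the discriminator cannot tell authors apart
    (p_d is uniform) and grows as p_d concentrates on one author.
    """
    u = 1.0 / len(p_d)
    return sum(u * math.log(u / p) for p in p_d)


def joint_loss(lm_loss, p_d, lam=1.0):
    """Utility term plus weighted privacy term (lam is an assumed knob)."""
    return lm_loss + lam * privacy_penalty(p_d)


# Usage: a hidden state that leaks the author is penalized more.
h_x = [1.0, 0.0]
leaky = discriminator(h_x, [[3.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
blind = discriminator(h_x, [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
print(joint_loss(2.0, leaky), joint_loss(2.0, blind))
```

In this sketch a single scalar objective carries both the language-modeling loss and the privacy term, which is what allows a standard optimizer to be used unchanged.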

Paper Structure

This paper contains 14 sections, 4 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Workflow of our adversarial training regularization. The last hidden state ($h_x$) of the LM is fed to the discriminator to generate a distribution over the authors ($p_d$). $p_d$ is used to compute $\mathcal{L}_{\textsc{LM-P}}$, the privacy loss.
  • Figure 2: Exposure metric results for different training schemes at similar perplexities. Unmitigated denotes conventional training. Adversarial and Triplet are our regularizers. Higher exposure indicates lower privacy.
  • Figure 3: (a, b) Tab attack results for reconstructing canary sequences for two utility levels. Higher attack accuracy indicates lower privacy. (c) Effect of different mitigations on utility of well-represented (Top-5) and under-represented (Low-5) users for the Avocado dataset.
  • Figure 4: Per-epoch training time breakdown, normalized to conventional execution. Differential privacy is $16.44\times$ slower than conventional execution. Triplet and Adversarial are our proposed regularizations.
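
The exposure metric referenced in Figure 2 can be sketched as follows. This assumes the commonly used rank-based definition, exposure = log2 |R| - log2 rank, where R is the space of candidate sequences and rank is the position of the inserted canary when candidates are sorted by model loss. The function name and the toy loss values are hypothetical, and this page does not state how the paper constructs its candidate space.

```python
import math


def exposure(canary_loss, candidate_losses):
    """Rank-based exposure of a canary sequence.

    canary_loss: model loss (e.g. negative log-likelihood) of the canary.
    candidate_losses: losses of the other candidates in the space R.
    """
    # Rank 1 means the model finds the canary more likely than any
    # other candidate, i.e. the canary is maximally memorized.
    rank = 1 + sum(1 for loss in candidate_losses if loss < canary_loss)
    space_size = len(candidate_losses) + 1  # |R| includes the canary
    return math.log2(space_size) - math.log2(rank)


# Usage with toy numbers: three other candidates, so |R| = 4.
print(exposure(0.5, [1.0, 2.0, 3.0]))  # canary ranked first
print(exposure(5.0, [1.0, 2.0, 3.0]))  # canary ranked last
```

Under this definition, a canary ranked first reaches the maximum exposure of log2 |R|, and one ranked last has exposure 0, which matches the caption's reading that higher exposure indicates lower privacy.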