SoK: Reducing the Vulnerability of Fine-tuned Language Models to Membership Inference Attacks

Guy Amit; Abigail Goldsteen; Ariel Farkash

TL;DR

Some training methods substantially reduce the privacy risk of fine-tuned language models; combining differential privacy with low-rank adaptors gives the best protection against membership inference attacks.

Abstract

Natural language processing models have experienced a significant upsurge in recent years, with numerous applications being built upon them. Many of these applications require fine-tuning generic base models on customized, proprietary datasets. This fine-tuning data is especially likely to contain personal or sensitive information about individuals, resulting in increased privacy risk. Membership inference attacks are the most commonly employed attack to assess the privacy leakage of a machine learning model. However, limited research is available on the factors that affect the vulnerability of language models to this kind of attack, or on the applicability of different defense strategies in the language domain. We provide the first systematic review of the vulnerability of fine-tuned large language models to membership inference attacks, the various factors that come into play, and the effectiveness of different defense strategies. We find that some training methods provide significantly reduced privacy risk, with the combination of differential privacy and low-rank adaptors achieving the best privacy protection against these attacks.

Paper Structure (31 sections, 3 equations, 6 figures, 6 tables)

Introduction
Background
Membership Inference Attacks against Language Models
Related Surveys
Developing MIA Robust Models
Factors Affecting MIA Vulnerability
Differential Privacy for LLMs
Empirical Defenses
Experimental Setup
Datasets
Membership Inference Risk Assessment
Evaluation Metrics
Technical Details
Evaluation
Number of Training Iterations
...and 16 more sections

Figures (6)

Figure 1: Effect of number of training iterations on MIA success rate. In the left and middle plots, dotted lines indicate training loss and solid lines indicate test loss.
Figure 2: Effect of batch size on MIA success rate
Figure 3: Effect of pruning on MIA success rate
Figure 4: Effect of $\epsilon$ in DP-SGD on MIA success rate for the Rotten Tomatoes dataset
Figure 5: Complete ROC curves for leading mitigation methods
...and 1 more figures
