PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps

Ruixuan Liu; Tianhao Wang; Yang Cao; Li Xiong

TL;DR

The PreCurious framework reveals a new attack surface in which the attacker releases a pre-trained model and obtains black-box access to the final fine-tuned model. It shows that the apparent invulnerability of parameter-efficient and differentially private fine-tuning to privacy attacks can be broken in a stealthy manner, relative to fine-tuning on a benign pre-trained model.

Abstract

The pre-training and fine-tuning paradigm has demonstrated its effectiveness and has become the standard approach for tailoring language models to various tasks. Currently, community-based platforms offer easy access to various pre-trained models, as anyone can publish without strict validation processes. However, a released pre-trained model can be a privacy trap for fine-tuning datasets if it is carefully designed. In this work, we propose the PreCurious framework to reveal the new attack surface where the attacker releases the pre-trained model and gets black-box access to the final fine-tuned model. PreCurious aims to escalate the general privacy risk of both membership inference and data extraction on the fine-tuning dataset. The key intuition behind PreCurious is to manipulate the memorization stage of the pre-trained model and guide fine-tuning with a seemingly legitimate configuration. While empirical and theoretical evidence suggests that parameter-efficient and differentially private fine-tuning techniques can defend against privacy attacks on a fine-tuned model, PreCurious demonstrates the possibility of breaking this invulnerability in a stealthy manner compared to fine-tuning on a benign pre-trained model. Although differential privacy (DP) provides some mitigation against membership inference attacks, PreCurious, by further leveraging a sanitized dataset, demonstrates potential vulnerabilities for targeted data extraction even under differentially private tuning with a strict privacy budget, e.g., $\epsilon=0.05$. Thus, PreCurious raises warnings for users on the potential risks of downloading pre-trained models from unknown sources, relying solely on tutorials or common-sense defenses, and releasing sanitized datasets even after perfect scrubbing.
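
To put the privacy budget quoted in the abstract in context, the textbook definition of $(\epsilon, \delta)$-differential privacy can be restated. This is the standard definition, not notation taken from the paper; the symbols $\mathcal{M}$, $D$, $D'$, $S$, and $\delta$ are introduced here only for illustration. A randomized mechanism $\mathcal{M}$ is $(\epsilon, \delta)$-DP if, for all neighboring datasets $D$ and $D'$ and every set of outputs $S$,

$$\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta.$$

For $\epsilon = 0.05$ the multiplicative factor is $e^{0.05} \approx 1.051$, so the output distributions on neighboring datasets may differ by only about 5% (plus $\delta$). This is why a budget of this size is described as strict.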

Paper Structure (39 sections, 5 equations, 13 figures, 9 tables)


Introduction
Threat Model and Preliminaries
Parameter-Efficient Fine-tuning (PEFT)
Threat Model
Adversarial Capabilities
Privacy Game
Adversarial Goal
Success Metrics
Membership Inference Attack
Data Extraction Attack
Stealthiness
Amplifying Privacy Risk with PreCurious
Attack Overview
PreCurious Framework
Key Intuition
...and 24 more sections

Figures (13)

Figure 1: The privacy vulnerability for target models fine-tuned by various methods ranks as Head-FT $>$ Full $>$ Adapter-FT. PreCurious increases the privacy risk for each iteration and ruins the privacy-utility trade-off, as demonstrated with Head-FT and Adapter-FT. $E_\text{ft}$ indicates the fine-tuning epochs and lower validation perplexity means better performance.
Figure 2: Framework overview of PreCurious. The dashed gray line indicates extra side information that can be utilized: 1) the stopping criterion, 2) the fine-tuning method, and 3) the released sanitized data by masking the secret. We design Accelerated and Lagging strategies for stopping by epoch or by performance. We propose an aggressive anti-freezing strategy when the victim uses the given fine-tuning method. We utilize a released sanitized dataset in targeted data extraction experiments.
Figure 3: Privacy risk for different model initialization statuses. Each point indicates a fine-tuned checkpoint for the Enron dataset with Adapter-FT. We use TPR@0.1FPR as the proxy metric for the privacy risk of the model, based on the scoring method in Eq. \ref{eq:mia_loss}. As model initializations, we fully fine-tune the benign GPT-2 model on the auxiliary dataset with learning rate $\eta_\text{pre}=10^{-5}$ for $E_\text{pre}=1$ and $E_\text{pre}=5$ to obtain Lagging Init and Accelerated Init, respectively.
Figure 4: Ablation study of PreCurious on the crafted initialization and reference model with Enron and Adapter-FT GPT-2. Loss distributions for Benign initialization w/o $\theta_\text{ref}$, benign initialization w/ Full-Ref, and PreCurious initialization w/ Full-Ref.
Figure 5: ROC-AUC curve for Enron on Adapter-FT GPT-2. Base-Full indicates that calibrating with a benign model cannot even beat Loss-Att with the same benign initialization.
...and 8 more figures
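
The captions above refer to loss-based membership scores, calibration with a reference model, and TPR at a fixed FPR. The following is a minimal sketch of how such a metric could be computed, not the paper's implementation. The function names `membership_scores` and `tpr_at_fpr` are hypothetical, the calibrated score is assumed to be the reference-model loss minus the target-model loss (a common choice that this summary does not confirm), and the target FPR is left as a parameter because the captions do not say whether "0.1" is a fraction or a percentage.

```python
import numpy as np


def membership_scores(target_losses, ref_losses=None):
    """Return scores where a higher value suggests membership.

    Without a reference model, the score is the negative target loss.
    With a reference model, the score is assumed (for illustration) to be
    ref_loss - target_loss, i.e. how much lower the target model's loss is.
    """
    target_losses = np.asarray(target_losses, dtype=float)
    if ref_losses is None:
        return -target_losses
    return np.asarray(ref_losses, dtype=float) - target_losses


def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """True-positive rate at the strictest threshold whose FPR <= target_fpr.

    A sample is predicted "member" when its score is strictly above the
    threshold, so ties at the threshold never push the FPR over the budget.
    """
    member_scores = np.asarray(member_scores, dtype=float)
    nonmember_scores = np.asarray(nonmember_scores, dtype=float)
    n_non = nonmember_scores.size

    # Number of non-members we are allowed to misclassify as members.
    allowed_fp = int(np.floor(target_fpr * n_non + 1e-12))
    if allowed_fp >= n_non:
        threshold = -np.inf  # every sample may be flagged
    else:
        # The (allowed_fp + 1)-th largest non-member score is the threshold.
        threshold = np.sort(nonmember_scores)[::-1][allowed_fp]

    return float(np.mean(member_scores > threshold))
```

In use, one would compute per-sample losses on known members and non-members of the fine-tuning set, pass them through `membership_scores`, and report `tpr_at_fpr` at the chosen false-positive budget. Using a strict inequality at the threshold is a conservative design choice: it can under-report TPR when scores tie, but it never exceeds the stated FPR.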

Theorems & Definitions (2)

Definition 2.1: Successful privacy risk amplification
Definition 2.2: Privacy risk amplification stealthiness
